CN113359717A - Mobile robot navigation obstacle avoidance method based on deep reinforcement learning - Google Patents

Mobile robot navigation obstacle avoidance method based on deep reinforcement learning Download PDF

Info

Publication number
CN113359717A
CN113359717A CN202110575846.8A
Authority
CN
China
Prior art keywords
robot
obstacle avoidance
network
reinforcement learning
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110575846.8A
Other languages
Chinese (zh)
Other versions
CN113359717B (en)
Inventor
刘安东
崔奇
夏浩
周时钎
滕游
张文安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110575846.8A priority Critical patent/CN113359717B/en
Publication of CN113359717A publication Critical patent/CN113359717A/en
Application granted granted Critical
Publication of CN113359717B publication Critical patent/CN113359717B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02: Control of position or course in two dimensions
    • G05D1/021: Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process

Abstract

A mobile robot navigation obstacle avoidance method based on deep reinforcement learning uses a neural network to extract features of the pairwise interaction between the robot and each person and captures interactions among people through a local map. A self-attention mechanism aggregates the interaction features to infer the relative importance of neighboring humans with respect to their future states; a value network is pre-trained with a reinforcement learning method, and a safe action space is added during training to prevent emergencies. A two-dimensional grid map is built, a global path toward a set global target point is planned with an RRT (rapidly-exploring random tree) algorithm, the nearest point within a circle centered on the current robot position with radius r is found and set as a dynamic local target, and the optimal action is selected through the optimal strategy, finally realizing the navigation obstacle avoidance function of the mobile robot. The invention solves problems such as short-sightedness and slow reaction of the robot in existing navigation obstacle avoidance processes.

Description

Mobile robot navigation obstacle avoidance method based on deep reinforcement learning
Technical Field
The invention relates to a navigation obstacle avoidance method of a mobile robot, in particular to a navigation obstacle avoidance method of a mobile robot based on the combination of deep learning and reinforcement learning, and belongs to the field of mobile robots.
Background
A robot is a device that integrates technologies from multiple intersecting fields such as machinery, electronics, computing, control and artificial intelligence; the mobile robot in particular, owing to its autonomous control and flexible motion, is widely applied in many areas of production and daily life. A mobile robot is an intelligent robot that can sense the external environment through sensors and drive continuously and autonomously in complex environments; it involves disciplines such as information perception, motion planning and autonomous control, and is among the latest achievements of artificial intelligence technology and computer information science.
The most important capability of a mobile robot is navigation, i.e. the ability to avoid obstacles within its workspace and move safely from an initial position to a target position. In recent years the application of mobile robots has expanded to unknown outdoor environments such as the deep sea and polar regions, which places higher requirements on their navigation function. Autonomous navigation and motion control technology is the key to trajectory planning and motion of a mobile robot under unknown, unstructured environmental conditions, so research on mobile robot navigation control methods has important theoretical and application value.
Deep learning and reinforcement learning are two important branches in the field of machine learning, and are always taken as a research hotspot by scholars at home and abroad, particularly in the field of mobile robots.
The learning process of reinforcement learning is dynamic and continuously interactive, and the required data are also generated by constant interaction with the environment. Reinforcement learning involves many elements, such as actions, environments, state transition probabilities and reward functions. In addition, deep learning techniques such as image recognition and speech recognition solve perception problems, whereas reinforcement learning solves decision problems. Deep reinforcement learning, produced by combining mature deep learning techniques with reinforcement learning algorithms, is therefore regarded as a development trend of artificial intelligence.
In reality, mobile robots generally operate in environments where people, machines and objects coexist; in such environments, many moving agents, people and other devices must plan and execute trajectories in narrow, crowded spaces, so mobile robots are required to avoid obstacles in crowded scenes. As humans, we have the innate ability to adjust our behavior by observing others, so we can easily pass through crowds or other objects. For mobile robots, however, collision-free navigation in dynamic and crowded scenes is still a difficult task. Conventional mobile robot navigation methods generally regard moving agents as static obstacles or take the next action according to specific interaction rules; they prevent collisions through passive reaction or manually defined functions to ensure safety, which leads to problems such as short-sightedness, slow reaction and a lack of safety.
Disclosure of Invention
In order to overcome the shortcomings of the prior art and based on research into deep learning and reinforcement learning, the invention provides a mobile robot navigation obstacle avoidance method based on deep reinforcement learning, which can predict human dynamics, solves problems such as short-sightedness and slow reaction of the robot during navigation obstacle avoidance, and adds a safety mechanism to prevent emergencies during navigation. Finally, a dynamic local target mechanism is designed to reduce the time of the navigation obstacle avoidance process.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a mobile robot navigation obstacle avoidance method based on deep reinforcement learning comprises the following steps
1) Training a value network model by adopting a time difference method in deep learning middle and deep layer cyclic neural network and reinforcement learning so as to realize robot navigation obstacle avoidance;
2) simplifying the robot and each person into a circle, and defining the state of the robot at the time t:
S_t = [p_x, p_y, v, θ, g_x, g_y, r, v_pref]   (1)
wherein p_x, p_y denote the current position of the robot, v the current velocity of the robot, θ the azimuth angle of the robot, g_x, g_y the target position of the robot, r the radius of the robot, and v_pref the preferred speed of the robot;
defining each person's state at time t:
O_t^i = [p_x, p_y, v_x, v_y, r]   (2)
defining a reward and penalty function:
R(K_t, a_t): a piecewise reward that penalizes collision when d_min < 0, penalizes discomfort when 0 ≤ d_min < d_comf, rewards reaching the goal when d_goal = 0, and is 0 otherwise   (3)
wherein a_t = v_t denotes the action of the robot, d_min the minimum separation distance between the robot and any person during Δt, d_comf the comfortable distance a person can tolerate, and d_goal the distance from the current robot position to the target point;
3) inputting S_t and O_t into a deep recurrent neural network with initial weights; the robot imitates the navigation strategy of a human expert to obtain demonstration experience D, which is stored in an initialized experience pool E; the value network V is initialized with random weights θ, the target value network V' is initialized to the current value network V, and each episode is looped over to obtain the optimal value network V;
4) establishing a two-dimensional grid map, setting a global target point, and continuously updating the joint state of the robot and the human by using a pre-trained value network V:
K_t = [S_t, O_t]   (4)
5) then the RRT algorithm is used to plan a globally optimal path and formulate the optimal strategy:
π*(K_t) = argmax_{a_t∈A} [R(K_t, a_t) + γ^(Δt·v_pref) V*(K_{t+Δt})]   (5)
where A denotes the set of the action space, γ ∈ (0,1) the attenuation factor, Δt the time interval between two actions, v_pref the preferred speed, and V*(K_{t+Δt}) the optimal value at time t + Δt;
6) selecting the optimal action a_t, i.e. the optimal velocity v_t, through the optimal strategy, and realizing local obstacle avoidance until the robot reaches the target point.
Further, in step 1), the value network model consists of an interaction module, a pooling module and a planning module, wherein the interaction module uses a multilayer perceptron to embed the state of each person and the state of the robot into a fixed-length vector e_i:
e_i = ψ_e(S_t, O_t; W_e),  i = 1, 2, …, n   (6)
where ψ_e is a multilayer perceptron with an activation function used to model the human-robot interaction and W_e is the embedding weight;
the embedding vector e_i is then fed into a subsequent multilayer perceptron:
h_i = φ_h(e_i; W_h),  i = 1, 2, …, n   (7)
where φ_h is a fully connected layer with a nonlinear activation function that yields the interaction feature between the robot and the i-th person, and W_h is the network weight;
the pooling module first embeds the interaction into a vector eiConversion to attention score βi
Figure BDA0003084322280000032
βi=ρβ(ei,em;Wβ)(i=1,2,3,…n) (9)
Wherein e ismIs a fixed length of embedded vector, rho, obtained by pooling all the individuals on averageβIs a multi-layer perceptron with activation functions;
then will beGiving pairwise interaction vectors hiAnd corresponding attention scores betaiThe final calculated population is represented by a weighted linear combination of all pairs:
Figure BDA0003084322280000033
the planning module is used for estimating the joint state value of the robot and the crowd in the navigation process:
v = g_v(S_t, C_t; W_v)   (11)
where g_v is a multilayer perceptron with an activation function and W_v is the network weight.
Still further, in step 3), the loop over each episode is as follows:
initialize a random joint state K_t and loop over each step of each episode: with probability ε select a random action a_t; if the small-probability event does not occur, select the action with the maximum current value function using a greedy strategy:
a_t = argmax_{a_t∈A} [R(K_t, a_t) + γ^(Δt·v_pref) V(K_{t+Δt})]   (12)
continuously update the current state and reward value and store them in the experience replay pool; update the experience pool once every 3000 steps and update the current value network by gradient descent until the robot reaches the terminal state, which ends the inner loop of the episode and copies the current network to the target network; after the set number of episodes, the value network model V is obtained.
Furthermore, in step 4), a map-based velocity screening mechanism is added to form a safe action space so that the robot avoids known obstacles in the environment. At each decision step the safe action space is determined by the current robot position p_t, the two-dimensional grid map M and the initialized action space A, i.e. A_safe = (p_t, M, A); for each velocity in the action space a forward simulation is performed to check whether the robot would collide with an obstacle in the map.
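As an illustration of this forward simulation, the following sketch (an assumption, not code from the patent; the occupancy-grid convention, resolution and helper names are hypothetical) keeps only the candidate velocities whose simulated short-horizon path stays in free cells of the grid map M, assuming M is already inflated by the robot radius:

```python
import numpy as np

def safe_action_space(p_t, M, actions, dt=0.25, resolution=0.05, substeps=5):
    """Map-based velocity screening: keep velocities whose forward-simulated path stays free."""
    M = np.asarray(M)                      # occupancy grid, 0 = free cell (assumed convention)
    def is_free(x, y):
        i, j = int(round(y / resolution)), int(round(x / resolution))
        return 0 <= i < M.shape[0] and 0 <= j < M.shape[1] and M[i, j] == 0
    safe = []
    for vx, vy in actions:                 # candidate velocity (vx, vy) from the action space A
        if all(is_free(p_t[0] + vx * dt * k / substeps,
                       p_t[1] + vy * dt * k / substeps)
               for k in range(1, substeps + 1)):
            safe.append((vx, vy))
    return safe                            # A_safe = (p_t, M, A)
```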
In step 5), an RRT algorithm generates a minimum-cost global path on the two-dimensional grid map between the current robot position and the global target; all waypoints on the global path are then traversed, the nearest point within a circle centered on the current robot position with radius r is found, and this point is set as the dynamic local target.
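A possible realization of the dynamic local target selection is sketched below. The text only states that "the nearest point" within the circle of radius r is chosen; this sketch interprets it as the waypoint inside the circle that lies furthest along the global path, a common local-goal choice, and the helper names are assumptions:

```python
import math

def dynamic_local_target(path, p_t, r=4.0):
    """Pick a local goal on the global RRT path: the in-circle waypoint furthest along the path."""
    inside = [(i, wp) for i, wp in enumerate(path)
              if math.hypot(wp[0] - p_t[0], wp[1] - p_t[1]) <= r]
    if not inside:
        # no waypoint within radius r: fall back to the waypoint closest to the robot
        return min(path, key=lambda wp: math.hypot(wp[0] - p_t[0], wp[1] - p_t[1]))
    return max(inside, key=lambda item: item[0])[1]
```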
The invention has the following beneficial effects: (1) introducing the multilayer perceptron accelerates the training time and convergence speed of reinforcement learning; (2) the trained model can predict human dynamics and solves problems such as short-sightedness and slow reaction of the robot during navigation obstacle avoidance; (3) a dynamic local target mechanism is designed so that the robot searches for the optimal path in local planning, reducing the time of the navigation obstacle avoidance process; (4) a map-based action screening mechanism is introduced as a dynamic safe action space.
drawings
FIG. 1 is a flow chart of a robot navigation obstacle avoidance method;
FIG. 2 is a value network training flow diagram;
FIG. 3 is a graph of training simulation results.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 to Fig. 3, a mobile robot navigation obstacle avoidance method based on deep reinforcement learning includes the following steps:
Step 1): In a two-dimensional space, the robot and each person are regarded as circles, and the robot moves to a target point among n persons. For each agent (robot or human), the position p = [p_x, p_y], the velocity v = [v_x, v_y] and the radius r can be observed by the other agents. The target position g = [g_x, g_y], the azimuth angle θ and the preferred speed v_pref cannot be observed by the other agents. S_t denotes the state of the robot and O_t^i denotes the state of the i-th person at time t. By concatenating the robot state with the observable states of the humans, the joint state of all n + 1 agents at time t is obtained as K_t = [S_t, O_t]. According to the navigation strategy, the robot outputs a motion command that immediately changes its velocity at time t: v_t = a_t = π(K_t).
Navigating with the optimal strategy π*(K_t), the optimal value of the joint state K_t at time t is:
V*(K_t) = Σ_{t'=t}^{T} γ^(t'·Δt·v_pref) R(K_{t'}, π*(K_{t'}))   (13)
In formula (13), T denotes the number of steps from the state at time t to the terminal state, Δt the time interval between two actions, γ ∈ (0,1) the attenuation factor, v_pref the preferred speed, and R(K_t, a_t) the corresponding reward during time t.
The optimal strategy is determined by the maximum accumulated return and is defined as follows:
π*(K_t) = argmax_{a_t∈A} [R(K_t, a_t) + γ^(Δt·v_pref) V*(K_{t+Δt})]   (14)
In formula (14), A denotes the set of the action space and V*(K_{t+Δt}) denotes the optimal value at time t + Δt.
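The one-step lookahead in formulas (5) and (14) can be sketched as follows; this is an illustrative sketch rather than the patent's code, and `propagate`, `reward_fn` and `value_net` are assumed callables that simulate the next joint state, evaluate R(K_t, a_t) and evaluate the value function, respectively:

```python
def select_action(actions, K_t, reward_fn, propagate, value_net, gamma=0.9, dt=0.25, v_pref=0.25):
    """Return argmax_a [ R(K_t, a) + gamma^(dt * v_pref) * V(K_{t+dt}) ] over the action space."""
    best_a, best_q = None, float("-inf")
    for a in actions:
        K_next = propagate(K_t, a, dt)                       # simulate the joint state after dt
        q = reward_fn(K_t, a) + (gamma ** (dt * v_pref)) * value_net(K_next)
        if q > best_q:
            best_a, best_q = a, q
    return best_a
```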
The reward and penalty function is defined as in formula (3), where d_min denotes the minimum separation distance between the robot and any person within Δt, d_comf the comfortable distance a person can tolerate, and d_goal the distance from the current robot position to the target point.
Step 2: The value network model comprises three parts: an interaction module, a pooling module and a planning module. The interaction module models human-robot interactions and encodes human-human interactions through a coarse-grained local map; the pooling module aggregates the interactions into a fixed-length embedding vector through a self-attention mechanism and learns, in a data-driven manner, the relative importance of each person and the collective influence of the crowd; the planning module estimates the value of the joint state of the robot and the crowd during navigation. The specific steps are as follows:
step 2.1: embedding the state of the ith person and the state of the robot into a vector e with a fixed length by using a multilayer perceptroniThe method comprises the following steps:
ei=ψe(St,Ot;We)(i=1,2,3,…n) (6)
in formula (16), phie(. is an inline function, WeAre the embedding weights.
Step 2.2: Feed the embedding vector e_i into a subsequent multilayer perceptron to obtain the pairwise interaction feature between the robot and the i-th person:
h_i = φ_h(e_i; W_h),  i = 1, 2, …, n   (17)
In formula (17), φ_h(·) is a fully connected layer and W_h is the network weight.
Step 2.3: Convert the interaction embeddings e_i into attention scores β_i:
e_m = (1/n) Σ_{k=1}^{n} e_k   (18)
β_i = ρ_β(e_i, e_m; W_β),  i = 1, 2, …, n   (19)
In formulas (18) and (19), e_m is the fixed-length embedding vector obtained by mean-pooling over all individuals and ρ_β(·) is a multilayer perceptron with an activation function.
Step 2.4: For each person, given the pairwise interaction vector h_i and the corresponding attention score β_i, the final crowd representation is a weighted linear combination of all pairs:
c = Σ_{i=1}^{n} softmax(β_i) · h_i   (20)
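A compact PyTorch sketch of the three modules of Steps 2.1-2.4 is given below; the layer sizes, the 13-dimensional pairwise input (8 robot dimensions plus 5 observable human dimensions) and the single hidden layers are assumptions made only to keep the example short:

```python
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Interaction (psi_e, phi_h), attention pooling (rho_beta) and planning (g_v) modules."""
    def __init__(self, robot_dim=8, pair_dim=13, embed_dim=64, hidden_dim=64):
        super().__init__()
        self.psi_e = nn.Sequential(nn.Linear(pair_dim, embed_dim), nn.ReLU())    # Eq. (16)
        self.phi_h = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())  # Eq. (17)
        self.rho_beta = nn.Linear(2 * embed_dim, 1)                              # Eq. (19)
        self.g_v = nn.Sequential(nn.Linear(robot_dim + hidden_dim, hidden_dim),
                                 nn.ReLU(), nn.Linear(hidden_dim, 1))            # planning module

    def forward(self, robot_state, pair_states):
        # pair_states: (n, pair_dim), one row per person, concatenating S_t with O_t of person i
        e = self.psi_e(pair_states)                           # embeddings e_i
        h = self.phi_h(e)                                     # pairwise interaction features h_i
        e_m = e.mean(dim=0, keepdim=True).expand_as(e)        # mean pooling, Eq. (18)
        beta = self.rho_beta(torch.cat([e, e_m], dim=1))      # attention scores beta_i
        c = (torch.softmax(beta, dim=0) * h).sum(dim=0)       # crowd representation, Eq. (20)
        return self.g_v(torch.cat([robot_state, c], dim=0))   # joint state value v
```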
and step 3: as shown in fig. 2, the value network is trained using the time difference method in reinforcement learning. Recording the value network V as the current value network, setting the initial training frequency to be 0, setting the capacity of an experience playback pool to be 50000, setting the sampling number to be 100, setting a target network V', initializing the random joint state, setting the training frequency to be 10000, and setting the state to be s according to an epsilon greedy strategytSelecting an action:
Figure BDA0003084322280000063
get a return rtAnd the next state st', at state st' obtaining a according to the greedy strategy of epsilontStoring the updated return value and state into an experience pool, updating the experience pool once every 3000 steps, and updating the current value network by a gradient descent method until the robot reaches the final state or exceeds the set maximum time tmaxAnd the time is 25s, otherwise, the current network is updated to the target network, and when the training times are reached, the value network V is obtained.
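A schematic version of this temporal-difference training loop is sketched below. The environment interface (`env.reset`, `env.step`, `env.sample_action`, `env.best_action`, `env.encode`) and the assumption that `value_net` maps a batch of flattened joint-state tensors to scalar values are hypothetical; the replay capacity, batch size, update period and episode count mirror the figures stated above, while γ, ε and the learning rate are placeholders:

```python
import copy
import random
import torch

def train_value_network(value_net, env, episodes=10000, capacity=50000, batch_size=100,
                        epsilon=0.1, gamma=0.9, dt=0.25, v_pref=0.25, lr=1e-3, update_every=3000):
    target_net = copy.deepcopy(value_net)                        # target network V'
    optimizer = torch.optim.SGD(value_net.parameters(), lr=lr)   # gradient-descent update
    replay, step = [], 0                                         # experience replay pool E
    discount = gamma ** (dt * v_pref)
    for _ in range(episodes):
        K, done = env.reset(), False
        while not done:
            # epsilon-greedy: random action with probability epsilon, otherwise the greedy action
            a = env.sample_action() if random.random() < epsilon else env.best_action(value_net, K)
            K_next, r, done = env.step(a)
            replay.append((env.encode(K), r, env.encode(K_next), float(done)))
            replay = replay[-capacity:]
            K, step = K_next, step + 1
            if step % update_every == 0 and len(replay) >= batch_size:
                s, r_b, s_next, d = zip(*random.sample(replay, batch_size))
                s, s_next = torch.stack(s), torch.stack(s_next)
                r_b, d = torch.tensor(r_b), torch.tensor(d)
                with torch.no_grad():                            # TD target from the target network
                    y = r_b + discount * (1.0 - d) * target_net(s_next).squeeze(-1)
                loss = torch.nn.functional.mse_loss(value_net(s).squeeze(-1), y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        target_net.load_state_dict(value_net.state_dict())       # refresh the target network
    return value_net
```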
Step 4: A two-dimensional grid map is built with a laser radar. During navigation a global path is planned with the RRT algorithm, and local dynamic obstacle avoidance is then realized with the deep reinforcement learning method: with the trained value network V, the robot selects actions according to the optimal strategy. The radii of the robot and the humans are set to 0.3 m, the preferred speed of the robot to 0.25 m/s, the minimum comfortable distance to 0.5 m and the dynamic local target radius to 4 m. When there is no dynamic obstacle in the environment space, the robot moves directly toward the target point; when a dynamic obstacle appears, the robot avoids it quickly and safely. Introducing the dynamic local target and the safety mechanism effectively improves the navigation time and efficiency of the mobile robot.
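How these pieces fit together at run time can be sketched as follows, reusing the helper sketches above; `rrt_plan`, `candidate_velocities`, `propagate` and `execute` are assumed callables (global planner, action-space generator, one-step simulator and velocity executor), so this is an outline of the loop rather than the patent's implementation:

```python
import math

def navigate(robot, humans, grid_map, global_goal, value_net, rrt_plan,
             candidate_velocities, propagate, execute, reward_fn,
             gamma=0.9, dt=0.25, r_local=4.0):
    path = rrt_plan(grid_map, (robot.px, robot.py), global_goal)        # global path, planned once
    while math.hypot(robot.px - global_goal[0], robot.py - global_goal[1]) > robot.r:
        # steer the value network toward a dynamic local target on the global path
        robot.gx, robot.gy = dynamic_local_target(path, (robot.px, robot.py), r_local)
        K = joint_state(robot, humans)
        actions = safe_action_space((robot.px, robot.py), grid_map, candidate_velocities(robot))
        a = select_action(actions, K, reward_fn, propagate, value_net,
                          gamma=gamma, dt=dt, v_pref=robot.v_pref)
        robot = execute(robot, a, dt)                                    # apply the chosen velocity
```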
The embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.

Claims (5)

1. A mobile robot navigation obstacle avoidance method based on deep reinforcement learning is characterized by comprising the following steps:
1) training a value network model with a deep recurrent neural network from deep learning and the temporal-difference method from reinforcement learning, so as to realize robot navigation obstacle avoidance;
2) simplifying the robot and each person into a circle, and defining the state of the robot at the time t:
S_t = [p_x, p_y, v, θ, g_x, g_y, r, v_pref]   (1)
wherein p_x, p_y denote the current position of the robot, v the current velocity of the robot, θ the azimuth angle of the robot, g_x, g_y the target position of the robot, r the radius of the robot, and v_pref the preferred speed of the robot;
defining each person's state at time t:
O_t^i = [p_x, p_y, v_x, v_y, r]   (2)
defining a reward and penalty function:
R(K_t, a_t): a piecewise reward that penalizes collision when d_min < 0, penalizes discomfort when 0 ≤ d_min < d_comf, rewards reaching the goal when d_goal = 0, and is 0 otherwise   (3)
wherein a_t = v_t denotes the action of the robot, d_min the minimum separation distance between the robot and any person during Δt, d_comf the comfortable distance a person can tolerate, and d_goal the distance from the current robot position to the target point;
3) inputting S_t and O_t into a deep recurrent neural network with initial weights; the robot imitates the navigation strategy of a human expert to obtain demonstration experience D, which is stored in an initialized experience pool E; the value network V is initialized with random weights θ, the target value network V' is initialized to the current value network V, and each episode is looped over to obtain the optimal value network V;
4) establishing a two-dimensional grid map, setting a global target point, and continuously updating the joint state of the robot and the human by using a pre-trained value network V:
K_t = [S_t, O_t]   (4)
5) then the RRT algorithm is used to plan a globally optimal path and formulate the optimal strategy:
π*(K_t) = argmax_{a_t∈A} [R(K_t, a_t) + γ^(Δt·v_pref) V*(K_{t+Δt})]   (5)
where A denotes the set of the action space, γ ∈ (0,1) the attenuation (decay) factor, Δt the time interval between two actions, v_pref the preferred speed, and V*(K_{t+Δt}) the optimal value at time t + Δt;
6) selecting the optimal action a_t, i.e. the optimal velocity v_t, through the optimal strategy, and realizing local obstacle avoidance until the robot reaches the target point.
2. The robot navigation obstacle avoidance method based on deep reinforcement learning as claimed in claim 1, characterized in that: in step 1), the value network model consists of an interaction module, a pooling module and a planning module, wherein the interaction module uses a multilayer perceptron to embed the state of the i-th person and the state of the robot into a fixed-length vector e_i:
e_i = ψ_e(S_t, O_t; W_e),  i = 1, 2, …, n   (6)
where ψ_e is a multilayer perceptron with an activation function used to model the human-robot interaction and W_e is the embedding weight;
the embedding vector e_i is then fed into a subsequent multilayer perceptron:
h_i = φ_h(e_i; W_h),  i = 1, 2, …, n   (7)
where φ_h is a fully connected layer with a nonlinear activation function that yields the interaction feature between the robot and the i-th person, and W_h is the network weight;
the pooling module first embeds the interaction into a vector eiConversion to attention score βi
Figure FDA0003084322270000021
βi=ρβ(ei,em;Wβ)(i=1,2,3,…n) (9)
Wherein e ismIs a fixed length of embedded vector, rho, obtained by pooling all the individuals on averageβIs a multi-layer perceptron with activation functions;
then will give two-by-two interaction vector hiAnd corresponding attention scores betaiThe final calculated population is represented by a weighted linear combination of all pairs:
Figure FDA0003084322270000022
the planning module is used for estimating the joint state value of the robot and the crowd in the navigation process:
v = g_v(S_t, C_t; W_v)   (11)
where g_v is a multilayer perceptron with an activation function and W_v is the network weight.
3. The robot navigation obstacle avoidance method based on deep reinforcement learning as claimed in claim 1 or 2, characterized in that: in step 3), the loop over each episode is as follows:
initialize a random joint state K_t and loop over each step of each episode: with probability ε select a random action a_t; if the small-probability event does not occur, select the action with the maximum current value function using a greedy strategy:
a_t = argmax_{a_t∈A} [R(K_t, a_t) + γ^(Δt·v_pref) V(K_{t+Δt})]   (12)
continuously update the current state and reward value and store them in the experience replay pool; update the experience pool once every 3000 steps and update the current value network by gradient descent until the robot reaches the terminal state, which ends the inner loop of the episode and copies the current network to the target network; after the set number of episodes, the value network model V is obtained.
4. The robot navigation obstacle avoidance method based on deep reinforcement learning as claimed in claim 1 or 2, characterized in that: in step 4), a map-based velocity screening mechanism is added to form a safe action space so that the robot avoids known obstacles in the environment; at each decision step the safe action space is determined by the current robot position p_t, the two-dimensional grid map M and the initialized action space A, i.e. A_safe = (p_t, M, A), and for each velocity in the action space a forward simulation is performed to check whether the robot would collide with an obstacle in the map.
5. The robot navigation obstacle avoidance method based on deep reinforcement learning as claimed in claim 1 or 2, characterized in that: in step 5), an RRT algorithm generates a minimum-cost global path on the two-dimensional grid map between the current robot position and the global target; all waypoints on the global path are then traversed, the nearest point within a circle centered on the current robot position with radius r is found, and this point is set as the dynamic local target.
CN202110575846.8A 2021-05-26 2021-05-26 Mobile robot navigation obstacle avoidance method based on deep reinforcement learning Active CN113359717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110575846.8A CN113359717B (en) 2021-05-26 2021-05-26 Mobile robot navigation obstacle avoidance method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110575846.8A CN113359717B (en) 2021-05-26 2021-05-26 Mobile robot navigation obstacle avoidance method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113359717A true CN113359717A (en) 2021-09-07
CN113359717B CN113359717B (en) 2022-07-26

Family

ID=77527872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110575846.8A Active CN113359717B (en) 2021-05-26 2021-05-26 Mobile robot navigation obstacle avoidance method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113359717B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109822579A (en) * 2019-04-10 2019-05-31 江苏艾萨克机器人股份有限公司 Cooperation robot security's control method of view-based access control model
CN110244734A (en) * 2019-06-20 2019-09-17 中山大学 A kind of automatic driving vehicle paths planning method based on depth convolutional neural networks
CN110125943A (en) * 2019-06-27 2019-08-16 易思维(杭州)科技有限公司 Multi-degree-of-freemechanical mechanical arm obstacle-avoiding route planning method
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment
CN110703768A (en) * 2019-11-08 2020-01-17 福州大学 Improved dynamic RRT mobile robot motion planning method
CN111844007A (en) * 2020-06-02 2020-10-30 江苏理工学院 Pollination robot mechanical arm obstacle avoidance path planning method and device
CN111596668A (en) * 2020-06-17 2020-08-28 苏州大学 Mobile robot anthropomorphic path planning method based on reverse reinforcement learning
CN112179367A (en) * 2020-09-25 2021-01-05 广东海洋大学 Intelligent autonomous navigation method based on deep reinforcement learning
CN112631173A (en) * 2020-12-11 2021-04-09 中国人民解放军国防科技大学 Brain-controlled unmanned platform cooperative control system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237235A (en) * 2021-12-02 2022-03-25 之江实验室 Mobile robot obstacle avoidance method based on deep reinforcement learning
CN114237235B (en) * 2021-12-02 2024-01-19 之江实验室 Mobile robot obstacle avoidance method based on deep reinforcement learning
CN114485673A (en) * 2022-02-09 2022-05-13 山东大学 Service robot crowd perception navigation method and system based on deep reinforcement learning
CN114485673B (en) * 2022-02-09 2023-11-03 山东大学 Service robot crowd sensing navigation method and system based on deep reinforcement learning
CN114384920A (en) * 2022-03-23 2022-04-22 安徽大学 Dynamic obstacle avoidance method based on real-time construction of local grid map
US11720110B2 (en) 2022-03-23 2023-08-08 Anhui University Dynamic obstacle avoidance method based on real-time local grid map construction

Also Published As

Publication number Publication date
CN113359717B (en) 2022-07-26

Similar Documents

Publication Publication Date Title
CN113359717B (en) Mobile robot navigation obstacle avoidance method based on deep reinforcement learning
Cai ROBOTICS: From Manipulator to Mobilebot
JP6854549B2 (en) AUV action planning and motion control methods based on reinforcement learning
Cao et al. Multi-AUV target search based on bioinspired neurodynamics model in 3-D underwater environments
Parker Cooperative robotics for multi-target observation
Velagic et al. A 3-level autonomous mobile robot navigation system designed by using reasoning/search approaches
Zalama et al. Adaptive behavior navigation of a mobile robot
Low et al. A hybrid mobile robot architecture with integrated planning and control
Botteghi et al. On reward shaping for mobile robot navigation: A reinforcement learning and SLAM based approach
Modayil et al. Autonomous development of a grounded object ontology by a learning robot
Kazem et al. Modified vector field histogram with a neural network learning model for mobile robot path planning and obstacle avoidance.
Liu et al. Episodic memory-based robotic planning under uncertainty
Tung et al. Socially aware robot navigation using deep reinforcement learning
Malviya et al. Autonomous social robot navigation using a behavioral finite state social machine
Sebastian et al. Neural network based heterogeneous sensor fusion for robot motion planning
Azouaoui et al. Soft‐computing based navigation approach for a bi‐steerable mobile robot
Gavrilov et al. Mobile robot navigation using reinforcement learning based on neural network with short term memory
Pandey et al. Trajectory Planning and Collision Control of a Mobile Robot: A Penalty-Based PSO Approach
Kondratenko et al. Safe Navigation of an Autonomous Robot in Dynamic and Unknown Environments
De Villiers et al. Learning fine-grained control for mapless navigation
de Almeida Afonso et al. Autonomous robot navigation in crowd
Springer et al. Simple strategies for collision-avoidance in robot soccer
Al Arafat et al. Neural network-based obstacle and pothole avoiding robot
Beom et al. Behavioral control in mobile robot navigation using fuzzy decision making approach
Song et al. Robot Navigation in Crowd via Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant