CN113359717A - Mobile robot navigation obstacle avoidance method based on deep reinforcement learning - Google Patents
- Publication number: CN113359717A
- Application number: CN202110575846.8A
- Authority: CN (China)
- Prior art keywords: robot, obstacle avoidance, network, reinforcement learning, value
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G05D1/0214 — Control of position or course in two dimensions specially adapted to land vehicles, with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
- G05D1/0221 — Control of position or course in two dimensions specially adapted to land vehicles, with means for defining a desired trajectory involving a learning process
Abstract
A mobile robot navigation obstacle avoidance method based on deep reinforcement learning uses a neural network to extract features of the pairwise interactions between the robot and each person, and captures person-to-person interactions through a local map. A self-attention mechanism aggregates the interaction features to infer the relative importance of neighbouring humans with respect to their future states; a value network is pre-trained by reinforcement learning, and a safe action space is added during training to prevent emergencies. A two-dimensional grid map is built; a global path towards a set global target point is planned with the RRT (rapidly-exploring random tree) algorithm; the nearest waypoint inside a circle centred on the robot's current position with radius r is found and set as a dynamic local target; and the optimal action is selected through the optimal policy, finally realizing the navigation obstacle avoidance function of the mobile robot. The invention solves problems such as short-sightedness and slow response of the robot in the existing navigation obstacle avoidance process.
Description
Technical Field
The invention relates to a navigation obstacle avoidance method of a mobile robot, in particular to a navigation obstacle avoidance method of a mobile robot based on the combination of deep learning and reinforcement learning, and belongs to the field of mobile robots.
Background
The robot is a device integrating technologies from multiple intersecting fields such as machinery, electronics, computers, control and artificial intelligence. The mobile robot in particular, owing to its autonomous control and flexible motion, is widely applied in many areas of human production and life. A mobile robot is an intelligent robot that senses the external environment through sensors and drives continuously and autonomously in complex environments; it involves disciplines such as information perception, motion planning and autonomous control, and represents recent achievements of artificial intelligence and computer information science.
The most important capability of the mobile robot is navigation, namely the mobile robot can avoid obstacles in a working range space and realize safe movement from an initial position to a target position. In recent years, the application of the mobile robot has been expanded to outdoor unknown environments such as deep sea, polar regions and the like, and higher requirements are put forward on the navigation function of the mobile robot. The autonomous navigation and motion control technology is the key for solving the problem of track planning motion of the mobile robot under the condition of unknown non-structural and environmental conditions, so that the method has important theoretical value and application value for the research of the navigation control method of the mobile robot.
Deep learning and reinforcement learning are two important branches in the field of machine learning, and are always taken as a research hotspot by scholars at home and abroad, particularly in the field of mobile robots.
The learning process of reinforcement learning is dynamic and continuously interactive, and the required data are generated by constant interaction with the environment. Reinforcement learning involves many objects, such as actions, environments, state transition probabilities and reward functions. Deep learning, as in image recognition and speech recognition, solves perception problems, while reinforcement learning solves decision problems; the deep reinforcement learning algorithm obtained by combining mature deep learning techniques with reinforcement learning is therefore regarded as a future development trend of artificial intelligence.
In reality, mobile robots generally work in environments where people, machines and objects coexist; many moving agents, people and other equipment must plan trajectories in narrow, crowded spaces, so the mobile robot must be able to avoid obstacles in crowded scenes. As humans, we have an inherent ability to adjust our behaviour by observing others, so we can easily pass through crowds or around objects. For mobile robots, however, collision-free navigation in dynamic, crowded scenarios remains a difficult task. Conventional mobile robot navigation methods generally treat a moving agent as a static obstacle or choose the next action according to specific interaction rules, preventing collisions through passive reaction or manually defined safety functions, which makes the robot short-sighted, slow to react and unsafe.
Disclosure of Invention
In order to overcome the defects of the prior art and based on deep learning and reinforcement learning research, the invention provides a mobile robot navigation obstacle avoidance method based on deep reinforcement learning, which can predict the dynamics of human beings, solve the problems of short sight, slow response and the like of a robot in the navigation obstacle avoidance process, and add a safety mechanism to prevent sudden situations from happening in the navigation process. And finally, a dynamic local target mechanism is designed, so that the time of the navigation obstacle avoidance process can be reduced.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a mobile robot navigation obstacle avoidance method based on deep reinforcement learning comprises the following steps:
1) training a value network model with a deep recurrent neural network from deep learning and the temporal-difference method from reinforcement learning, so as to realize robot navigation obstacle avoidance;
2) simplifying the robot and each person into a circle, and defining the state of the robot at the time t:
S_t = [p_x, p_y, v, θ, g_x, g_y, r, v_pref]   (1)
where p_x, p_y denote the robot's current position, v its current speed, θ its azimuth angle, g_x, g_y the target position, r the robot's radius, and v_pref the robot's preferred speed;
defining each person's state at time t:
defining a reward and penalty function:
where a_t = v_t denotes the robot's action, d_min the minimum separation distance between the robot and a person during the interval Δt, d_comf the comfortable distance a person can tolerate, and d_goal the distance from the robot's current position to the target point;
3) inputting S_t and O_t into a deep recurrent neural network with initial weights; the robot imitates a human expert's navigation strategy to obtain demonstration experience D, which is stored in an initialized experience pool E; the value network V is initialized with random weights θ, the target value network V' is initialized to the current value network V, and looping over episodes yields the optimal value network V;
4) establishing a two-dimensional grid map, setting a global target point, and continuously updating the joint state of the robot and the human by using a pre-trained value network V:
K_t = [S_t, O_t]   (4)
5) then planning the globally optimal path with the RRT algorithm and applying the optimal policy
π*(K_t) = argmax_{a_t ∈ A} [ R(K_t, a_t) + γ^(Δt·v_pref) V*(K_{t+Δt}) ]   (5)
where A denotes the action space, γ ∈ (0,1) the decay factor, Δt the time interval between two actions, v_pref the preferred speed, and V*(K_{t+Δt}) the optimal value at time t + Δt;
6) selecting the optimal action a_t, i.e. the optimal velocity v_t, through the optimal policy, realizing local obstacle avoidance until the robot reaches the target point.
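The reward and penalty function of step 2) is shown only as an image in the source, so the exact constants are unknown; the sketch below illustrates its structure in terms of the quantities the text does define (d_min, d_comf, d_goal). The numeric values are assumptions in the style of common deep-RL crowd-navigation rewards, not the patent's own constants.

```python
def reward(d_min, d_goal, d_comf=0.5):
    """Sketch of the reward/penalty function R(K_t, a_t).

    d_min  : minimum robot-person separation distance during the interval dt
    d_goal : distance from the robot's current position to the target point
    d_comf : comfortable distance a person can tolerate

    The numeric constants are illustrative assumptions, not the patent's values.
    """
    if d_min < 0:                       # overlap: a collision occurred
        return -0.25
    if d_goal == 0:                     # target point reached
        return 1.0
    if d_min < d_comf:                  # uncomfortably close: graded penalty
        return -0.1 + 0.05 * d_min / d_comf
    return 0.0                          # otherwise neutral
```

The graded penalty inside the comfort zone encourages the robot to keep its distance from people rather than merely avoid contact.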
Further, in step 1), the value network model consists of an interaction module, a pooling module and a planning module. The interaction module uses a multilayer perceptron to embed the state of each person and the state of the robot into a fixed-length vector e_i:
e_i = ψ_e(S_t, O_t; W_e)   (i = 1, 2, 3, …, n)   (6)
where ψ_e is a multilayer perceptron with activation functions that models the human-robot interaction, and W_e are the embedding weights;
the embedding vector e_i is then input into a subsequent multilayer perceptron:
h_i = φ_h(e_i; W_h)   (i = 1, 2, 3, …, n)   (7)
where φ_h is a fully connected layer with a nonlinear activation function that yields the interaction feature between the robot and the i-th person, and W_h are the network weights;
the pooling module first embeds the interaction into a vector eiConversion to attention score βi:
βi=ρβ(ei,em;Wβ)(i=1,2,3,…n) (9)
Wherein e ismIs a fixed length of embedded vector, rho, obtained by pooling all the individuals on averageβIs a multi-layer perceptron with activation functions;
then, given the pairwise interaction vectors h_i and the corresponding attention scores β_i, the final crowd representation is computed as a weighted linear combination of all pairs:
C_t = Σ_{i=1}^{n} softmax(β_i) · h_i   (10)
the planning module is used for estimating the joint state value of the robot and the crowd in the navigation process:
v=gv(St,Ct;Wv) (11)
wherein, gvIs a multi-layer sensor with activation function, WvIs the network weight.
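As a concrete illustration, the three modules can be sketched as a single numpy forward pass. Only the data flow — embedding ψ_e, pairwise features φ_h, attention pooling ρ_β with softmax weighting, and value head g_v — follows the description; the layer sizes, ReLU activations and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Tiny multilayer perceptron: ReLU between layers, linear output."""
    for w in weights[:-1]:
        x = np.maximum(0.0, x @ w)
    return x @ weights[-1]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Assumed (illustrative) dimensions: robot state 8-d, per-person state 5-d.
D_IN, D_E, D_H = 8 + 5, 16, 16
W_e = [rng.normal(size=(D_IN, 32)), rng.normal(size=(32, D_E))]    # psi_e, eq. (6)
W_h = [rng.normal(size=(D_E, D_H))]                                # phi_h, eq. (7)
W_b = [rng.normal(size=(2 * D_E, 16)), rng.normal(size=(16, 1))]   # rho_beta, eq. (9)
W_v = [rng.normal(size=(8 + D_H, 16)), rng.normal(size=(16, 1))]   # g_v, eq. (11)

def value(S_t, O_t):
    """Interaction -> pooling -> planning: v = g_v(S_t, C_t; W_v)."""
    e = np.stack([mlp(np.concatenate([S_t, o]), W_e) for o in O_t])       # e_i
    h = np.stack([mlp(e_i, W_h) for e_i in e])                            # h_i
    e_m = e.mean(axis=0)                                                  # mean pooling
    beta = np.array([mlp(np.concatenate([e_i, e_m]), W_b)[0] for e_i in e])
    C_t = (softmax(beta)[:, None] * h).sum(axis=0)   # attention-weighted crowd feature
    return mlp(np.concatenate([S_t, C_t]), W_v)[0]
```

Because the attention pooling collapses any number of h_i vectors into one fixed-length C_t, the same network handles crowds of arbitrary size n.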
Still further, in step 3), each episode is looped as follows:
a random joint state K_t is initialized; within each step of each episode, a random action a_t is selected with probability ε, and if this small-probability event does not occur, the action maximizing the current value function is selected with a greedy strategy;
the current state and reward value are continuously updated and stored in the experience replay pool; the experience pool is updated every 3000 steps and the current value network is updated by gradient descent until the robot reaches the final state, ending the inner loop of the episode; the current network is then copied to the target network, and once the number of episodes is reached the value network model V is obtained.
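The episode loop described above might be skeletonized as follows. The environment and network interfaces (`env.reset`, `env.actions`, `env.peek`, `env.step`, `load_from`, `update_value_net`) are placeholder assumptions, not interfaces defined by the patent; only the ε-greedy selection, the replay pool and the 3000-step update cadence come from the text.

```python
import random
from collections import deque

def train(env, value_net, target_net, update_value_net,
          episodes=10_000, eps=0.1, gamma=0.9, batch=100,
          pool_capacity=50_000, sync_every=3_000):
    """Episode loop for temporal-difference training of the value network.

    Assumed interfaces (placeholders): env.reset() -> joint state K;
    env.actions(K) -> candidate actions; env.peek(K, a) -> lookahead state;
    env.step(a) -> (K', r, done); value_net(K) -> scalar value;
    update_value_net(sample, target_net, gamma) performs the gradient step.
    """
    pool = deque(maxlen=pool_capacity)   # experience replay pool
    step = 0
    for _ in range(episodes):
        K = env.reset()                  # random initial joint state
        done = False
        while not done:
            if random.random() < eps:    # small-probability random action
                a = random.choice(env.actions(K))
            else:                        # greedy: action with max current value
                a = max(env.actions(K), key=lambda act: value_net(env.peek(K, act)))
            K_next, r, done = env.step(a)
            pool.append((K, a, r, K_next, done))
            step += 1
            if step % sync_every == 0:   # every 3000 steps: sample and descend
                sample = random.sample(list(pool), min(batch, len(pool)))
                update_value_net(sample, target_net, gamma)
            K = K_next
        target_net.load_from(value_net)  # episode end: copy current -> target
    return value_net
```

Keeping a separate target network that only syncs at episode boundaries stabilizes the bootstrapped value targets.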
Furthermore, in step 4), a map-based velocity screening mechanism is added to form a safe action space so that the robot can avoid known obstacles in the environment. At each decision, the safe action space is determined by the robot's current position p_t, the two-dimensional grid map M and the initialized action space A, i.e. A_safe = f(p_t, M, A); for each velocity in the action space, a forward simulation is performed to check whether the robot would collide with an obstacle in the map.
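The velocity screening might be sketched as below: each candidate velocity is forward-simulated for one decision interval on the occupancy grid and kept only if the resulting cell is free. The grid resolution, the single-step horizon and the function name are illustrative assumptions.

```python
def safe_action_space(p_t, grid, actions, dt=0.25, resolution=0.05):
    """A_safe = f(p_t, M, A): keep only velocities whose forward-simulated
    next position does not hit a known obstacle in the 2-D grid map.

    p_t        : (x, y) current robot position in metres
    grid       : 2-D array, 1 = occupied cell, 0 = free
    actions    : iterable of candidate velocities (vx, vy) in m/s
    resolution : metres per grid cell (assumed value)
    """
    safe = []
    for vx, vy in actions:
        x = p_t[0] + vx * dt              # forward-simulate one decision step
        y = p_t[1] + vy * dt
        i, j = int(round(y / resolution)), int(round(x / resolution))
        inside = 0 <= i < len(grid) and 0 <= j < len(grid[0])
        if inside and grid[i][j] == 0:    # target cell is free: velocity is safe
            safe.append((vx, vy))
    return safe
```

Velocities that leave the map or land in an occupied cell are simply dropped, so the learned policy can only ever choose among pre-screened safe actions.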
In the step 5), on the two-dimensional grid map, a global path with the minimum cost is generated between the current position of the robot and the global target by using an RRT algorithm, then all path points on the global path are traversed, the nearest point is found in a circle with the current position of the robot as the center of the circle and r as the radius, and the nearest point is set as the dynamic local target.
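The dynamic-local-target rule above might be sketched as follows. The source's "nearest point" inside the radius-r circle is read here as the last waypoint of the global path still inside the circle (a pure-pursuit-style lookahead); that reading, and the fallback when no waypoint is inside, are assumptions.

```python
import math

def dynamic_local_target(p_t, path, r=4.0):
    """Traverse all waypoints of the RRT global path and pick the dynamic
    local target inside the circle of radius r centred on p_t.

    'Nearest point' is interpreted as the last path waypoint still inside
    the circle (an assumption); if none lies inside, fall back to the final
    global target.
    """
    inside = [q for q in path if math.dist(p_t, q) <= r]
    return inside[-1] if inside else path[-1]
```

Recomputing this target at every decision keeps the robot chasing a point that slides forward along the global path as it moves.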
The invention has the following beneficial effects: (1) introducing the multilayer perceptron shortens the training time and accelerates the convergence of reinforcement learning; (2) the trained model can predict human dynamics, solving problems such as short-sightedness and slow response of the robot during navigation obstacle avoidance; (3) a dynamic local target mechanism is designed so that the robot searches for an optimal path in local planning, reducing the time of the navigation obstacle avoidance process; (4) a map-based action screening mechanism is introduced as a dynamic safe action space.
drawings
FIG. 1 is a flow chart of a robot navigation obstacle avoidance method;
FIG. 2 is a value network training flow diagram;
FIG. 3 is a graph of training simulation results.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 to 3, a mobile robot navigation obstacle avoidance method based on deep reinforcement learning includes the following steps:
step 1): in a two-dimensional space, considering the robot and each person as a circle, one robot moves to a target point through n persons. For each agent (robot or human), position p ═ px,py]Velocity v ═ vx,vy]And radius r can be observed by other agents. Target position g ═ gx,gy]Azimuth theta and preferred velocity vprefNot observable by other agents. StWhich indicates the state of the robot and,indicating the state of the person at time t. By connecting the state of the robot with the observable state of the human, the joint states K of all the n +1 agents at time t can be obtainedt=[St,Ot]. The robot may determine that the motion command can immediately change speed at time t according to the navigation strategy: v. oft=at=π(Kt)。
Navigating with the optimal policy π*(K_t), the optimal value of the joint state K_t at time t is:
V*(K_t) = Σ_{t'=t}^{T} γ^(t'·Δt·v_pref) R(K_{t'}, π*(K_{t'}))   (13)
where T denotes the final step reached from the state at time t, Δt the time interval between two actions, γ ∈ (0,1) the decay factor, v_pref the preferred speed, and R(K_t, a_t) the corresponding reward during time t.
The optimal policy is established by the maximum accumulated return and is defined as follows:
π*(K_t) = argmax_{a_t ∈ A} [ R(K_t, a_t) + γ^(Δt·v_pref) V*(K_{t+Δt}) ]   (5)
where A denotes the action space and V*(K_{t+Δt}) the optimal value at time t + Δt.
The reward and penalty function is defined as in formula (3), where d_min denotes the minimum separation distance between the robot and a person within Δt, d_comf the comfortable distance a person can tolerate, and d_goal the distance from the robot's current position to the target point.
Step 2: the value network model comprises three parts of an interaction module, a pooling module and a planning module. The interaction module is used for modeling the human and the robot and coding the human and human interaction through coarse-grained local mapping; the pooling module aggregates the interaction into embedded vectors with fixed length through a self-attention mechanism, and learns the relative importance of each person and the collective influence of the crowd in a data-driven manner; the planning module is used for estimating the joint state value of the robot and the crowd in the navigation process. The method comprises the following specific steps:
step 2.1: embedding the state of the ith person and the state of the robot into a vector e with a fixed length by using a multilayer perceptroniThe method comprises the following steps:
ei=ψe(St,Ot;We)(i=1,2,3,…n) (6)
in formula (16), phie(. is an inline function, WeAre the embedding weights.
Step 2.2: the embedding vector e_i is input into a subsequent multilayer perceptron to obtain the pairwise interaction feature between the robot and the human:
h_i = φ_h(e_i; W_h)   (i = 1, 2, 3, …, n)   (7)
where φ_h(·) is a fully connected layer and W_h are the network weights.
Step 2.3: the interaction embedding e_i is converted into an attention score β_i:
β_i = ρ_β(e_i, e_m; W_β)   (i = 1, 2, 3, …, n)   (9)
where e_m is a fixed-length embedding vector obtained by mean-pooling over all individuals and ρ_β(·) is a multilayer perceptron with activation functions.
Step 2.4: for each person, given the pairwise interaction vector h_i and the corresponding attention score β_i, the final crowd representation is a weighted linear combination of all pairs:
C_t = Σ_{i=1}^{n} softmax(β_i) · h_i   (10)
and step 3: as shown in fig. 2, the value network is trained using the time difference method in reinforcement learning. Recording the value network V as the current value network, setting the initial training frequency to be 0, setting the capacity of an experience playback pool to be 50000, setting the sampling number to be 100, setting a target network V', initializing the random joint state, setting the training frequency to be 10000, and setting the state to be s according to an epsilon greedy strategytSelecting an action:
get a return rtAnd the next state st', at state st' obtaining a according to the greedy strategy of epsilontStoring the updated return value and state into an experience pool, updating the experience pool once every 3000 steps, and updating the current value network by a gradient descent method until the robot reaches the final state or exceeds the set maximum time tmaxAnd the time is 25s, otherwise, the current network is updated to the target network, and when the training times are reached, the value network V is obtained.
Step 4: a two-dimensional grid map is built with a laser radar; during navigation a global path is planned with the RRT algorithm, and local dynamic obstacle avoidance is then realized with the deep reinforcement learning method, the robot selecting actions according to the optimal policy via the trained value network V. The radii of the robot and the humans are set to 0.3 m, the robot's preferred speed to 0.25 m/s, the minimum comfortable distance to 0.5 m and the dynamic local-target radius to 4 m. When there is no dynamic obstacle in the environment space, the robot moves directly towards the target point; when a dynamic obstacle exists, the robot avoids it quickly and safely. Introducing dynamic local targets and the safety mechanism can effectively improve the navigation time and efficiency of the mobile robot.
The embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.
Claims (5)
1. A mobile robot navigation obstacle avoidance method based on deep reinforcement learning is characterized by comprising the following steps:
1) training a value network model by adopting a time difference method in deep learning middle and deep layer cyclic neural network and reinforcement learning so as to realize robot navigation obstacle avoidance;
2) simplifying the robot and each person into a circle, and defining the state of the robot at the time t:
S_t = [p_x, p_y, v, θ, g_x, g_y, r, v_pref]   (1)
where p_x, p_y denote the robot's current position, v its current speed, θ its azimuth angle, g_x, g_y the target position, r the robot's radius, and v_pref the robot's preferred speed;
defining each person's state at time t:
defining a reward and penalty function:
where a_t = v_t denotes the robot's action, d_min the minimum separation distance between the robot and a person during the interval Δt, d_comf the comfortable distance a person can tolerate, and d_goal the distance from the robot's current position to the target point;
3) inputting S_t and O_t into a deep recurrent neural network with initial weights; the robot imitates a human expert's navigation strategy to obtain demonstration experience D, which is stored in an initialized experience pool E; the value network V is initialized with random weights θ, the target value network V' is initialized to the current value network V, and looping over episodes yields the optimal value network V;
4) establishing a two-dimensional grid map, setting a global target point, and continuously updating the joint state of the robot and the human by using a pre-trained value network V:
K_t = [S_t, O_t]   (4)
5) then planning the globally optimal path with the RRT algorithm and applying the optimal policy
π*(K_t) = argmax_{a_t ∈ A} [ R(K_t, a_t) + γ^(Δt·v_pref) V*(K_{t+Δt}) ]   (5)
where A denotes the action space, γ ∈ (0,1) the decay factor, Δt the time interval between two actions, v_pref the preferred speed, and V*(K_{t+Δt}) the optimal value at time t + Δt;
6) selecting the optimal action a_t, i.e. the optimal velocity v_t, through the optimal policy, realizing local obstacle avoidance until the robot reaches the target point.
2. The robot navigation obstacle avoidance method based on deep reinforcement learning according to claim 1, characterized in that: in step 1), the value network model consists of an interaction module, a pooling module and a planning module; the interaction module uses a multilayer perceptron to embed the i-th person's state and the robot's state into a fixed-length vector e_i:
e_i = ψ_e(S_t, O_t; W_e)   (i = 1, 2, 3, …, n)   (6)
where ψ_e is a multilayer perceptron with activation functions that models the human-robot interaction, and W_e are the embedding weights;
the embedding vector e_i is then input into a subsequent multilayer perceptron:
h_i = φ_h(e_i; W_h)   (i = 1, 2, 3, …, n)   (7)
where φ_h is a fully connected layer with a nonlinear activation function that yields the interaction feature between the robot and the i-th person, and W_h are the network weights;
the pooling module first embeds the interaction into a vector eiConversion to attention score βi:
βi=ρβ(ei,em;Wβ)(i=1,2,3,…n) (9)
Wherein e ismIs a fixed length of embedded vector, rho, obtained by pooling all the individuals on averageβIs a multi-layer perceptron with activation functions;
then, given the pairwise interaction vectors h_i and the corresponding attention scores β_i, the final crowd representation is computed as a weighted linear combination of all pairs:
C_t = Σ_{i=1}^{n} softmax(β_i) · h_i   (10)
the planning module is used for estimating the joint state value of the robot and the crowd in the navigation process:
v=gv(St,Ct;Wv) (11)
wherein, gvIs a multi-layer sensor with activation function, WvIs the network weight.
3. The robot navigation obstacle avoidance method based on deep reinforcement learning according to claim 1 or 2, characterized in that: in step 3), each episode is looped as follows:
a random joint state K_t is initialized; within each step of each episode, a random action a_t is selected with probability ε, and if this small-probability event does not occur, the action maximizing the current value function is selected with a greedy strategy;
the current state and reward value are continuously updated and stored in the experience replay pool; the experience pool is updated every 3000 steps and the current value network is updated by gradient descent until the robot reaches the final state, ending the inner loop of the episode; the current network is then copied to the target network, and once the number of episodes is reached the value network model V is obtained.
4. The robot navigation obstacle avoidance method based on deep reinforcement learning according to claim 1 or 2, characterized in that: in step 4), a map-based velocity screening mechanism is added to form a safe action space so that the robot can avoid known obstacles in the environment; at each decision, the safe action space is determined by the robot's current position p_t, the two-dimensional grid map M and the initialized action space A, i.e. A_safe = f(p_t, M, A); for each velocity in the action space, a forward simulation is performed to check whether the robot would collide with an obstacle in the map.
5. The robot navigation obstacle avoidance method based on the deep reinforcement learning as claimed in claim 1 or 2, characterized in that: in the step 5), on the two-dimensional grid map, a global path with the minimum cost is generated between the current position of the robot and the global target by using an RRT algorithm, then all path points on the global path are traversed, the nearest point is found in a circle with the current position of the robot as the center of the circle and r as the radius, and the nearest point is set as the dynamic local target.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110575846.8A CN113359717B (en) | 2021-05-26 | 2021-05-26 | Mobile robot navigation obstacle avoidance method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110575846.8A CN113359717B (en) | 2021-05-26 | 2021-05-26 | Mobile robot navigation obstacle avoidance method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113359717A true CN113359717A (en) | 2021-09-07 |
CN113359717B CN113359717B (en) | 2022-07-26 |
Family
ID=77527872
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110575846.8A Active CN113359717B (en) | 2021-05-26 | 2021-05-26 | Mobile robot navigation obstacle avoidance method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113359717B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114237235A (en) * | 2021-12-02 | 2022-03-25 | 之江实验室 | Mobile robot obstacle avoidance method based on deep reinforcement learning |
CN114384920A (en) * | 2022-03-23 | 2022-04-22 | 安徽大学 | Dynamic obstacle avoidance method based on real-time construction of local grid map |
CN114485673A (en) * | 2022-02-09 | 2022-05-13 | 山东大学 | Service robot crowd perception navigation method and system based on deep reinforcement learning |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109822579A (en) * | 2019-04-10 | 2019-05-31 | 江苏艾萨克机器人股份有限公司 | Cooperation robot security's control method of view-based access control model |
CN110125943A (en) * | 2019-06-27 | 2019-08-16 | 易思维(杭州)科技有限公司 | Multi-degree-of-freemechanical mechanical arm obstacle-avoiding route planning method |
CN110244734A (en) * | 2019-06-20 | 2019-09-17 | 中山大学 | A kind of automatic driving vehicle paths planning method based on depth convolutional neural networks |
CN110632931A (en) * | 2019-10-09 | 2019-12-31 | 哈尔滨工程大学 | Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment |
CN110703768A (en) * | 2019-11-08 | 2020-01-17 | 福州大学 | Improved dynamic RRT mobile robot motion planning method |
CN111596668A (en) * | 2020-06-17 | 2020-08-28 | 苏州大学 | Mobile robot anthropomorphic path planning method based on reverse reinforcement learning |
CN111844007A (en) * | 2020-06-02 | 2020-10-30 | 江苏理工学院 | Pollination robot mechanical arm obstacle avoidance path planning method and device |
CN112179367A (en) * | 2020-09-25 | 2021-01-05 | 广东海洋大学 | Intelligent autonomous navigation method based on deep reinforcement learning |
CN112631173A (en) * | 2020-12-11 | 2021-04-09 | 中国人民解放军国防科技大学 | Brain-controlled unmanned platform cooperative control system |
- 2021-05-26 CN CN202110575846.8A patent/CN113359717B/en active Active
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114237235A (en) * | 2021-12-02 | 2022-03-25 | 之江实验室 | Mobile robot obstacle avoidance method based on deep reinforcement learning |
CN114237235B (en) * | 2021-12-02 | 2024-01-19 | 之江实验室 | Mobile robot obstacle avoidance method based on deep reinforcement learning |
CN114485673A (en) * | 2022-02-09 | 2022-05-13 | 山东大学 | Service robot crowd perception navigation method and system based on deep reinforcement learning |
CN114485673B (en) * | 2022-02-09 | 2023-11-03 | 山东大学 | Service robot crowd sensing navigation method and system based on deep reinforcement learning |
CN114384920A (en) * | 2022-03-23 | 2022-04-22 | 安徽大学 | Dynamic obstacle avoidance method based on real-time construction of local grid map |
US11720110B2 (en) | 2022-03-23 | 2023-08-08 | Anhui University | Dynamic obstacle avoidance method based on real-time local grid map construction |
Also Published As
Publication number | Publication date |
---|---|
CN113359717B (en) | 2022-07-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113359717B (en) | Mobile robot navigation obstacle avoidance method based on deep reinforcement learning | |
Cai | ROBOTICS: From Manipulator to Mobilebot | |
JP6854549B2 (en) | AUV action planning and motion control methods based on reinforcement learning | |
Cao et al. | Multi-AUV target search based on bioinspired neurodynamics model in 3-D underwater environments | |
Parker | Cooperative robotics for multi-target observation | |
Velagic et al. | A 3-level autonomous mobile robot navigation system designed by using reasoning/search approaches | |
Zalama et al. | Adaptive behavior navigation of a mobile robot | |
Low et al. | A hybrid mobile robot architecture with integrated planning and control | |
Botteghi et al. | On reward shaping for mobile robot navigation: A reinforcement learning and SLAM based approach | |
Modayil et al. | Autonomous development of a grounded object ontology by a learning robot | |
Kazem et al. | Modified vector field histogram with a neural network learning model for mobile robot path planning and obstacle avoidance |
Liu et al. | Episodic memory-based robotic planning under uncertainty | |
Tung et al. | Socially aware robot navigation using deep reinforcement learning | |
Malviya et al. | Autonomous social robot navigation using a behavioral finite state social machine | |
Sebastian et al. | Neural network based heterogeneous sensor fusion for robot motion planning | |
Azouaoui et al. | Soft‐computing based navigation approach for a bi‐steerable mobile robot | |
Gavrilov et al. | Mobile robot navigation using reinforcement learning based on neural network with short term memory | |
Pandey et al. | Trajectory Planning and Collision Control of a Mobile Robot: A Penalty-Based PSO Approach | |
Kondratenko et al. | Safe Navigation of an Autonomous Robot in Dynamic and Unknown Environments | |
De Villiers et al. | Learning fine-grained control for mapless navigation | |
de Almeida Afonso et al. | Autonomous robot navigation in crowd | |
Springer et al. | Simple strategies for collision-avoidance in robot soccer | |
Al Arafat et al. | Neural network-based obstacle and pothole avoiding robot | |
Beom et al. | Behavioral control in mobile robot navigation using fuzzy decision making approach | |
Song et al. | Robot Navigation in Crowd via Deep Reinforcement Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||