CN112327821A - Intelligent cleaning robot path planning method based on deep reinforcement learning - Google Patents
- Publication number
- CN112327821A (application CN202010651117.1A)
- Authority
- CN
- China
- Prior art keywords
- strategy
- cleaning robot
- path planning
- reinforcement learning
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—with means for defining a desired trajectory
- G05D1/0221—involving a learning process
- G05D1/0231—using optical position detecting means
- G05D1/0238—using obstacle or wall sensors
- G05D1/024—using obstacle or wall sensors in combination with a laser
- G05D1/0242—using non-visible light signals, e.g. IR or UV signals
- G05D1/0246—using a video camera in combination with image processing means
- G05D1/0255—using acoustic signals, e.g. ultrasonic signals
- G05D1/0257—using a radar
Abstract
The invention discloses a path planning method for an intelligent cleaning robot based on deep reinforcement learning, which enables the robot to preferentially clean areas with large amounts of garbage, adaptively avoid obstacles, and return to charge in time. The method applies the deep deterministic policy gradient (DDPG) algorithm of deep reinforcement learning: a strategy neural network generates behavior strategies, comprising a cleaning behavior strategy and a motion behavior strategy. Exploration noise is added to the behavior strategy, which is then sent to the intelligent cleaning robot for execution; state information is obtained by fusing data from the sensor system, and the current return value is calculated by a designed return function. The algorithm stores each state, action, next state, and return value in an experience cache pool, randomly samples experiences from it, and trains the neural networks by gradient descent. The method is reasonable and highly practical, and is mainly intended for indoor navigation.
Description
Technical Field
The invention relates to the field of intelligent cleaning robots, in particular to an intelligent cleaning robot path planning method based on deep reinforcement learning.
Background
At present, with the development of the property management industry, most property service enterprises rely mainly on employees over 50 years old, and young workers are scarce. Research on intelligent cleaning robots can effectively alleviate the shortage of front-line staff in the property industry, help enterprises rapidly deliver services, and increase the added value of other services.
However, indoor navigation for intelligent cleaning robots is currently based mainly on simultaneous localization and mapping (SLAM) technology, and shortcomings in path planning lead to problems such as incomplete cleaning of some areas and low cleaning efficiency.
The deep deterministic policy gradient (DDPG) algorithm, a classical algorithm in deep reinforcement learning, has great advantages in continuous control.
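The abstract describes adding exploration noise to the behavior strategy produced by the strategy neural network. A minimal sketch, assuming Ornstein-Uhlenbeck noise (a common choice for DDPG; the patent does not specify the noise process, its parameters, or the action dimensions, so all values here are illustrative):

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise, commonly paired with DDPG.

    The patent only states that exploration noise is added to the policy
    output; the OU process and its parameters here are assumptions.
    """
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.state = np.full(dim, mu)

    def sample(self):
        # Mean-reverting random walk: drifts back toward mu, jitters by sigma.
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.state.shape))
        self.state = self.state + dx
        return self.state

noise = OrnsteinUhlenbeckNoise(dim=2)        # e.g. linear and angular velocity
deterministic_action = np.array([0.5, 0.0])  # hypothetical strategy-network output
behavior_action = np.clip(deterministic_action + noise.sample(), -1.0, 1.0)
```

The clip keeps the noisy behavior action inside the valid actuation range.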
The invention provides an intelligent cleaning robot path planning method based on deep reinforcement learning. The method builds on the DDPG algorithm and fuses information from multiple sensors to realize dynamic path planning for the cleaning robot, which can then preferentially clean places with much garbage, adaptively avoid obstacles, and return to charge in time.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention aims to provide an intelligent cleaning robot path planning method based on deep reinforcement learning that improves the working efficiency of the cleaning robot.
The invention is realized by the following technical scheme:
an intelligent cleaning robot path planning method based on deep reinforcement learning is characterized by comprising the following steps:
S1, initializing a strategy neural network, an evaluation network, a target strategy network, a target evaluation network, network parameters, an experience cache pool, and the cleaning robot;
S2, the cleaning robot senses the surrounding environment through its sensors, fuses the sensor data, and judges whether obstacles exist around it and its own state according to the ground condition, the garbage distribution, and the surrounding obstacles;
S3, the strategy neural network receives the sensor data of the surrounding environment and, after the data are input, selects an execution behavior strategy through calculation;
S4, the cleaning robot executes the behavior strategy, converting it into instructions recognizable by the driving mechanism and inputting them into the driving mechanism;
S5, after the upper computer sends the instruction, the lower computer receives it and executes the corresponding action to complete the cleaning task and path planning; when execution is complete, the reward rt and the next state st+1 are obtained;
S6, judging whether the cleaning robot has reached the garbage station and whether the action time has ended; if so, steps S1 to S6 continue to be executed; otherwise, the experience from steps S1 to S6 is summarized and step S7 is executed;
S7, storing the experience in the experience cache pool, which makes the stored states independent of one another and eliminates the strong correlation between input experiences;
S8, randomly sampling N experiences from the experience cache pool and calculating the loss function value of the strategy value algorithm and the loss function value of the strategy decision algorithm;
S9, calculating the expected return of the current strategy through the target strategy network and the evaluation network, and estimating the accumulated return of each state-strategy pair;
S10, training the neural networks by gradient descent: updating the weight coefficients of the target value network with a stochastic gradient descent algorithm to minimize the loss function, and calculating the gradients to update the parameters of the target strategy network and the strategy neural network.
Wherein the sensor in step S2 may be one or more of a gyroscope, a lidar, a camera, an ultrasonic sensor, and an infrared sensor.
The behavior strategy in step S3 includes a cleaning behavior strategy and a motion behavior strategy, where the cleaning behavior strategy includes washing, mopping, sweeping, and sucking, and the motion behavior strategy includes forward, backward, left-turning, right-turning, and braking.
Wherein the driving mechanism in step S4 includes one of a motion motor, a rolling brush motor, an edge brush motor, a disc brush motor, a mop driving motor, and a dust collection motor.
Wherein the reward rt in step S5 is positively correlated with factors such as the amount of collected garbage, the cleaning range, obstacle avoidance, and the battery level.
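The specification only states that the reward is positively correlated with these factors. A hypothetical reward function illustrating one such shaping (the weights and the collision penalty are assumptions, not taken from the patent):

```python
def reward_value(garbage_collected, area_covered, collided, battery_level):
    """Illustrative reward for step S5: larger when more garbage is collected,
    more area is covered, and battery remains high; collisions (failed
    obstacle avoidance) are penalized. All coefficients are assumed."""
    r = 1.0 * garbage_collected + 0.5 * area_covered + 0.1 * battery_level
    if collided:
        r -= 10.0  # penalty for hitting an obstacle
    return r
```

For example, collecting more garbage on the same trajectory yields a strictly higher reward, while a collision outweighs the gains of a single step.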
In step S8, the loss function evaluation index is a mean square error.
Wherein the stochastic gradient descent in step S10 uses an Adam optimizer.
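Steps S7 and S8 above describe an experience cache pool with uniform random sampling to break the strong correlation between consecutive experiences. A minimal sketch of such a pool (capacity and batch size are illustrative; the patent does not specify them):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience cache pool of steps S7-S8: bounded FIFO storage with
    uniform random sampling, which decorrelates training batches."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old experiences drop off

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, n):
        # Uniform random draw of n transitions without replacement.
        return random.sample(list(self.buffer), n)

buf = ReplayBuffer()
for t in range(200):
    buf.store(t, 0, 0.0, t + 1)  # dummy transitions for illustration
batch = buf.sample(32)           # N = 32 experiences, in random order
```

Because sampling is uniform over the whole pool, adjacent batch elements generally come from distant time steps, which is the decorrelation property the method relies on.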
The invention has the following beneficial effect:
Improved working efficiency of the cleaning robot: the intelligent cleaning robot path planning method based on deep reinforcement learning provides modules such as a strategy neural network, an evaluation network, a target strategy network, a target evaluation network, network parameters, and an experience cache pool; by calling and applying each module, the method realizes positive-feedback operation of the cleaning robot and improves its working efficiency.
Drawings
The invention is further illustrated by the attached drawings; the embodiments in the drawings do not limit the invention in any way, and a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a block flow diagram of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific examples, and other advantages and effects of the invention will be readily understood by those skilled in the art from the disclosure of this specification. The invention is capable of other and different embodiments, and its details may be modified in various respects without departing from the spirit and scope of the invention.
It should be noted that the structures shown in the drawings are intended only to complement the disclosure of the specification so that it can be understood and read by those skilled in the art; they are not intended to limit the conditions under which the invention can be practiced and therefore have no limiting technical significance. Any structural modification or adjustment that does not affect the functions and purposes achievable by the invention still falls within the scope of its technical content.
As shown in fig. 1, a schematic flow chart of an intelligent cleaning robot path planning method based on deep reinforcement learning according to an embodiment of the present invention includes:
step S1: initializing a strategy neural network, an evaluation network, a target strategy neural network, a target evaluation network, and network parameters; initializing an experience cache pool; and initializing the cleaning robot;
step S2: the cleaning robot senses the surrounding environment through its sensors, fuses the sensor data, constructs a map, and identifies the ground environment and the garbage situation using vision technology; the sensors include a gyroscope, lidar, a camera, ultrasonic sensors, infrared sensors, and the like, the specific set being configured according to the actual requirements of the cleaning robot;
step S3: the strategy neural network receives the state data of the surrounding environment; after the sensor data are input, the network selects an execution strategy through calculation. The behavior strategy is a stochastic process generated from the current strategy and random noise, and its value is obtained by sampling this process; the behavior strategy plans cleaning so that places with much garbage are cleaned first. In this embodiment, the cleaning behavior strategy comprises washing, mopping, sweeping, and suction behaviors, and the motion behavior strategy comprises forward, backward, left-turn, right-turn, and braking behaviors;
step S4: the cleaning robot executes the behavior strategy, converting it into instructions recognizable by the motors and inputting them to the motors, thereby controlling the rotating speed, direction, duration, and so on of each motor;
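Step S4 converts a discrete motion behavior strategy into motor instructions. A hypothetical mapping for a differential-drive base (the command names and wheel speeds are illustrative assumptions; the patent does not specify this mapping):

```python
# Hypothetical mapping from motion behavior strategies to differential-drive
# wheel speed commands; every name and value here is illustrative only.
MOTION_COMMANDS = {
    "forward":  (0.3, 0.3),    # (left wheel m/s, right wheel m/s)
    "backward": (-0.2, -0.2),
    "left":     (-0.1, 0.1),   # spin counter-clockwise in place
    "right":    (0.1, -0.1),   # spin clockwise in place
    "brake":    (0.0, 0.0),
}

def to_wheel_command(strategy):
    """Translate a motion behavior strategy into a wheel-speed pair."""
    return MOTION_COMMANDS[strategy]
```

The lower computer would then turn each wheel-speed pair into motor speed, direction, and duration commands.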
step S5: after the upper computer sends an instruction, the lower computer receives it and executes the corresponding actions to complete the cleaning task and path planning; whether garbage remains in the current indoor environment is judged through the visual sensor, and after execution the reward rt and the next state st+1 are obtained;
step S6: it is judged whether the cleaning robot has reached the garbage station and whether the operation time is over; if the operation is over, the process returns to step S1, otherwise it proceeds to step S7;
step S7: the executed actions, rewards, and other experience are stored in the experience cache pool; the pool makes the states independent of one another, eliminating the strong correlation between input experiences;
step S8: N experiences are randomly sampled from the experience cache pool, and the loss function value of the strategy value algorithm and that of the strategy decision algorithm are calculated; preferably, the loss function evaluation index is the mean square error.
step S9: the expected return of the current strategy is calculated through the target evaluation neural network, and the accumulated return of each state-strategy pair is estimated.
Step S10: and training the neural network by adopting a gradient descent method. And updating the weight coefficients of the target value network by using a random gradient descent algorithm to minimize a loss function, and calculating parameters of a gradient updating target value neural network and a strategy neural network, wherein the Adam optimizer is preferably adopted for the medium random gradient descent.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit its scope of protection. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the invention without departing from their spirit and scope.
Claims (7)
1. An intelligent cleaning robot path planning method based on deep reinforcement learning is characterized by comprising the following steps:
S1, initializing a strategy neural network, an evaluation network, a target strategy network, a target evaluation network, network parameters, an experience cache pool, and the cleaning robot;
S2, the cleaning robot senses the surrounding environment through its sensors, fuses the sensor data, and judges whether obstacles exist around it and its own state according to the ground condition, the garbage distribution, and the surrounding obstacles;
S3, the strategy neural network receives the sensor data of the surrounding environment and, after the data are input, selects an execution behavior strategy through calculation;
S4, the cleaning robot executes the behavior strategy, converting it into instructions recognizable by the driving mechanism and inputting them into the driving mechanism;
S5, after the upper computer sends the instruction, the lower computer receives it and executes the corresponding action to complete the cleaning task and path planning; when execution is complete, the reward rt and the next state st+1 are obtained;
S6, judging whether the cleaning robot has reached the garbage station and whether the action time has ended; if so, steps S1 to S6 continue to be executed; otherwise, the experience from steps S1 to S6 is summarized and step S7 is executed;
S7, storing the experience in the experience cache pool, which makes the stored states independent of one another and eliminates the strong correlation between input experiences;
S8, randomly sampling N experiences from the experience cache pool and calculating the loss function value of the strategy value algorithm and the loss function value of the strategy decision algorithm;
S9, calculating the expected return of the current strategy through the target strategy network and the evaluation network, and estimating the accumulated return of each state-strategy pair;
S10, training the neural networks by gradient descent: updating the weight coefficients of the target value network with a stochastic gradient descent algorithm to minimize the loss function, and calculating the gradients to update the parameters of the target strategy network and the strategy neural network.
2. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the sensor in step S2 may be one or more of a gyroscope, a lidar, a camera, an ultrasonic sensor, and an infrared sensor.
3. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the behavior strategies in step S3 include cleaning behavior strategies including washing, mopping, sweeping and sucking behaviors and motion behavior strategies including forward, backward, left-turning, right-turning and braking behaviors.
4. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the driving mechanism in step S4 comprises one of a motion motor, a rolling brush motor, an edge brush motor, a disc brush motor, a mop cloth driving motor, and a dust collection motor.
5. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the reward rt in step S5 is positively correlated with factors such as the amount of collected garbage, the cleaning range, obstacle avoidance, and the battery level.
6. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the loss function evaluation index in step S8 is mean square error.
7. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the stochastic gradient descent in step S10 uses an Adam optimizer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010651117.1A CN112327821A (en) | 2020-07-08 | 2020-07-08 | Intelligent cleaning robot path planning method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112327821A true CN112327821A (en) | 2021-02-05 |
Family
ID=74303637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010651117.1A Pending CN112327821A (en) | 2020-07-08 | 2020-07-08 | Intelligent cleaning robot path planning method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112327821A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | 西北工业大学 | A kind of real-time online paths planning method of deeply study |
CN107168303A (en) * | 2017-03-16 | 2017-09-15 | 中国科学院深圳先进技术研究院 | A kind of automatic Pilot method and device of automobile |
CN108415254A (en) * | 2018-03-12 | 2018-08-17 | 苏州大学 | Waste recovery robot control method based on depth Q networks and its device |
CN108523768A (en) * | 2018-03-12 | 2018-09-14 | 苏州大学 | Household cleaning machine people's control system based on adaptive strategy optimization |
US20180341378A1 (en) * | 2015-11-25 | 2018-11-29 | Supered Pty Ltd. | Computer-implemented frameworks and methodologies configured to enable delivery of content and/or user interface functionality based on monitoring of activity in a user interface environment and/or control access to services delivered in an online environment responsive to operation of a risk assessment protocol |
CN109726866A (en) * | 2018-12-27 | 2019-05-07 | 浙江农林大学 | Unmanned boat paths planning method based on Q learning neural network |
CN109783412A (en) * | 2019-01-18 | 2019-05-21 | 电子科技大学 | A kind of method that deeply study accelerates training |
CN109906132A (en) * | 2016-09-15 | 2019-06-18 | 谷歌有限责任公司 | The deeply of Robotic Manipulator learns |
CN109976340A (en) * | 2019-03-19 | 2019-07-05 | 中国人民解放军国防科技大学 | Man-machine cooperation dynamic obstacle avoidance method and system based on deep reinforcement learning |
CN110370295A (en) * | 2019-07-02 | 2019-10-25 | 浙江大学 | Soccer robot active control suction ball method based on deeply study |
CN110597058A (en) * | 2019-08-28 | 2019-12-20 | 浙江工业大学 | Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning |
CN111061277A (en) * | 2019-12-31 | 2020-04-24 | 歌尔股份有限公司 | Unmanned vehicle global path planning method and device |
CN111260027A (en) * | 2020-01-10 | 2020-06-09 | 电子科技大学 | Intelligent agent automatic decision-making method based on reinforcement learning |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112859885A (en) * | 2021-04-25 | 2021-05-28 | 四川远方云天食品科技有限公司 | Cooperative optimization method for path of feeding robot |
CN114587190A (en) * | 2021-08-23 | 2022-06-07 | 北京石头世纪科技股份有限公司 | Control method, system and device of cleaning device and computer readable storage medium |
CN113534821A (en) * | 2021-09-14 | 2021-10-22 | 深圳市元鼎智能创新有限公司 | Multi-sensor fusion sweeping robot movement obstacle avoidance method and device and robot |
CN115545350A (en) * | 2022-11-28 | 2022-12-30 | 湖南工商大学 | Comprehensive deep neural network and reinforcement learning vehicle path problem solving method |
CN115545350B (en) * | 2022-11-28 | 2024-01-16 | 湖南工商大学 | Vehicle path problem solving method integrating deep neural network and reinforcement learning |
CN116611635A (en) * | 2023-04-23 | 2023-08-18 | 暨南大学 | Sanitation robot car scheduling method and system based on car-road cooperation and reinforcement learning |
CN116611635B (en) * | 2023-04-23 | 2024-01-30 | 暨南大学 | Sanitation robot car scheduling method and system based on car-road cooperation and reinforcement learning |
CN117666593A (en) * | 2024-02-01 | 2024-03-08 | 厦门蓝旭科技有限公司 | Walking control optimization method for photovoltaic cleaning robot |
CN117666593B (en) * | 2024-02-01 | 2024-04-09 | 厦门蓝旭科技有限公司 | Walking control optimization method for photovoltaic cleaning robot |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112327821A (en) | Intelligent cleaning robot path planning method based on deep reinforcement learning | |
WO2021208225A1 (en) | Obstacle avoidance method, apparatus, and device for epidemic-prevention disinfecting and cleaning robot | |
WO2021208380A1 (en) | Disinfection and cleaning operation effect testing method and device for epidemic prevention disinfection and cleaning robot | |
Yang et al. | A neural network approach to complete coverage path planning | |
CN104399682B (en) | A kind of photovoltaic power station component cleans intelligent decision early warning system | |
CN108733061B (en) | Path correction method for cleaning operation | |
Wang et al. | Modeling motion patterns of dynamic objects by IOHMM | |
CN105139072A (en) | Reinforcement learning algorithm applied to non-tracking intelligent trolley barrier-avoiding system | |
US11776409B2 (en) | Methods, internet of things systems and storage mediums for street management in smart cities | |
Jung et al. | Experiments in realising cooperation between autonomous mobile robots | |
AU2022204569B2 (en) | Method for multi-agent dynamic path planning | |
EP2390740A2 (en) | Autonomous machine selective consultation | |
CN108716201B (en) | Collaborative sweeping method | |
CN111562784B (en) | Disinfection method and equipment for mobile disinfection robot | |
CN109674404B (en) | Obstacle avoidance processing mode of sweeping robot based on free move technology | |
CN111608124A (en) | Autonomous navigation method for cleaning robot in high-speed service area | |
CN113566808A (en) | Navigation path planning method, device, equipment and readable storage medium | |
CN112572466A (en) | Control method of self-adaptive unmanned sweeper | |
CN114077807A (en) | Computer implementation method and equipment for controlling mobile robot based on semantic environment diagram | |
CN114527762A (en) | Automatic planning method for cleaning of photovoltaic cell panel | |
CN108762275B (en) | Collaborative sweeping method | |
CN112168074B (en) | Cleaning method and system of intelligent cleaning robot | |
CN117109574A (en) | Agricultural transportation machinery coverage path planning method | |
CN108392153A (en) | A kind of sweeping robot intelligence control system | |
CN107627314A (en) | A kind of pathfinding robot, Pathfinding system and method for searching based on genetic algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||