CN112327821A - Intelligent cleaning robot path planning method based on deep reinforcement learning - Google Patents
- Publication number
- CN112327821A (application CN202010651117.1A)
- Authority
- CN
- China
- Prior art keywords
- strategy
- cleaning robot
- path planning
- reinforcement learning
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—with means for defining a desired trajectory
- G05D1/0221—involving a learning process
- G05D1/0231—using optical position detecting means
- G05D1/0238—using obstacle or wall sensors
- G05D1/024—using obstacle or wall sensors in combination with a laser
- G05D1/0242—using non-visible light signals, e.g. IR or UV signals
- G05D1/0246—using a video camera in combination with image processing means
- G05D1/0255—using acoustic signals, e.g. ultrasonic signals
- G05D1/0257—using a radar
Abstract
The invention discloses a path planning method for an intelligent cleaning robot based on deep reinforcement learning, which enables the robot to preferentially clean areas with large amounts of garbage, adaptively avoid obstacles, and return to charge in time. The method applies the deep deterministic policy gradient (DDPG) algorithm of deep reinforcement learning: a strategy neural network generates behavior strategies, comprising a cleaning behavior strategy and a motion behavior strategy. Exploration noise is added to the behavior strategy, which is then sent to the intelligent cleaning robot for execution; state information is obtained by fusing data from the sensor system, and the current return value is calculated by a designed return function. The algorithm stores each state, action, next state, and return value in an experience cache pool, randomly samples experiences from it, and trains the neural networks by gradient descent. The method is reasonable and highly practical, and is mainly intended for indoor navigation.
Description
Technical Field
The invention relates to the field of intelligent cleaning robots, in particular to an intelligent cleaning robot path planning method based on deep reinforcement learning.
Background
At present, with the development of the property management industry, most property service enterprises rely mainly on employees over 50 years old, and young workers are scarce. Research on intelligent cleaning robots can effectively alleviate the shortage of front-line staff in the property industry, help enterprises rapidly deliver services, and increase the added value of other services.
However, indoor navigation for intelligent cleaning robots is currently based mainly on simultaneous localization and mapping (SLAM) technology, and shortcomings in path planning lead to problems such as incomplete cleaning of some areas and low cleaning efficiency.
The deep deterministic policy gradient (DDPG) algorithm, a classical algorithm in deep reinforcement learning, has great advantages in continuous control.
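The abstract describes adding exploration noise to the behavior strategy produced by the strategy neural network. A minimal sketch, assuming Ornstein-Uhlenbeck noise (a common choice for DDPG; the patent does not specify the noise process, its parameters, or the action dimensions, so all values here are illustrative):

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise, commonly paired with DDPG.

    The patent only states that exploration noise is added to the policy
    output; the OU process and its parameters here are assumptions.
    """
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.state = np.full(dim, mu)

    def sample(self):
        # Mean-reverting random walk: drifts back toward mu, jitters by sigma.
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.state.shape))
        self.state = self.state + dx
        return self.state

noise = OrnsteinUhlenbeckNoise(dim=2)        # e.g. linear and angular velocity
deterministic_action = np.array([0.5, 0.0])  # hypothetical strategy-network output
behavior_action = np.clip(deterministic_action + noise.sample(), -1.0, 1.0)
```

The clip keeps the noisy behavior action inside the valid actuation range.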
The invention provides an intelligent cleaning robot path planning method based on deep reinforcement learning. The method builds on the DDPG algorithm and fuses information from multiple sensors to realize dynamic path planning for the cleaning robot, which can then preferentially clean places with much garbage, adaptively avoid obstacles, and return to charge in time.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention aims to provide an intelligent cleaning robot path planning method based on deep reinforcement learning that improves the working efficiency of the cleaning robot.
The invention is realized by the following technical scheme:
an intelligent cleaning robot path planning method based on deep reinforcement learning is characterized by comprising the following steps:
S1, initializing a strategy neural network, an evaluation network, a target strategy network, a target evaluation network, network parameters, an experience cache pool, and the cleaning robot;
S2, the cleaning robot senses the surrounding environment through its sensors, fuses the sensor data, and judges whether obstacles exist around it and its own state according to the ground condition, the garbage distribution, and the surrounding obstacles;
S3, the strategy neural network receives the sensor data of the surrounding environment and, after the data are input, selects an execution behavior strategy through calculation;
S4, the cleaning robot executes the behavior strategy, converting it into instructions recognizable by the driving mechanism and inputting them into the driving mechanism;
S5, after the upper computer sends the instruction, the lower computer receives it and executes the corresponding action to complete the cleaning task and path planning; when execution is complete, the reward rt and the next state st+1 are obtained;
S6, judging whether the cleaning robot has reached the garbage station and whether the action time has ended; if so, steps S1 to S6 continue to be executed; otherwise, the experience from steps S1 to S6 is summarized and step S7 is executed;
S7, storing the experience in the experience cache pool, which makes the stored states independent of one another and eliminates the strong correlation between input experiences;
S8, randomly sampling N experiences from the experience cache pool and calculating the loss function value of the strategy value algorithm and the loss function value of the strategy decision algorithm;
S9, calculating the expected return of the current strategy through the target strategy network and the evaluation network, and estimating the accumulated return of each state-strategy pair;
S10, training the neural networks by gradient descent: updating the weight coefficients of the target value network with a stochastic gradient descent algorithm to minimize the loss function, and calculating the gradients to update the parameters of the target strategy network and the strategy neural network.
Wherein the sensor in step S2 may be one or more of a gyroscope, a lidar, a camera, an ultrasonic sensor, and an infrared sensor.
The behavior strategy in step S3 includes a cleaning behavior strategy and a motion behavior strategy, where the cleaning behavior strategy includes washing, mopping, sweeping, and sucking, and the motion behavior strategy includes forward, backward, left-turning, right-turning, and braking.
Wherein the driving mechanism in step S4 includes one of a motion motor, a rolling brush motor, an edge brush motor, a disc brush motor, a mop driving motor, and a dust collection motor.
Wherein the reward rt in step S5 is positively correlated with factors such as the amount of collected garbage, the cleaning range, obstacle avoidance, and the battery level.
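The specification only states that the reward is positively correlated with these factors. A hypothetical reward function illustrating one such shaping (the weights and the collision penalty are assumptions, not taken from the patent):

```python
def reward_value(garbage_collected, area_covered, collided, battery_level):
    """Illustrative reward for step S5: larger when more garbage is collected,
    more area is covered, and battery remains high; collisions (failed
    obstacle avoidance) are penalized. All coefficients are assumed."""
    r = 1.0 * garbage_collected + 0.5 * area_covered + 0.1 * battery_level
    if collided:
        r -= 10.0  # penalty for hitting an obstacle
    return r
```

For example, collecting more garbage on the same trajectory yields a strictly higher reward, while a collision outweighs the gains of a single step.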
In step S8, the loss function evaluation index is a mean square error.
Wherein the stochastic gradient descent in step S10 uses an Adam optimizer.
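Steps S7 and S8 above describe an experience cache pool with uniform random sampling to break the strong correlation between consecutive experiences. A minimal sketch of such a pool (capacity and batch size are illustrative; the patent does not specify them):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience cache pool of steps S7-S8: bounded FIFO storage with
    uniform random sampling, which decorrelates training batches."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old experiences drop off

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, n):
        # Uniform random draw of n transitions without replacement.
        return random.sample(list(self.buffer), n)

buf = ReplayBuffer()
for t in range(200):
    buf.store(t, 0, 0.0, t + 1)  # dummy transitions for illustration
batch = buf.sample(32)           # N = 32 experiences, in random order
```

Because sampling is uniform over the whole pool, adjacent batch elements generally come from distant time steps, which is the decorrelation property the method relies on.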
The invention has the following beneficial effect:
Improved working efficiency of the cleaning robot: the intelligent cleaning robot path planning method based on deep reinforcement learning provides modules such as a strategy neural network, an evaluation network, a target strategy network, a target evaluation network, network parameters, and an experience cache pool; by calling and applying each module, the method realizes positive-feedback operation of the cleaning robot and improves its working efficiency.
Drawings
The invention is further illustrated by the attached drawings; the embodiments in the drawings do not limit the invention in any way, and a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a block flow diagram of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific examples, and other advantages and effects of the invention will be readily understood by those skilled in the art from the disclosure of this specification. The invention is capable of other and different embodiments, and its details may be modified in various respects without departing from the spirit and scope of the invention.
It should be noted that the structures shown in the drawings are intended only to complement the disclosure of the specification so that it can be understood and read by those skilled in the art; they are not intended to limit the conditions under which the invention can be practiced and therefore have no limiting technical significance. Any structural modification or adjustment that does not affect the functions and purposes achievable by the invention still falls within the scope of its technical content.
As shown in fig. 1, a schematic flow chart of an intelligent cleaning robot path planning method based on deep reinforcement learning according to an embodiment of the present invention includes:
step S1: initializing a strategy neural network, an evaluation network, a target strategy neural network, a target evaluation network, and network parameters; initializing an experience cache pool; and initializing the cleaning robot;
step S2: the cleaning robot senses the surrounding environment through its sensors, fuses the sensor data, constructs a map, and identifies the ground environment and the garbage situation using vision technology; the sensors include a gyroscope, lidar, a camera, ultrasonic sensors, infrared sensors, and the like, the specific set being configured according to the actual requirements of the cleaning robot;
step S3: the strategy neural network receives the state data of the surrounding environment; after the sensor data are input, the network selects an execution strategy through calculation. The behavior strategy is a stochastic process generated from the current strategy and random noise, and its value is obtained by sampling this process; the behavior strategy plans cleaning so that places with much garbage are cleaned first. In this embodiment, the cleaning behavior strategy comprises washing, mopping, sweeping, and suction behaviors, and the motion behavior strategy comprises forward, backward, left-turn, right-turn, and braking behaviors;
step S4: the cleaning robot executes the behavior strategy, converting it into instructions recognizable by the motors and inputting them to the motors, thereby controlling the rotating speed, direction, duration, and so on of each motor;
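Step S4 converts a discrete motion behavior strategy into motor instructions. A hypothetical mapping for a differential-drive base (the command names and wheel speeds are illustrative assumptions; the patent does not specify this mapping):

```python
# Hypothetical mapping from motion behavior strategies to differential-drive
# wheel speed commands; every name and value here is illustrative only.
MOTION_COMMANDS = {
    "forward":  (0.3, 0.3),    # (left wheel m/s, right wheel m/s)
    "backward": (-0.2, -0.2),
    "left":     (-0.1, 0.1),   # spin counter-clockwise in place
    "right":    (0.1, -0.1),   # spin clockwise in place
    "brake":    (0.0, 0.0),
}

def to_wheel_command(strategy):
    """Translate a motion behavior strategy into a wheel-speed pair."""
    return MOTION_COMMANDS[strategy]
```

The lower computer would then turn each wheel-speed pair into motor speed, direction, and duration commands.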
step S5: after the upper computer sends an instruction, the lower computer receives it and executes the corresponding actions to complete the cleaning task and path planning; whether garbage remains in the current indoor environment is judged through the visual sensor, and after execution the reward rt and the next state st+1 are obtained;
step S6: it is judged whether the cleaning robot has reached the garbage station and whether the operation time is over; if the operation is over, the process returns to step S1, otherwise it proceeds to step S7;
step S7: the executed actions, rewards, and other experience are stored in the experience cache pool; the pool makes the states independent of one another, eliminating the strong correlation between input experiences;
step S8: N experiences are randomly sampled from the experience cache pool, and the loss function value of the strategy value algorithm and that of the strategy decision algorithm are calculated; preferably, the loss function evaluation index is the mean square error.
step S9: the expected return of the current strategy is calculated through the target evaluation neural network, and the accumulated return of each state-strategy pair is estimated.
Step S10: and training the neural network by adopting a gradient descent method. And updating the weight coefficients of the target value network by using a random gradient descent algorithm to minimize a loss function, and calculating parameters of a gradient updating target value neural network and a strategy neural network, wherein the Adam optimizer is preferably adopted for the medium random gradient descent.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit its scope of protection. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the invention without departing from their spirit and scope.
Claims (7)
1. An intelligent cleaning robot path planning method based on deep reinforcement learning is characterized by comprising the following steps:
S1, initializing a strategy neural network, an evaluation network, a target strategy network, a target evaluation network, network parameters, an experience cache pool, and the cleaning robot;
S2, the cleaning robot senses the surrounding environment through its sensors, fuses the sensor data, and judges whether obstacles exist around it and its own state according to the ground condition, the garbage distribution, and the surrounding obstacles;
S3, the strategy neural network receives the sensor data of the surrounding environment and, after the data are input, selects an execution behavior strategy through calculation;
S4, the cleaning robot executes the behavior strategy, converting it into instructions recognizable by the driving mechanism and inputting them into the driving mechanism;
S5, after the upper computer sends the instruction, the lower computer receives it and executes the corresponding action to complete the cleaning task and path planning; when execution is complete, the reward rt and the next state st+1 are obtained;
S6, judging whether the cleaning robot has reached the garbage station and whether the action time has ended; if so, steps S1 to S6 continue to be executed; otherwise, the experience from steps S1 to S6 is summarized and step S7 is executed;
S7, storing the experience in the experience cache pool, which makes the stored states independent of one another and eliminates the strong correlation between input experiences;
S8, randomly sampling N experiences from the experience cache pool and calculating the loss function value of the strategy value algorithm and the loss function value of the strategy decision algorithm;
S9, calculating the expected return of the current strategy through the target strategy network and the evaluation network, and estimating the accumulated return of each state-strategy pair;
S10, training the neural networks by gradient descent: updating the weight coefficients of the target value network with a stochastic gradient descent algorithm to minimize the loss function, and calculating the gradients to update the parameters of the target strategy network and the strategy neural network.
2. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the sensor in step S2 may be one or more of a gyroscope, a lidar, a camera, an ultrasonic sensor, and an infrared sensor.
3. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the behavior strategies in step S3 include cleaning behavior strategies including washing, mopping, sweeping and sucking behaviors and motion behavior strategies including forward, backward, left-turning, right-turning and braking behaviors.
4. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the driving mechanism in step S4 comprises one of a motion motor, a rolling brush motor, an edge brush motor, a disc brush motor, a mop cloth driving motor, and a dust collection motor.
5. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the reward rt in step S5 is positively correlated with factors such as the amount of collected garbage, the cleaning range, obstacle avoidance, and the battery level.
6. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the loss function evaluation index in step S8 is mean square error.
7. The intelligent cleaning robot path planning method based on deep reinforcement learning of claim 1, wherein the stochastic gradient descent in step S10 uses an Adam optimizer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010651117.1A CN112327821A (en) | 2020-07-08 | 2020-07-08 | Intelligent cleaning robot path planning method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112327821A true CN112327821A (en) | 2021-02-05 |
Family
ID=74303637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010651117.1A Pending CN112327821A (en) | 2020-07-08 | 2020-07-08 | Intelligent cleaning robot path planning method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112327821A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | 西北工业大学 | A kind of real-time online paths planning method of deeply study |
CN107168303A (en) * | 2017-03-16 | 2017-09-15 | 中国科学院深圳先进技术研究院 | A kind of automatic Pilot method and device of automobile |
CN108415254A (en) * | 2018-03-12 | 2018-08-17 | 苏州大学 | Waste recovery robot control method based on depth Q networks and its device |
CN108523768A (en) * | 2018-03-12 | 2018-09-14 | 苏州大学 | Household cleaning machine people's control system based on adaptive strategy optimization |
US20180341378A1 (en) * | 2015-11-25 | 2018-11-29 | Supered Pty Ltd. | Computer-implemented frameworks and methodologies configured to enable delivery of content and/or user interface functionality based on monitoring of activity in a user interface environment and/or control access to services delivered in an online environment responsive to operation of a risk assessment protocol |
CN109726866A (en) * | 2018-12-27 | 2019-05-07 | 浙江农林大学 | Unmanned boat paths planning method based on Q learning neural network |
CN109783412A (en) * | 2019-01-18 | 2019-05-21 | 电子科技大学 | A kind of method that deeply study accelerates training |
CN109906132A (en) * | 2016-09-15 | 2019-06-18 | 谷歌有限责任公司 | The deeply of Robotic Manipulator learns |
CN109976340A (en) * | 2019-03-19 | 2019-07-05 | 中国人民解放军国防科技大学 | Man-machine cooperation dynamic obstacle avoidance method and system based on deep reinforcement learning |
CN110370295A (en) * | 2019-07-02 | 2019-10-25 | 浙江大学 | Soccer robot active control suction ball method based on deeply study |
CN110597058A (en) * | 2019-08-28 | 2019-12-20 | 浙江工业大学 | Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning |
CN111061277A (en) * | 2019-12-31 | 2020-04-24 | 歌尔股份有限公司 | Unmanned vehicle global path planning method and device |
CN111260027A (en) * | 2020-01-10 | 2020-06-09 | 电子科技大学 | Intelligent agent automatic decision-making method based on reinforcement learning |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112859885A (en) * | 2021-04-25 | 2021-05-28 | 四川远方云天食品科技有限公司 | Cooperative optimization method for path of feeding robot |
CN114587190A (en) * | 2021-08-23 | 2022-06-07 | 北京石头世纪科技股份有限公司 | Control method, system and device of cleaning device and computer readable storage medium |
CN113534821A (en) * | 2021-09-14 | 2021-10-22 | 深圳市元鼎智能创新有限公司 | Multi-sensor fusion sweeping robot movement obstacle avoidance method and device and robot |
CN115545350A (en) * | 2022-11-28 | 2022-12-30 | 湖南工商大学 | Comprehensive deep neural network and reinforcement learning vehicle path problem solving method |
CN115545350B (en) * | 2022-11-28 | 2024-01-16 | 湖南工商大学 | Vehicle path problem solving method integrating deep neural network and reinforcement learning |
CN116611635A (en) * | 2023-04-23 | 2023-08-18 | 暨南大学 | Sanitation robot car scheduling method and system based on car-road cooperation and reinforcement learning |
CN116611635B (en) * | 2023-04-23 | 2024-01-30 | 暨南大学 | Sanitation robot car scheduling method and system based on car-road cooperation and reinforcement learning |
CN117666593A (en) * | 2024-02-01 | 2024-03-08 | 厦门蓝旭科技有限公司 | Walking control optimization method for photovoltaic cleaning robot |
CN117666593B (en) * | 2024-02-01 | 2024-04-09 | 厦门蓝旭科技有限公司 | Walking control optimization method for photovoltaic cleaning robot |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112327821A (en) | Intelligent cleaning robot path planning method based on deep reinforcement learning | |
WO2021208225A1 (en) | Obstacle avoidance method, apparatus, and device for epidemic-prevention disinfecting and cleaning robot | |
WO2021208380A1 (en) | Disinfection and cleaning operation effect testing method and device for epidemic prevention disinfection and cleaning robot | |
Yang et al. | A neural network approach to complete coverage path planning | |
CN104399682B (en) | A kind of photovoltaic power station component cleans intelligent decision early warning system | |
CN108733061B (en) | Path correction method for cleaning operation | |
Wang et al. | Modeling motion patterns of dynamic objects by IOHMM | |
CN105139072A (en) | Reinforcement learning algorithm applied to non-tracking intelligent trolley barrier-avoiding system | |
US11776409B2 (en) | Methods, internet of things systems and storage mediums for street management in smart cities | |
Jung et al. | Experiments in realising cooperation between autonomous mobile robots | |
AU2022204569B2 (en) | Method for multi-agent dynamic path planning | |
EP2390740A2 (en) | Autonomous machine selective consultation | |
CN108716201B (en) | Collaborative sweeping method | |
CN111562784B (en) | Disinfection method and equipment for mobile disinfection robot | |
CN109674404B (en) | Obstacle avoidance processing mode of sweeping robot based on free move technology | |
CN111608124A (en) | Autonomous navigation method for cleaning robot in high-speed service area | |
CN113566808A (en) | Navigation path planning method, device, equipment and readable storage medium | |
CN112572466A (en) | Control method of self-adaptive unmanned sweeper | |
CN114077807A (en) | Computer implementation method and equipment for controlling mobile robot based on semantic environment diagram | |
CN114527762A (en) | Automatic planning method for cleaning of photovoltaic cell panel | |
CN108762275B (en) | Collaborative sweeping method | |
CN112168074B (en) | Cleaning method and system of intelligent cleaning robot | |
CN117109574A (en) | Agricultural transportation machinery coverage path planning method | |
CN108392153A (en) | A kind of sweeping robot intelligence control system | |
CN107627314A (en) | A kind of pathfinding robot, Pathfinding system and method for searching based on genetic algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||