CN115421494A - Cleaning robot path planning method, system, computer device and storage medium - Google Patents

Cleaning robot path planning method, system, computer device and storage medium

Info

Publication number
CN115421494A
CN115421494A
Authority
CN
China
Prior art keywords
cleaning robot
path planning
robot
node
cleaning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211147813.4A
Other languages
Chinese (zh)
Inventor
王羽钧
洪晓鹏
沈超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202211147813.4A priority Critical patent/CN115421494A/en
Publication of CN115421494A publication Critical patent/CN115421494A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention belongs to the field of artificial intelligence and robot path planning, and discloses a cleaning robot path planning method, system, computer device and storage medium. The method comprises: acquiring the garbage bin capacity and running speed of each cleaning robot, the coordinates of the robot library, and the coordinates, garbage amount and cleaning workload of each point to be cleaned; and calling a preset deep reinforcement learning model for cleaning robot path planning according to this information to obtain a path planning result for each cleaning robot. The method can plan paths for multiple cleaning robots, solves the cleaning robot path planning problem with many robots and a large number of points to be cleaned, fits practical application scenarios, obtains path planning schemes superior to those of traditional optimization methods, and requires far less computation time than traditional methods such as the ant colony algorithm and dynamic programming.

Description

Cleaning robot path planning method, system, computer equipment and storage medium
Technical Field
The invention belongs to the field of artificial intelligence and robot path planning, and relates to a cleaning robot path planning method, a cleaning robot path planning system, computer equipment and a storage medium.
Background
The vigorous development of artificial intelligence and robot technology provides the technical prerequisite for the large-scale application of cleaning robots, while the continuous rise of labor costs creates a real market for them. Nowadays cleaning robots can be found everywhere, from large public places such as airports, hospitals and schools to small spaces such as private homes. Having robots take over cleaning work from humans has clearly become a trend of the times.
Path planning must be performed before a robot starts a cleaning task. The quality of the path plan directly affects how efficiently the cleaning task is completed and indirectly affects the energy consumption and wear of each robot. Existing path planning methods fall into two categories. The first is full-coverage path planning, represented by the boustrophedon (ox-plough) method, which makes the robot traverse the entire cleaning area according to preset rules; it is simple to implement but inefficient when the cleaning space is large and garbage is sparsely distributed. The second is path planning based on traditional optimization techniques, represented by the ant colony algorithm, dynamic programming and Gurobi; the solving time of these methods generally grows exponentially with the number of path nodes and robots, so they are not suitable for large-scale multi-robot path planning problems.
Disclosure of Invention
The purpose of the present invention is to overcome the above-mentioned disadvantage of the prior art, namely that multi-robot path planning for cleaning robots is difficult, and to provide a cleaning robot path planning method, system, computer device and storage medium.
In order to achieve this purpose, the invention adopts the following technical scheme:
in a first aspect of the present invention, a cleaning robot path planning method includes:
acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of a robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload;
and calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, and obtaining a path planning result of each cleaning robot.
Optionally, the deep reinforcement learning model for path planning of the cleaning robot is constructed in the following manner:
establishing a mathematical model of a path planning problem of the cleaning robot;
establishing a Markov decision process model of the cleaning robot path planning problem according to a mathematical model of the cleaning robot path planning problem;
establishing an initial deep reinforcement learning model for path planning of the cleaning robot according to a Markov decision process model of the path planning problem of the cleaning robot;
and training an initial deep reinforcement learning model for path planning of the cleaning robot through a preset training set to obtain the deep reinforcement learning model for path planning of the cleaning robot.
Optionally, the mathematical model of the cleaning robot path planning problem includes optimization variables, optimization objectives, and constraint conditions;
wherein the optimization variables comprise a first optimization variable Y and a second optimization variable Z:
Y = { y_{i,j}^r | i ∈ P, j ∈ P, r ∈ R }

Z = { z_{i,j} | i ∈ P, j ∈ P }

wherein P = {p_0, p_1, ..., p_n} is the node set formed by the robot library and the points to be cleaned, n is the number of points to be cleaned and p_0 represents the robot library node; R = {r_1, r_2, ..., r_k} is the set of cleaning robots and k is the number of cleaning robots; y_{i,j}^r is an indicator variable indicating whether cleaning robot r travels from p_i to p_j: if robot r departs from p_i and arrives at p_j, then y_{i,j}^r = 1, otherwise y_{i,j}^r = 0; z_{i,j} is the total amount of garbage carried from the coordinate x_i of p_i to the coordinate x_j of p_j;

the optimization objective is shown as follows:

[optimization objective formula given as an image in the original]

wherein c_j is the cleaning workload of the point to be cleaned p_j, c_0 = 0, and v_r is the running speed of the cleaning robot r;
the constraint conditions comprise optimization variable value range constraint, region access frequency constraint, robot path continuity constraint, total garbage amount constraint and garbage transportation constraint which can be carried by the robot;
the value range constraint of the optimization variables is shown as follows:

y_{i,j}^r ∈ {0, 1}, i ∈ P, j ∈ P, r ∈ R

z_{i,j} ≥ 0, i ∈ P, j ∈ P

the region access times constraint is as follows:

[constraint formula given as an image in the original]

the robot path continuity constraint is given by:

[constraint formula given as an image in the original]

the constraint on the total amount of garbage the robot can carry is as follows:

[constraint formula given as an image in the original]

wherein b_r is the garbage bin capacity of the cleaning robot r;

the garbage transportation constraint is as follows:

[constraint formulas given as images in the original]

wherein P' = P - {p_0} is the set of the n points to be cleaned, g_j is the garbage amount of the point to be cleaned p_j, g_0 = 0, and M is a preset constant.
Optionally, the Markov decision process model of the cleaning robot path planning problem includes an environment state, an action, a state transition rule, and a cost;
wherein the environment state S_t is shown as follows:

S_t = (D_t, E_t)

wherein t is the step number; D_t collects, for each cleaning robot r, the remaining capacity of its garbage bin at step t, the node at which it is located at step t, and the set of nodes it has visited up to step t; E_t = {e_t^i | i ∈ P}, wherein e_t^i is the access state of node p_i at step t: if node p_i has already been visited, then e_t^i = 1, otherwise e_t^i = 0;
the action A_t is shown as follows:

A_t = (d_t, p_t)

wherein d_t is the node decoder activated at step t, and p_t ∈ P is the node selected at step t;
the state transition rule ST transfers the environment state from S_t to S_{t+1} according to the action A_t, as shown by the following formulas:

[state transition formulas given as images in the original]

wherein r_t is the cleaning robot corresponding to node decoder d_t, and the selected node p_t is spliced onto the end of the path already traveled by robot r_t;
the cost F is shown as follows:

[cost formula given as an image in the original]

wherein T is the total number of steps, and the cost of the cleaning robot r at step t is obtained by the following formula:

[per-step cost formula given as an image in the original]

namely the distance between the coordinates of the selected node p_t and the coordinates of the node at which the cleaning robot was previously located.
Optionally, the deep reinforcement learning model for cleaning robot path planning includes: an encoder and a decoder; the encoder comprises a node encoder and a robot encoder, and the decoder comprises a decoder selector and k node decoders; the output ends of the node encoder and the robot encoder are connected with the input end of a decoder selector, and the output end of the decoder selector is connected with the input ends of the k node decoders;
the node encoder comprises a linear mapping layer and L1 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_node be the index of a graph encoding module of the node encoder: when 1 ≤ l_node < L1, the output end of graph encoding module l_node is connected with the input end of graph encoding module l_node + 1, and when l_node = L1, the output end of graph encoding module l_node is connected with the input end of the decoder selector; the robot encoder comprises a linear mapping layer and L2 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_robot be the index of a graph encoding module of the robot encoder: when 1 ≤ l_robot < L2, the output end of graph encoding module l_robot is connected with the input end of graph encoding module l_robot + 1, and when l_robot = L2, the output end of graph encoding module l_robot is connected with the input end of the decoder selector; the decoder selector comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer; the node decoder comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer.
Optionally, the linear mapping layer is shown as follows:
Linear(x)=Wx+B
wherein x ∈ R^{d_in} is the input, W ∈ R^{d_out × d_in} and B ∈ R^{d_out} are learnable parameters, d_in is the dimension of the input data, and d_out is the output dimension of the linear mapping layer;
the fitness layer is represented by the following formula:
[fitness layer formula given as an image in the original]
wherein softmax () is a normalized exponential function;
the multiheaded attention layer is represented by the formula:
MHA(X)=Concat(head 1 ,head 2 ,…,head h )W O
wherein X is the input of the multi-head attention layer, n × d_x is the dimension of the input data, Concat is the matrix concatenation operation, W^O is a trainable parameter, h is the number of attention heads, d_v is the dimension of the value vectors, and head_i is the output of the i-th attention head; head_i is calculated as follows:

head_i = softmax( Q_i K_i^T / sqrt(d_k) ) V_i

wherein Q_i = X W_i^Q, K_i = X W_i^K, V_i = X W_i^V; W_i^Q, W_i^K and W_i^V are learnable parameters, and d_k is the dimension of the key vectors;
the graph encoding module is represented by the following formula:
X_{l+1} = GraphEncoder(X_l)

wherein X_l is the input of the graph encoding module and X_{l+1} is the output of the graph encoding module, computed as:

X̂_l = BN( X_l + MHA(X_l) )

X_{l+1} = BN( X̂_l + FF(X̂_l) )

wherein X̂_l is an intermediate vector computed by the graph encoding module, FF is a forward propagation module formed by connecting several linear mapping layers and ReLU function layers, and BN() is a batch normalization layer;
the ReLU function layer is expressed as follows:
ReLU(x)=max(0,x)
the batch normalization layer is shown as follows:

BN(x) = γ (x - E[x]) / sqrt(Var[x] + ε) + β

wherein γ and β are learnable parameters, E[x] is the expectation of x, Var[x] is the variance of x, and ε is a constant that prevents the denominator from being zero;
the input of the node encoder is I_P = {(x_i, c_i, g_i) | i ∈ P}, and its output is the set of node codes {h_i | i ∈ P}, wherein h_i is the code of the i-th node;

the input of the robot encoder is I_R = {(v_r, b_r) | r ∈ R}, and its output is the set of robot codes {h_r | r ∈ R}, wherein h_r is the code of the r-th cleaning robot;

the input of the decoder selector at time step t comprises the node codes, the robot codes and, for each cleaning robot r, the path it has traveled up to time step t - 1; the decoder selector outputs the node decoder d_t with the maximum probability;

the input of the node decoder is composed of the encoder outputs, h_p and h_{r'}, wherein r' is the cleaning robot corresponding to node decoder d_t, h_p is the code of the node where the cleaning robot is located, and h_{r'} is the code of the cleaning robot r'; the node decoder outputs the node p_t with the maximum probability.
Optionally, when the initial deep reinforcement learning model for cleaning robot path planning is trained, the model parameters of the initial deep reinforcement learning model for cleaning robot path planning are optimized according to the following formula:

∇_θ L(θ) = E[ (F_s - b(s)) ∇_θ log p_θ(π|s) ]

wherein θ is the model parameter, s is the output path planning scheme, F_s is the cost of the path planning scheme s, b(s) is the evaluation of the path planning scheme s by the reference (baseline) method, π is the policy of the reinforcement learning method, and p_θ(π|s) represents the probability of outputting the path planning scheme s under the parameter θ and the policy π.
In a second aspect of the present invention, a cleaning robot path planning system includes:
the data acquisition module is used for acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload;
and the model calling module is used for calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, so as to obtain a path planning result of each cleaning robot.
In a third aspect of the invention, a computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the cleaning robot path planning method when executing the computer program.
In a fourth aspect of the present invention, a computer-readable storage medium stores a computer program, which when executed by a processor implements the steps of the above cleaning robot path planning method.
Compared with the prior art, the invention has the following beneficial effects:
the cleaning robot path planning method is based on the calling of a deep reinforcement learning model for cleaning robot path planning, can realize the path planning of multiple cleaning robots only by acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of a robot library, the coordinates of points to be cleaned, the garbage amount and the cleaning workload, can solve the path planning problem of the cleaning robots with multiple robots and a large number of points to be cleaned, is more suitable for practical application scenes, fully utilizes cleaning task information and cleaning robot information, and the solved path planning scheme is superior to the traditional optimization method. Meanwhile, the deep reinforcement learning model for path planning of the cleaning robot is based on deep reinforcement learning, the operation speed can be greatly increased by using a graphic processor, and the operation time required for solving the path planning problem is far shorter than that of the traditional methods such as an ant colony algorithm and a dynamic planning algorithm.
Drawings
Fig. 1 is a flowchart of a cleaning robot path planning method according to an embodiment of the present invention.
FIG. 2 is a diagram of a deep reinforcement learning model architecture according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a detailed architecture of a deep reinforcement learning model according to an embodiment of the present invention.
Fig. 4 is a block diagram of a path planning system of a cleaning robot according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
referring to fig. 1, in an embodiment of the present invention, a cleaning robot path planning method is provided, and particularly, a cleaning robot path planning method based on deep reinforcement learning, which can implement path planning of multiple cleaning robots, and has a fast solving speed and high solving quality.
Specifically, the cleaning robot path planning method comprises the following steps:
s1: and acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload.
S2: and calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, and obtaining a path planning result of each cleaning robot.
The garbage bin capacity and the running speed of each cleaning robot can be obtained from a specification or a manufacturer of the cleaning robot, and the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload are set according to an actual working scene.
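For illustration only, the following Python sketch shows how the data gathered in step S1 might be packaged and handed to the preset model in step S2. All names here (CleaningTask, plan_paths, the solve interface) are hypothetical and are not part of the patent text.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical container for the problem data gathered in step S1.
@dataclass
class CleaningTask:
    bin_capacities: List[float]           # garbage bin capacity of each cleaning robot
    speeds: List[float]                   # running speed of each cleaning robot
    depot_xy: Tuple[float, float]         # coordinates of the robot library
    point_xy: List[Tuple[float, float]]   # coordinates of each point to be cleaned
    garbage: List[float]                  # garbage amount at each point to be cleaned
    workload: List[float]                 # cleaning workload at each point to be cleaned

def plan_paths(task: CleaningTask, model) -> List[List[int]]:
    """Step S2: call a preset deep reinforcement learning model and return,
    for every robot, the ordered list of node indices it should visit.
    `model` is assumed to expose a `solve(task)` method."""
    return model.solve(task)
```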
The cleaning robot path planning method is based on calling a deep reinforcement learning model for cleaning robot path planning. It only needs the garbage bin capacity and running speed of each cleaning robot, the coordinates of the robot library, and the coordinates, garbage amount and cleaning workload of each point to be cleaned to plan paths for multiple cleaning robots, and it can solve cleaning robot path planning problems with many robots and a large number of points to be cleaned. It therefore fits practical application scenarios better, makes full use of the cleaning task information and cleaning robot information, and the resulting path planning scheme is superior to that of traditional optimization methods. Meanwhile, because the model is based on deep reinforcement learning, its computation can be greatly accelerated by a graphics processor, and the computation time required to solve the path planning problem is far shorter than that of traditional methods such as the ant colony algorithm and dynamic programming.
In one possible embodiment, the deep reinforcement learning model for cleaning robot path planning is constructed by the following steps: establishing a mathematical model of a path planning problem of the cleaning robot; establishing a Markov decision process model of the cleaning robot path planning problem according to a mathematical model of the cleaning robot path planning problem; establishing an initial deep reinforcement learning model for path planning of the cleaning robot according to a Markov decision process model of the path planning problem of the cleaning robot; training an initial deep reinforcement learning model for path planning of the cleaning robot through a preset training set to obtain the deep reinforcement learning model for path planning of the cleaning robot.
Optionally, the mathematical model of the path planning problem of the cleaning robot includes an optimization variable, an optimization target, and constraint conditions, where the optimization variable includes a first optimization variable Y and a second optimization variable Z, and the constraint conditions include an optimization variable value range constraint, a region access frequency constraint, a robot path continuity constraint, a total garbage amount constraint that the robot can carry, and a garbage transportation constraint.
Let the robot library and the points to be cleaned form the node set P = {p_0, p_1, ..., p_n}, where n is the number of points to be cleaned and p_0 represents the robot library node, and let P' = P - {p_0} be the set of the n points to be cleaned. Let the coordinates of the robot library and of the points to be cleaned form a coordinate set, where x_i is the coordinate of p_i. Let all cleaning robots form the set R = {r_1, r_2, ..., r_k}, where k is the number of cleaning robots.
The first optimization variable Y is shown as follows:
Y = { y_{i,j}^r | i ∈ P, j ∈ P, r ∈ R }

wherein k is the number of cleaning robots and y_{i,j}^r is an indicator variable indicating whether robot r travels from p_i to p_j: if robot r departs from p_i and arrives at p_j, then y_{i,j}^r = 1, otherwise y_{i,j}^r = 0.

The second optimization variable Z is shown as follows:

Z = { z_{i,j} | i ∈ P, j ∈ P }

wherein z_{i,j} represents the total amount of garbage carried from x_i to x_j.
The optimization objective is shown below:
[optimization objective formula given as an image in the original]

wherein c_j is the cleaning workload of the point to be cleaned p_j, c_0 = 0, and v_r is the running speed of the cleaning robot r.
The value range constraint of the optimization variable is shown as the following formula:
Figure BDA0003853000520000119
z i,j ≥0,i∈P,j∈P
the region access times constraint is as follows:
Figure BDA0003853000520000121
the robot path continuity constraint is given by:
Figure BDA0003853000520000122
the total amount of garbage that the robot can carry is constrained as follows:
Figure BDA0003853000520000123
wherein, b r Is the garbage bin capacity of the robot r;
the refuse transport constraint is as follows:
Figure BDA0003853000520000124
Figure BDA0003853000520000125
wherein, g j Is the point p to be cleaned j The quantity of garbage is set as g 0 And M is a larger preset constant number of 0.
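As a rough illustration of the constraints listed above, the following sketch checks a candidate multi-robot plan against the conditions that can be read from the text: every point to be cleaned is visited (assumed here to be exactly once), each route starts and ends at the robot library, and no robot exceeds its garbage bin capacity. The exact constraint formulas are images in the original filing, so this check is an approximation.

```python
def check_plan(routes, garbage, bin_capacity):
    """Rough feasibility check of a candidate plan (illustrative only).

    routes[r]       -- node sequence of robot r, starting and ending at the depot (node 0)
    garbage[j]      -- garbage amount g_j at node j (g_0 = 0)
    bin_capacity[r] -- garbage bin capacity b_r of robot r
    """
    n = len(garbage) - 1
    visited = []
    for r, route in enumerate(routes):
        # robot path continuity: each route is one walk starting and ending at the depot
        assert route[0] == 0 and route[-1] == 0, "route must start and end at the robot library"
        # total amount of garbage the robot can carry
        load = sum(garbage[j] for j in route)
        assert load <= bin_capacity[r], f"robot {r} exceeds its bin capacity"
        visited += [j for j in route if j != 0]
    # region access count (assumed): every point to be cleaned is visited exactly once
    assert sorted(visited) == list(range(1, n + 1)), "each point must be cleaned exactly once"
    return True
```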
Optionally, the Markov decision process model of the cleaning robot path planning problem includes an environment state, an action, a state transition rule, and a cost.
In particular, the environment state is S_t = (D_t, E_t), wherein t is the step number; D_t collects, for each cleaning robot r, the remaining capacity of its garbage bin at step t, the node at which it is located at step t, and the set of nodes it has visited up to step t; E_t = {e_t^i | i ∈ P}, wherein e_t^i is the access state of node p_i at step t: if node p_i has already been visited, then e_t^i = 1, otherwise e_t^i = 0.
The action is A_t = (d_t, p_t), wherein d_t is the node decoder activated at step t and p_t ∈ P is the node selected at step t.
The state transition rule ST transfers the environment state from S_t to S_{t+1} according to the action A_t, as shown by the following formulas:

[state transition formulas given as images in the original]

wherein r_t is the cleaning robot corresponding to node decoder d_t, and the selected node p_t is spliced onto the end of the path already traveled by robot r_t.
The cost F is given by the following formula:

[cost formula given as an image in the original]

wherein T is the total number of steps, and the cost of robot r at step t is calculated by the following formula:

[per-step cost formula given as an image in the original]

namely the distance between the point selected at step t and the point at which the robot was previously located.
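A minimal sketch of this Markov decision process is given below, assuming Euclidean distances and the bookkeeping described above (remaining bin capacity, current node, visited set, access flags). The exact state-transition and cost formulas appear only as images in the original, so the arithmetic here is an assumption.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List, Set, Tuple

@dataclass
class RobotState:
    capacity_left: float                              # remaining garbage bin capacity at step t
    node: int                                         # node where the robot is located at step t
    visited: Set[int] = field(default_factory=set)    # nodes visited up to step t
    path: List[int] = field(default_factory=list)     # path traveled so far

class CleaningMDP:
    """Illustrative environment mirroring the described decision process."""

    def __init__(self, coords: List[Tuple[float, float]], garbage: List[float],
                 capacities: List[float]):
        self.coords, self.garbage = coords, garbage
        self.robots = [RobotState(c, 0, {0}, [0]) for c in capacities]
        self.accessed = [0] * len(coords)     # e_t^i: 1 if node i has been visited
        self.accessed[0] = 1
        self.cost = [0.0] * len(capacities)   # accumulated cost per robot

    def step(self, decoder_index: int, node: int) -> None:
        """Action A_t = (d_t, p_t): decoder d_t selects node p_t for its robot."""
        rob = self.robots[decoder_index]
        self.cost[decoder_index] += dist(self.coords[rob.node], self.coords[node])
        rob.capacity_left -= self.garbage[node]
        rob.node = node
        rob.visited.add(node)
        rob.path.append(node)                 # splice p_t onto the end of the robot's path
        self.accessed[node] = 1
```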
Referring to fig. 2, optionally, the deep reinforcement learning model for cleaning robot path planning includes: an encoder and a decoder; the encoder comprises a node encoder and a robot encoder, and the decoder comprises a decoder selector and k node decoders; the output ends of the node encoder and the robot encoder are connected with the input end of the decoder selector, and the output end of the decoder selector is connected with the input ends of the k node decoders.
Referring to fig. 3, the components constituting the node encoder, the robot encoder, the decoder selector and the node decoders include a linear mapping layer, a ReLU function layer, a single-head attention layer, a multi-head attention layer, a batch normalization layer and a graph encoding module. Specifically, the node encoder comprises a linear mapping layer and L1 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_node be the index of a graph encoding module of the node encoder: when 1 ≤ l_node < L1, the output end of graph encoding module l_node is connected with the input end of graph encoding module l_node + 1, and when l_node = L1, the output end of graph encoding module l_node is connected with the input end of the decoder selector. The robot encoder comprises a linear mapping layer and L2 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_robot be the index of a graph encoding module of the robot encoder: when 1 ≤ l_robot < L2, the output end of graph encoding module l_robot is connected with the input end of graph encoding module l_robot + 1, and when l_robot = L2, the output end of graph encoding module l_robot is connected with the input end of the decoder selector. The decoder selector comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer. The node decoder comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer.
Specifically, the linear mapping layer is represented by the following formula:
Linear(x)=Wx+B
wherein x ∈ R^{d_in} is the input, W ∈ R^{d_out × d_in} and B ∈ R^{d_out} are learnable parameters, d_in is the dimension of the input data, and d_out is the output dimension of the linear mapping layer.
The fitness layer is represented by the following formula:
[fitness layer formula given as an image in the original]
wherein softmax () is a normalized exponential function.
The multi-head attention layer is shown as follows:
MHA(X) = Concat(head_1, head_2, ..., head_h) W^O

wherein X is the input of the multi-head attention layer, n × d_x is the dimension of the input data, Concat is the matrix concatenation operation, W^O is a trainable parameter, h is the number of attention heads (when h = 1 the layer is a single-head attention layer), d_v is the dimension of the value vectors, and head_i is the output of the i-th attention head; head_i is calculated as follows:

head_i = softmax( Q_i K_i^T / sqrt(d_k) ) V_i

wherein Q_i = X W_i^Q, K_i = X W_i^K, V_i = X W_i^V; W_i^Q, W_i^K and W_i^V are learnable parameters, and d_k is the dimension of the key vectors.
The single-head attention layer is represented by the following formula:

Attention(Q, K, V) = softmax( Q K^T / sqrt(d_k) ) V

wherein Q = X W^Q, K = X W^K, V = X W^V, and W^Q, W^K and W^V are learnable parameters.
The graph encoding module is represented by the following formula:
X l+1 =GraphEncoder(X l )
wherein X_l is the input of the graph encoding module and X_{l+1} is the output of the graph encoding module, computed as:

X̂_l = BN( X_l + MHA(X_l) )

X_{l+1} = BN( X̂_l + FF(X̂_l) )

wherein X̂_l is an intermediate vector computed by the graph encoding module, FF is a forward propagation module formed by connecting several linear mapping layers and ReLU function layers, and BN() is the batch normalization layer.
The ReLU function layer is shown as follows:
ReLU(x)=max(0,x)
The batch normalization layer is shown as follows:

BN(x) = γ (x - E[x]) / sqrt(Var[x] + ε) + β

wherein γ and β are learnable parameters, E[x] is the expectation of x, Var[x] is the variance of x, and ε is a constant that prevents the denominator from being zero.
In this embodiment, the forward propagation module is formed by a linear mapping layer with an input dimension of 128 and an output dimension of 512, a ReLU activation function layer, and a linear mapping layer with an input dimension of 512 and an output dimension of 128.
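The following PyTorch sketch shows one graph encoding module built from the components named above (multi-head attention, batch normalization, and the 128-512-128 forward propagation module of this embodiment). The residual arrangement around the attention and feed-forward sublayers is an assumption, since the module's formulas appear only as images in the original.

```python
import torch
import torch.nn as nn

class GraphEncoderBlock(nn.Module):
    """Sketch of one graph encoding module: multi-head attention and a
    forward propagation module, each wrapped with a residual connection
    and batch normalization (assumed arrangement)."""

    def __init__(self, dim: int = 128, heads: int = 8, ff_dim: int = 512):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.bn1 = nn.BatchNorm1d(dim)
        self.bn2 = nn.BatchNorm1d(dim)
        # Forward propagation module from the embodiment: 128 -> 512 -> ReLU -> 128.
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, nodes, dim)
        attn, _ = self.mha(x, x, x)
        h = self.bn1((x + attn).transpose(1, 2)).transpose(1, 2)      # BN over the feature dim
        return self.bn2((h + self.ff(h)).transpose(1, 2)).transpose(1, 2)
```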
In a possible implementation, the input of the node encoder is the node information I_P = {(x_i, c_i, g_i) | i ∈ P}, and its output is the set of node codes {h_i | i ∈ P}, wherein h_i is the code of the i-th node. The input of the robot encoder is the robot information I_R = {(v_r, b_r) | r ∈ R}, and its output is the set of robot codes {h_r | r ∈ R}, wherein h_r is the code of the r-th cleaning robot. The input of the decoder selector at time step t comprises the node codes, the robot codes and the paths traveled by the cleaning robots up to time step t - 1, and its output is the node decoder d_t with the maximum probability. The input of the node decoder is composed of the encoder outputs, h_p and h_{r'}, wherein r' is the cleaning robot corresponding to node decoder d_t, h_p is the code of the node where the cleaning robot is located, and h_{r'} is the code of the cleaning robot r'; its output is the node p_t with the maximum probability.
Specifically, the input of the node encoder is I_P. The node encoder first maps I_P into a high-dimensional feature space through a linear mapping layer Linear_P, whose input dimension is 4 and output dimension is 128, and then extracts features through its graph encoding modules in sequence. The output of the node encoder is the set of node codes {h_i | i ∈ P}, wherein h_i is the code of the i-th node.

The input of the robot encoder is I_R. The robot encoder first maps I_R into a high-dimensional feature space through a linear mapping layer Linear_R, whose input dimension is 2 and output dimension is 128, and then extracts features through its graph encoding modules in sequence. The output of the robot encoder is the set of robot codes {h_r | r ∈ R}, wherein h_r is the code of the r-th cleaning robot.
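A PyTorch sketch of the node encoder and robot encoder is shown below, reusing the GraphEncoderBlock from the previous sketch: a linear mapping layer into a 128-dimensional feature space followed by a stack of graph encoding modules. The input dimensions 4 and 2 follow the embodiment; the number of modules is a placeholder.

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Sketch of the node encoder / robot encoder. The node encoder uses
    in_dim=4 (coordinates, workload, garbage amount) and the robot encoder
    uses in_dim=2 (speed, bin capacity), as in the embodiment."""

    def __init__(self, in_dim: int, n_blocks: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)
        self.blocks = nn.ModuleList(GraphEncoderBlock(dim) for _ in range(n_blocks))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:   # (batch, items, in_dim)
        h = self.embed(feats)
        for block in self.blocks:
            h = block(h)
        return h                      # (batch, items, dim): one code per node / robot

# node_encoder = SetEncoder(in_dim=4, n_blocks=3)    # hypothetical L1 = 3
# robot_encoder = SetEncoder(in_dim=2, n_blocks=3)   # hypothetical L2 = 3
```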
The input of the decoder selector at time step t comprises the node codes, the robot codes, the set Tour_{t-1} of paths traveled by the cleaning robots up to time step t - 1, and a state vector V_{t-1}. The decoder selector first extracts the information in Tour_{t-1} by maximum pooling. The extracted information is then input into a forward propagation module FF_ST, which is formed by connecting a linear mapping layer with an input dimension of 5 and an output dimension of 128, a linear mapping layer with an input dimension of 128 and an output dimension of 512, a ReLU activation function layer, and a linear mapping layer with an input dimension of 512 and an output dimension of 128. V_{t-1} is input into another forward propagation module, which is formed by connecting a linear mapping layer with an input dimension of 640 and an output dimension of 128, a linear mapping layer with an input dimension of 128 and an output dimension of 512, a ReLU activation function layer, and a linear mapping layer with an input dimension of 512 and an output dimension of 128. The outputs of the two forward propagation modules are spliced and input into the linear layer Linear_S, whose input dimension is 256 and output dimension is 5, to obtain the logarithmic probabilities logits_S. The logits_S are then input into the softmax function to obtain the probability of selecting each node decoder:

prob_S = softmax(logits_S)

wherein the i-th element of prob_S represents the probability of selecting decoder i. Finally, the node decoder d_t with the maximum probability is obtained, and the output of the decoder selector is d_t.
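The PyTorch sketch below illustrates the flow of the decoder selector described above: pool the information of the traveled paths, pass it and the state vector through two forward propagation modules, splice the results, apply a linear layer and softmax, and take the decoder with the maximum probability. The tensor shapes in the translated text are ambiguous, so the dimensions here are placeholders rather than the patent's exact values.

```python
import torch
import torch.nn as nn

class DecoderSelector(nn.Module):
    """Illustrative decoder selector producing a probability over k node decoders."""

    def __init__(self, dim: int = 128, k: int = 5):
        super().__init__()
        self.ff_tour = nn.Sequential(nn.Linear(dim, dim), nn.Linear(dim, 512),
                                     nn.ReLU(), nn.Linear(512, dim))
        self.ff_state = nn.Sequential(nn.Linear(dim, dim), nn.Linear(dim, 512),
                                      nn.ReLU(), nn.Linear(512, dim))
        self.out = nn.Linear(2 * dim, k)     # logits over the k node decoders

    def forward(self, tour_feats: torch.Tensor, state_feats: torch.Tensor) -> int:
        # tour_feats: (k, steps, dim) features of each robot's path so far
        # state_feats: (k, dim) per-robot state features
        pooled_tour = self.ff_tour(tour_feats.max(dim=1).values.max(dim=0).values)
        pooled_state = self.ff_state(state_feats.max(dim=0).values)
        logits = self.out(torch.cat([pooled_tour, pooled_state]))
        probs = torch.softmax(logits, dim=-1)
        return int(torch.argmax(probs).item())   # index d_t of the selected node decoder
```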
The input of the node decoder is C_D, which comprises the encoder outputs, h_p and h_{r'}, wherein r' is the cleaning robot corresponding to node decoder d_t, h_p is the code of the node where the cleaning robot is located, and h_{r'} is the code of the cleaning robot r'; the output is the node p_t with the maximum probability.

The node decoder first inputs C_D into a linear mapping layer Linear_D, whose input dimension is 257 and output dimension is 128. The result is then spliced with further feature vectors (given as images in the original), and the spliced vector is input into the multi-head attention layer. The probability of selecting the i-th node is then calculated from the attention output and the key vector key_i of each node, wherein d_key is the dimension of key_i. Finally, the node p_t with the maximum probability is obtained, and the output of the node decoder is p_t.
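A hedged PyTorch sketch of the node decoder follows: a linear mapping layer (input dimension 257 as in the embodiment), a multi-head attention layer over the node codes, and a masked softmax that gives already-visited nodes no probability. The exact query construction and probability formula are images in the original, so this is a plausible reconstruction, not the patent's definitive implementation.

```python
import torch
import torch.nn as nn

class NodeDecoder(nn.Module):
    """Illustrative node decoder selecting the next node for one robot."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.linear_d = nn.Linear(2 * dim + 1, dim)    # e.g. [h_p, h_r', capacity_left], 257 inputs
        self.mha = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.w_k = nn.Linear(dim, dim, bias=False)

    def forward(self, node_codes, h_p, h_r, capacity_left, visited_mask):
        # node_codes: (n, dim); h_p, h_r: (dim,); visited_mask: (n,) bool, True = already visited
        query = self.linear_d(torch.cat([h_p, h_r, capacity_left.view(1)]))
        glimpse, _ = self.mha(query.view(1, 1, -1), node_codes.unsqueeze(0),
                              node_codes.unsqueeze(0))
        keys = self.w_k(node_codes)                                   # (n, dim)
        logits = keys @ glimpse.view(-1) / keys.shape[-1] ** 0.5
        logits = logits.masked_fill(visited_mask, float('-inf'))     # forbid visited nodes
        probs = torch.softmax(logits, dim=-1)
        return int(torch.argmax(probs).item())                       # selected node p_t
```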
In one possible embodiment, training an initial deep reinforcement learning model for cleaning robot path planning through a preset training set comprises:
s11: and setting the size of the training data set, the size of the batch, the number E of training rounds and the learning rate. In this embodiment, the training data set size is 1280000, the batch size is 512, the number of training rounds E =50, and the learning rate is 0.0001.
S12: generating a training sample set; the current number of training rounds e =1 is set.
S13: inputting the training samples into the network in batches according to the set batch size, calculating a path planning scheme, and optimizing the model parameters according to the path planning scheme output by the network using the following formula:

∇_θ L(θ) = E[ (F_s - b(s)) ∇_θ log p_θ(π|s) ]

wherein θ is the model parameter, s is the output path planning scheme, F_s is the cost of the path planning scheme s, b(s) is the evaluation of the path planning scheme s by the reference (baseline) method, π is the policy of the reinforcement learning method, and p_θ(π|s) represents the probability of outputting the path planning scheme s under the parameter θ and the policy π.
S14: training round number e = e +1.
S15: if e > E, the training is ended; otherwise, return to S12.
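The training procedure S11-S15 can be summarized by the following sketch, which applies a REINFORCE-style policy-gradient update with a baseline b(s). The model and baseline interfaces (rollout, sample_instances) are hypothetical.

```python
import torch

def train(model, baseline, sample_instances, epochs=50, batch_size=512,
          dataset_size=1280000, lr=1e-4):
    """Sketch of steps S11-S15: generate training instances each round, roll out
    the policy in batches, and apply a policy-gradient update with baseline b(s).
    `model.rollout` is assumed to return the cost F_s and the summed
    log-probability of the sampled plan."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(1, epochs + 1):                       # S14/S15: loop over training rounds
        instances = sample_instances(dataset_size)           # S12: generate a training sample set
        for start in range(0, dataset_size, batch_size):     # S13: feed batches to the network
            batch = instances[start:start + batch_size]
            cost, log_prob = model.rollout(batch)             # F_s and log p_theta(pi|s)
            advantage = cost - baseline(batch)                # F_s - b(s)
            loss = (advantage.detach() * log_prob).mean()     # REINFORCE loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```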
In one possible implementation, a test set containing 1280 samples is used to compare three benchmark methods based on conventional optimization techniques (the ant colony algorithm, the genetic algorithm and Gurobi), two reference methods based on reinforcement learning (AM and DRL), and the cleaning robot path planning method of the invention. The results are shown in Table 1:
TABLE 1

Method                 | Optimization objective value | Solution time (seconds)
Ant colony algorithm   | 7.07                         | 261097
Genetic algorithm      | 8.85                         | 175670
Gurobi                 | 7.38                         | 129039
AM                     | 7.09                         | 0.63
DRL                    | 6.69                         | 1.21
The invention          | 6.59                         | 1.27
Therefore, from the perspective of an optimization target, the cleaning robot path planning method is superior to the five reference methods; from the perspective of solving time, the cleaning robot path planning method is obviously superior to three methods based on the traditional optimization technology.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details not disclosed in the device embodiments, reference is made to the method embodiments of the invention.
Referring to fig. 4, in a further embodiment of the present invention, a cleaning robot path planning system is provided, which can be used to implement the cleaning robot path planning method described above, and specifically, the cleaning robot path planning system includes a data obtaining module and a model invoking module.
The data acquisition module is used for acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload; the model calling module is used for calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, so as to obtain a path planning result of each cleaning robot.
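For illustration, the two modules of the system might be organized as follows; the data values and the solve interface are hypothetical placeholders, not part of the patent.

```python
class DataAcquisitionModule:
    """Gathers the robot parameters and cleaning-task information described above;
    in practice this might read a configuration file or the robots' manufacturer
    specifications (illustrative placeholder values below)."""
    def acquire(self) -> dict:
        return {
            "bin_capacities": [30.0, 30.0], "speeds": [1.0, 1.2],
            "depot_xy": (0.0, 0.0),
            "point_xy": [(3.0, 4.0), (6.0, 1.0)],
            "garbage": [2.0, 5.0], "workload": [1.0, 2.5],
        }

class ModelCallingModule:
    """Feeds the acquired data to the preset deep reinforcement learning model
    and returns one planned route per cleaning robot."""
    def __init__(self, model):
        self.model = model
    def plan(self, task: dict):
        return self.model.solve(task)   # hypothetical interface of the trained model
```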
In one possible embodiment, the deep reinforcement learning model for cleaning robot path planning is constructed by: establishing a mathematical model of a path planning problem of the cleaning robot; establishing a Markov decision process model of the cleaning robot path planning problem according to a mathematical model of the cleaning robot path planning problem; establishing an initial depth reinforcement learning model for path planning of the cleaning robot according to a Markov decision process model of the path planning problem of the cleaning robot; and training an initial deep reinforcement learning model for path planning of the cleaning robot through a preset training set to obtain the deep reinforcement learning model for path planning of the cleaning robot.
In one possible embodiment, the mathematical model of the cleaning robot path planning problem includes optimization variables, optimization objectives, and constraints; the optimization variables comprise a first optimization variable Y and a second optimization variable Z:
Y = { y_{i,j}^r | i ∈ P, j ∈ P, r ∈ R }

Z = { z_{i,j} | i ∈ P, j ∈ P }

wherein P = {p_0, p_1, ..., p_n} is the node set formed by the robot library and the points to be cleaned, n is the number of points to be cleaned and p_0 represents the robot library node; R = {r_1, r_2, ..., r_k} is the set of cleaning robots and k is the number of cleaning robots; y_{i,j}^r is an indicator variable indicating whether cleaning robot r travels from p_i to p_j: if robot r departs from p_i and arrives at p_j, then y_{i,j}^r = 1, otherwise y_{i,j}^r = 0; z_{i,j} is the total amount of garbage carried from the coordinate x_i of p_i to the coordinate x_j of p_j.
The optimization objective is shown as follows:
[optimization objective formula given as an image in the original]

wherein c_j is the cleaning workload of the point to be cleaned p_j, c_0 = 0, and v_r is the running speed of the cleaning robot r.
The constraint conditions comprise optimization variable value range constraint, area access frequency constraint, robot path continuity constraint, total garbage amount constraint and garbage transportation constraint which can be carried by the robot.
The value range constraint of the optimization variable is shown as the following formula:
Figure BDA0003853000520000213
z i,j ≥0,i∈P,j∈P
the region access times constraint is as follows:
Figure BDA0003853000520000214
the robot path continuity constraint is given by:
Figure BDA0003853000520000215
the total amount of garbage that the robot can carry is constrained as follows:
Figure BDA0003853000520000221
wherein, b r Is the garbage bin capacity of the cleaning robot r;
the refuse transport constraint is shown by the following formula:
Figure BDA0003853000520000222
Figure BDA0003853000520000223
wherein, P' = P- { P 0 P' is a set of n points to be cleaned, g j Is the point p to be cleaned j Amount of garbage of g 0 =0, m is a preset constant.
In one possible embodiment, the Markov decision process model of the cleaning robot path planning problem includes an environment state, an action, a state transition rule, and a cost.
Wherein the environment state S_t is shown as follows:

S_t = (D_t, E_t)

wherein t is the step number; D_t collects, for each cleaning robot r, the remaining capacity of its garbage bin at step t, the node at which it is located at step t, and the set of nodes it has visited up to step t; E_t = {e_t^i | i ∈ P}, wherein e_t^i is the access state of node p_i at step t: if node p_i has already been visited, then e_t^i = 1, otherwise e_t^i = 0.
The action A_t is shown as follows:

A_t = (d_t, p_t)

wherein d_t is the node decoder activated at step t, and p_t ∈ P is the node selected at step t.
The state transition rule ST transfers the environment state from S_t to S_{t+1} according to the action A_t, as shown by the following formulas:

[state transition formulas given as images in the original]

wherein r_t is the cleaning robot corresponding to node decoder d_t, and the selected node p_t is spliced onto the end of the path already traveled by robot r_t.
The cost F is shown as follows:

[cost formula given as an image in the original]

wherein T is the total number of steps, and the cost of the cleaning robot r at step t is obtained by the following formula:

[per-step cost formula given as an image in the original]

namely the distance between the coordinates of the selected node p_t and the coordinates of the node at which the cleaning robot was previously located.
In one possible embodiment, the deep reinforcement learning model for cleaning robot path planning includes an encoder and a decoder; the encoder comprises a node encoder and a robot encoder, and the decoder comprises a decoder selector and k node decoders; the output ends of the node encoder and the robot encoder are connected with the input end of the decoder selector, and the output end of the decoder selector is connected with the input ends of the k node decoders. The node encoder comprises a linear mapping layer and L1 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_node be the index of a graph encoding module of the node encoder: when 1 ≤ l_node < L1, the output end of graph encoding module l_node is connected with the input end of graph encoding module l_node + 1, and when l_node = L1, the output end of graph encoding module l_node is connected with the input end of the decoder selector. The robot encoder comprises a linear mapping layer and L2 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_robot be the index of a graph encoding module of the robot encoder: when 1 ≤ l_robot < L2, the output end of graph encoding module l_robot is connected with the input end of graph encoding module l_robot + 1, and when l_robot = L2, the output end of graph encoding module l_robot is connected with the input end of the decoder selector. The decoder selector comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer. The node decoder comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer.
In one possible embodiment, the linear mapping layer is represented by the following formula:
Linear(x)=Wx+B
wherein x ∈ R^{d_in} is the input, W ∈ R^{d_out × d_in} and B ∈ R^{d_out} are learnable parameters, d_in is the dimension of the input data, and d_out is the output dimension of the linear mapping layer.
The fitness layer is represented by the following formula:
[fitness layer formula given as an image in the original]
wherein softmax () is a normalized exponential function.
The multi-head attention layer is shown as follows:
MHA(X)=Concat(head 1 ,head 2 ,…,head h )W O
wherein X is the input of the multi-head attention layer, n × d_x is the dimension of the input data, Concat is the matrix concatenation operation, W^O is a trainable parameter, h is the number of attention heads, d_v is the dimension of the value vectors, and head_i is the output of the i-th attention head; head_i is calculated as follows:

head_i = softmax( Q_i K_i^T / sqrt(d_k) ) V_i

wherein Q_i = X W_i^Q, K_i = X W_i^K, V_i = X W_i^V; W_i^Q, W_i^K and W_i^V are learnable parameters, and d_k is the dimension of the key vectors.
The graph encoding module is represented by the following formula:
X l+1 =GraphEncoder(X l )
wherein X_l is the input of the graph encoding module and X_{l+1} is the output of the graph encoding module, computed as:

X̂_l = BN( X_l + MHA(X_l) )

X_{l+1} = BN( X̂_l + FF(X̂_l) )

wherein X̂_l is an intermediate vector computed by the graph encoding module, FF is a forward propagation module formed by connecting several linear mapping layers and ReLU function layers, and BN() is a batch normalization layer.
The ReLU function layer is expressed as follows:
ReLU(x)=max(0,x)
The batch normalization layer is shown as follows:

BN(x) = γ (x - E[x]) / sqrt(Var[x] + ε) + β

wherein γ and β are learnable parameters, E[x] is the expectation of x, Var[x] is the variance of x, and ε is a constant that prevents the denominator from being zero.
The input of the node encoder is I_P = {(x_i, c_i, g_i) | i ∈ P}, and its output is the set of node codes {h_i | i ∈ P}, wherein h_i is the code of the i-th node. The input of the robot encoder is I_R = {(v_r, b_r) | r ∈ R}, and its output is the set of robot codes {h_r | r ∈ R}, wherein h_r is the code of the r-th cleaning robot. The input of the decoder selector at time step t comprises the node codes, the robot codes and, for each cleaning robot r, the path it has traveled up to time step t - 1; its output is the node decoder d_t with the maximum probability. The input of the node decoder is composed of the encoder outputs, h_p and h_{r'}, wherein r' is the cleaning robot corresponding to node decoder d_t, h_p is the code of the node where the cleaning robot is located, and h_{r'} is the code of the cleaning robot r'; its output is the node p_t with the maximum probability.
In one possible embodiment, the training of the initial deep reinforcement learning model for cleaning robot path planning optimizes the model parameters of the initial deep reinforcement learning model for cleaning robot path planning with the following policy-gradient rule:

$\nabla_{\theta} \mathcal{L}(\theta \mid s) = \mathbb{E}_{p_{\theta}(\pi \mid s)}\big[ (F_s - b(s))\, \nabla_{\theta} \log p_{\theta}(\pi \mid s) \big]$

where θ is the model parameter, s is the output path planning scheme, $F_s$ is the cost of the path planning scheme s, b(s) is the evaluation of the reference method on the path planning scheme s, π is the policy of the reinforcement learning method, and $p_{\theta}(\pi \mid s)$ represents the probability of outputting the path planning scheme s under the parameter θ and the policy π.
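Assuming an automatic differentiation framework, the policy-gradient rule above can be sketched as a surrogate loss whose gradient matches the estimator; the numeric batch values and the helper name `reinforce_loss` are assumptions for illustration only:

```python
import torch

def reinforce_loss(costs, baseline_costs, log_probs):
    # Surrogate loss whose gradient is the REINFORCE-with-baseline estimator
    # E[(F_s - b(s)) * grad_theta log p_theta]; the advantage is detached so
    # gradients only flow through the log-probabilities of the sampled plans.
    advantage = (costs - baseline_costs).detach()
    return (advantage * log_probs).mean()

# Hypothetical batch: costs of sampled plans, baseline costs, and plan log-probabilities.
costs     = torch.tensor([12.3, 9.8, 15.1])
baselines = torch.tensor([11.0, 10.5, 14.0])
log_probs = torch.tensor([-7.2, -6.9, -8.4], requires_grad=True)
loss = reinforce_loss(costs, baselines, log_probs)
loss.backward()
print(loss.item(), log_probs.grad)
```

Because the advantage term (F_s - b(s)) is treated as a constant, minimizing this surrogate with respect to θ reproduces the gradient estimator given above.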
For all relevant content of each step of the above cleaning robot path planning method embodiment, reference may be made to the functional description of the corresponding functional module of the cleaning robot path planning system in the embodiment of the present invention, and details are not repeated here.
The division of the modules in the embodiments of the present invention is schematic and represents only one way of dividing logical functions; in actual implementation, other division manners are possible. In addition, the functional modules in the embodiments of the present invention may be integrated in one processor, may each exist alone physically, or two or more modules may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
In yet another embodiment of the present invention, a computer device is provided that includes a processor and a memory, the memory being used for storing a computer program comprising program instructions, and the processor being used for executing the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. As the computing core and control core of the terminal, the processor is adapted to load and execute one or more instructions in a computer storage medium to implement the corresponding method flow or function; the processor of the embodiment of the present invention may be used to perform the operations of the cleaning robot path planning method.
In yet another embodiment of the present invention, a storage medium is further provided, specifically a computer-readable storage medium (Memory), which is a memory device in a computer device and is used for storing programs and data. It is understood that the computer-readable storage medium herein may include both the built-in storage medium of the computer device and any extended storage medium supported by the computer device. The computer-readable storage medium provides a storage space that stores the operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the storage space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory. One or more instructions stored in the computer-readable storage medium may be loaded and executed by the processor to perform the corresponding steps of the cleaning robot path planning method in the above-described embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A method for cleaning robot path planning, comprising:
acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of a robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload;
and calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, and obtaining a path planning result of each cleaning robot.
2. The cleaning robot path planning method according to claim 1, wherein the deep reinforcement learning model for cleaning robot path planning is constructed by:
establishing a mathematical model of a path planning problem of the cleaning robot;
establishing a Markov decision process model of the cleaning robot path planning problem according to a mathematical model of the cleaning robot path planning problem;
establishing an initial deep reinforcement learning model for path planning of the cleaning robot according to the Markov decision process model of the path planning problem of the cleaning robot;
and training an initial deep reinforcement learning model for path planning of the cleaning robot through a preset training set to obtain the deep reinforcement learning model for path planning of the cleaning robot.
3. The cleaning robot path planning method according to claim 2, wherein the mathematical model of the cleaning robot path planning problem includes optimization variables, optimization objectives, and constraints;
the optimization variables comprise a first optimization variable Y and a second optimization variable Z:
$Y = \{\, y_{i,j}^{r} \mid i \in P,\ j \in P,\ r \in R \,\}$
$Z = \{\, z_{i,j} \mid i \in P,\ j \in P \,\}$
wherein P is the node set formed by the robot library and the points to be cleaned, $P = \{p_0, p_1, \ldots, p_n\}$, n is the number of points to be cleaned, and $p_0$ represents the robot library node; R is the set formed by the cleaning robots, $R = \{r_1, r_2, \ldots, r_k\}$, and k is the number of the cleaning robots; $y_{i,j}^{r}$ is an indicator variable indicating whether the cleaning robot r travels from $p_i$ to $p_j$: if the robot r departs from $p_i$ and arrives at $p_j$, then $y_{i,j}^{r} = 1$, otherwise $y_{i,j}^{r} = 0$; $z_{i,j}$ is the total amount of garbage carried from the coordinate $x_i$ of $p_i$ to the coordinate $x_j$ of $p_j$;
the optimization objective is shown as follows:
$\min_{Y,\,Z}\ \sum_{r \in R} \sum_{i \in P} \sum_{j \in P} y_{i,j}^{r} \left( \frac{\lVert x_i - x_j \rVert}{v_r} + c_j \right)$
wherein $c_j$ is the cleaning workload of the point $p_j$ to be cleaned, $c_0 = 0$, and $v_r$ is the running speed of the cleaning robot r;
the constraint conditions comprise an optimized variable value range constraint, a region access frequency constraint, a robot path continuity constraint, a robot carried garbage total amount constraint and a garbage transportation constraint;
the value range constraint of the optimization variable is shown as the following formula:
$y_{i,j}^{r} \in \{0, 1\},\quad i \in P,\ j \in P,\ r \in R$
$z_{i,j} \ge 0,\quad i \in P,\ j \in P$
the region access times constraint is as follows:
$\sum_{r \in R} \sum_{i \in P} y_{i,j}^{r} = 1,\quad j \in P,\ j \ne p_0$
the robot path continuity constraint is given by:
$\sum_{i \in P} y_{i,j}^{r} = \sum_{i \in P} y_{j,i}^{r},\quad j \in P,\ r \in R$
the total amount of garbage that the robot can carry is constrained as follows:
$z_{i,j} \le \sum_{r \in R} b_r\, y_{i,j}^{r},\quad i \in P,\ j \in P$
wherein $b_r$ is the garbage bin capacity of the cleaning robot r;
the refuse transport constraint is as follows:
$\sum_{i \in P} z_{j,i} - \sum_{i \in P} z_{i,j} = g_j,\quad j \in P'$
$z_{i,j} \le m \sum_{r \in R} y_{i,j}^{r},\quad i \in P,\ j \in P$
wherein $P' = P - \{p_0\}$ is the set formed by the n points to be cleaned, $g_j$ is the amount of garbage at the point $p_j$ to be cleaned, $g_0 = 0$, and m is a preset constant.
4. The cleaning robot path planning method of claim 3, wherein the Markov decision process model of the cleaning robot path planning problem includes environmental states, actions, state transition rules, and costs;
wherein the environmental state $S_t$ is shown as the following formula:
$S_t = \big(\{\, (\tilde{b}_r^{t},\ \tilde{p}_r^{t},\ \mathcal{P}_r^{t}) \mid r \in R \,\},\ \{\, u_i^{t} \mid i \in P \,\}\big)$
wherein t is the number of steps, $\tilde{b}_r^{t}$ is the remaining capacity of the garbage bin of the cleaning robot r at the t-th step, $\tilde{p}_r^{t}$ is the node where the cleaning robot r is located at the t-th step, and $\mathcal{P}_r^{t}$ is the set formed by the nodes visited by the cleaning robot r up to the t-th step; $u_i^{t}$ is the access state of the node $p_i$ at the t-th step: if the node $p_i$ has been accessed, then $u_i^{t} = 1$, otherwise $u_i^{t} = 0$;
the action $A_t$ is shown as the following formula:
$A_t = (d_t,\ p_t)$
wherein $d_t$ is the node decoder activated at the t-th step, and $p_t \in P$ is the node selected at the t-th step;
the state transition rule ST: according to the action $A_t$, the environmental state is transferred from $S_t$ to $S_{t+1}$ by the following formulas:
$\tilde{b}_{r_t}^{t+1} = \tilde{b}_{r_t}^{t} - g_{p_t}$
$\tilde{p}_{r_t}^{t+1} = p_t$
$\mathcal{P}_{r_t}^{t+1} = \mathcal{P}_{r_t}^{t} \oplus p_t$
$u_{p_t}^{t+1} = 1$
wherein $r_t$ is the cleaning robot corresponding to the node decoder $d_t$, and $\oplus$ represents that $p_t$ is spliced at the end of $\mathcal{P}_{r_t}^{t}$; the state components of the other cleaning robots and of the other nodes remain unchanged;
the cost F is shown below:
$F = \sum_{t=1}^{T} \sum_{r \in R} f_r^{t}$
wherein T is the total number of steps and $f_r^{t}$ is the cost of the cleaning robot r at the t-th step, obtained by the following formula:
$f_r^{t} = \begin{cases} \dfrac{\lVert x_{p_t} - x_{\tilde{p}_r^{t-1}} \rVert}{v_r} + c_{p_t}, & r = r_t \\ 0, & r \ne r_t \end{cases}$
wherein $\lVert x_{p_t} - x_{\tilde{p}_r^{t-1}} \rVert$ represents the distance between $x_{p_t}$ and $x_{\tilde{p}_r^{t-1}}$, $x_{p_t}$ is the coordinate of $p_t$, and $x_{\tilde{p}_r^{t-1}}$ is the coordinate of $\tilde{p}_r^{t-1}$, the node where the cleaning robot r is located at the (t-1)-th step.
5. The cleaning robot path planning method according to claim 4, wherein the deep reinforcement learning model for cleaning robot path planning includes: an encoder and a decoder; the encoder comprises a node encoder and a robot encoder, and the decoder comprises a decoder selector and k node decoders; the output ends of the node encoder and the robot encoder are connected with the input end of a decoder selector, and the output end of the decoder selector is connected with the input ends of the k node decoders;
the node encoder comprises a linear mapping layer and L1 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let $l_1 \in \{1, 2, \ldots, L1\}$ be the index of the graph encoding modules of the node encoder: when $l_1 < L1$, the output end of the $l_1$-th graph encoding module is connected with the input end of the $(l_1+1)$-th graph encoding module, and when $l_1 = L1$, the output end of the $l_1$-th graph encoding module is connected with the input end of the decoder selector; the robot encoder comprises a linear mapping layer and L2 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let $l_2 \in \{1, 2, \ldots, L2\}$ be the index of the graph encoding modules of the robot encoder: when $l_2 < L2$, the output end of the $l_2$-th graph encoding module is connected with the input end of the $(l_2+1)$-th graph encoding module, and when $l_2 = L2$, the output end of the $l_2$-th graph encoding module is connected with the input end of the decoder selector; the decoder selector comprises a multi-head attention layer and a fitness layer, wherein the output end of the multi-head attention layer is connected with the input end of the fitness layer; the node decoder comprises a multi-head attention layer and a fitness layer, wherein the output end of the multi-head attention layer is connected with the input end of the fitness layer.
6. The cleaning robot path planning method according to claim 5, wherein the linear mapping layer is represented by the following formula:
$\mathrm{Linear}(x) = Wx + B$
wherein $x \in \mathbb{R}^{d_{in}}$ is the input, $W \in \mathbb{R}^{d_{out} \times d_{in}}$ and $B \in \mathbb{R}^{d_{out}}$ are learnable parameters, $d_{in}$ is the dimension of the input data, and $d_{out}$ is the output dimension of the linear mapping layer;
the fitness layer converts the output of the multi-head attention layer into a probability distribution by applying softmax(), the normalized exponential function;
the multi-head attention layer is shown as follows:
$\mathrm{MHA}(X) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\, W^{O}$
wherein $X \in \mathbb{R}^{n \times d_x}$ is the input of the multi-head attention layer, $n \times d_x$ is the dimension of the input data, Concat is the matrix splicing operation, $W^{O} \in \mathbb{R}^{h d_v \times d_x}$ is a trainable parameter, h is the number of attention heads, $d_v$ is the dimension of the value vector, and $\mathrm{head}_i$ is the output of the i-th attention head, calculated as follows:
$\mathrm{head}_i = \mathrm{softmax}\!\left( \frac{Q_i K_i^{\top}}{\sqrt{d_k}} \right) V_i$
wherein $Q_i = X W_i^{Q}$, $K_i = X W_i^{K}$, $V_i = X W_i^{V}$; $W_i^{Q} \in \mathbb{R}^{d_x \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_x \times d_k}$ and $W_i^{V} \in \mathbb{R}^{d_x \times d_v}$ are learnable parameters, and $d_k$ is the dimension of the key vector;
the graph encoding module is represented by the following formula:
$X^{l+1} = \mathrm{GraphEncoder}(X^{l})$
wherein $X^{l} \in \mathbb{R}^{n \times d}$ is the input of the graph encoding module and $X^{l+1} \in \mathbb{R}^{n \times d}$ is the output of the graph encoding module, computed as
$\hat{X}^{l} = \mathrm{BN}\big(X^{l} + \mathrm{MHA}(X^{l})\big)$
$X^{l+1} = \mathrm{BN}\big(\hat{X}^{l} + \mathrm{FF}(\hat{X}^{l})\big)$
wherein $\hat{X}^{l}$ is the intermediate vector computed by the graph encoding module, FF is a forward propagation module formed by connecting a plurality of linear mapping layers and ReLU function layers, and BN() is a batch normalization layer;
the ReLU function layer is expressed as follows:
ReLU(x)=max(0,x)
the batch normalization layer is shown below:
$\mathrm{BN}(x) = \gamma\, \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} + \beta$
wherein γ and β are learnable parameters, E[x] is the expectation of x, Var[x] is the variance of x, and ε is a constant that prevents the denominator from being zero;
the input of the node encoder is $I_P = \{\,(x_i, c_i, g_i) \mid i \in P\,\}$, and the output is $H^{P} = \{\, h_i^{P} \mid i \in P \,\}$, wherein $h_i^{P}$ is the code of the i-th node;
the input of the robot encoder is $I_R = \{\,(v_r, b_r) \mid r \in R\,\}$, and the output is $H^{R} = \{\, h_r^{R} \mid r \in R \,\}$, wherein $h_r^{R}$ is the code of the r-th cleaning robot;
the input of the decoder selector at time step t is $(H^{P},\ H^{R},\ \{\, \tau_r^{t-1} \mid r \in R \,\})$, wherein $\tau_r^{t-1}$ is the path taken by the cleaning robot r up to time step t-1; the output is the node decoder $d_t$ with the maximum output probability;
the input of the node decoder is $(H^{P},\ h_p,\ h_{r'})$, wherein r' is the cleaning robot corresponding to the node decoder $d_t$, $h_p$ is the code of the node where the cleaning robot is located, and $h_{r'}$ is the code of the cleaning robot r'; the output is the node $p_t$ with the maximum probability.
7. The cleaning robot path planning method according to claim 2, wherein the training of the initial deep reinforcement learning model for cleaning robot path planning optimizes model parameters of the initial deep reinforcement learning model for cleaning robot path planning by:
$\nabla_{\theta} \mathcal{L}(\theta \mid s) = \mathbb{E}_{p_{\theta}(\pi \mid s)}\big[ (F_s - b(s))\, \nabla_{\theta} \log p_{\theta}(\pi \mid s) \big]$
wherein θ is the model parameter, s is the output path planning scheme, $F_s$ is the cost of the path planning scheme s, b(s) is the evaluation of the reference method on the path planning scheme s, π is the policy of the reinforcement learning method, and $p_{\theta}(\pi \mid s)$ represents the probability of outputting the path planning scheme s under the parameter θ and the policy π.
8. A cleaning robot path planning system, comprising:
the data acquisition module is used for acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload;
and the model calling module is used for calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, so as to obtain a path planning result of each cleaning robot.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, carries out the steps of the cleaning robot path planning method according to any one of claims 1-7.
10. A computer-readable storage medium in which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the steps of the cleaning robot path planning method according to any one of claims 1-7.
CN202211147813.4A 2022-09-19 2022-09-19 Cleaning robot path planning method, system, computer device and storage medium Pending CN115421494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211147813.4A CN115421494A (en) 2022-09-19 2022-09-19 Cleaning robot path planning method, system, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211147813.4A CN115421494A (en) 2022-09-19 2022-09-19 Cleaning robot path planning method, system, computer device and storage medium

Publications (1)

Publication Number Publication Date
CN115421494A true CN115421494A (en) 2022-12-02

Family

ID=84204837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211147813.4A Pending CN115421494A (en) 2022-09-19 2022-09-19 Cleaning robot path planning method, system, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN115421494A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115648255A (en) * 2022-12-15 2023-01-31 深圳市思傲拓科技有限公司 Clean path planning management system and method for swimming pool decontamination robot


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination