CN115421494A - Cleaning robot path planning method, system, computer device and storage medium - Google Patents
- Publication number: CN115421494A (application CN202211147813.4A)
- Authority
- CN
- China
- Prior art keywords
- cleaning robot
- path planning
- robot
- node
- cleaning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS › G05—CONTROLLING; REGULATING › G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES › G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots › G05D1/02—Control of position or course in two dimensions › G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles › G05D1/0212—…with means for defining a desired trajectory
- G05D1/0223—…involving speed control of the vehicle
- G05D1/0221—…involving a learning process
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
Abstract
The invention belongs to the fields of artificial intelligence and robot path planning, and discloses a cleaning robot path planning method, system, computer device, and storage medium. The method comprises: acquiring the garbage bin capacity and running speed of each cleaning robot, the coordinates of the robot library (depot), and the coordinates, garbage amount, and cleaning workload of each point to be cleaned; and invoking, from these inputs, a preset deep reinforcement learning model for cleaning robot path planning to obtain a path planning result for each cleaning robot. The method realizes path planning for multiple cleaning robots, solves planning problems involving many robots and a large number of points to be cleaned, fits practical application scenarios, yields a planning scheme superior to traditional optimization methods, and requires far less computation time than traditional methods such as the ant colony algorithm and dynamic programming.
Description
Technical Field
The invention belongs to the field of artificial intelligence and robot path planning, and relates to a cleaning robot path planning method, a cleaning robot path planning system, computer equipment and a storage medium.
Background
The vigorous development of artificial intelligence and robotics, together with the continuous rise of labor costs, has created the preconditions and the market space for large-scale application of cleaning robots. Today, cleaning robots can be seen everywhere, from large public places such as airports, hospitals, and schools to small spaces such as private homes. Clearly, having robots take over cleaning work from humans has become a trend of the times.
Path planning must be performed before a robot starts a cleaning task. The quality of the path plan directly affects how efficiently the cleaning task is completed, and indirectly affects the energy consumption and wear rate of each robot. Existing path planning methods fall into two categories. The first is full-coverage path planning, represented by the boustrophedon ("ox-plowing") method, which makes the robot traverse the entire cleaning area according to preset rules; it is simple to implement but inefficient when the cleaning space is large and garbage is sparsely distributed. The second is path planning based on traditional optimization techniques, represented by the ant colony algorithm, dynamic programming, and solvers such as Gurobi; the solving time of these methods generally grows exponentially with the number of path nodes and robots, making them unsuitable for large-scale multi-robot path planning problems.
Disclosure of Invention
The present invention aims to overcome the above drawback of the prior art, namely that multi-robot path planning for cleaning robots is difficult, and provides a cleaning robot path planning method, system, computer device, and storage medium.
To achieve this purpose, the invention adopts the following technical scheme:
in a first aspect of the present invention, a cleaning robot path planning method includes:
acquiring the garbage bin capacity and running speed of each cleaning robot, the coordinates of the robot library, and the coordinates, garbage amount, and cleaning workload of each point to be cleaned;
and calling a preset deep reinforcement learning model for cleaning robot path planning according to the garbage bin capacity and running speed of each cleaning robot, the coordinates of the robot library, and the coordinates, garbage amount, and cleaning workload of each point to be cleaned, to obtain a path planning result for each cleaning robot.
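As a concrete illustration of the inputs listed in the two steps above, the following sketch groups them into simple data structures. All class and field names here are our own, introduced only for illustration; they do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CleaningRobot:
    bin_capacity: float          # garbage bin capacity b_r
    speed: float                 # running speed v_r

@dataclass
class CleaningPoint:
    coord: Tuple[float, float]   # coordinates x_i of the point to be cleaned
    garbage: float               # garbage amount g_i
    workload: float              # cleaning workload c_i

@dataclass
class PlanningInstance:
    depot: Tuple[float, float]   # coordinates of the robot library p_0
    robots: List[CleaningRobot]
    points: List[CleaningPoint]

# A one-robot, one-point toy instance of the planning inputs.
instance = PlanningInstance(
    depot=(0.0, 0.0),
    robots=[CleaningRobot(bin_capacity=10.0, speed=1.5)],
    points=[CleaningPoint(coord=(3.0, 4.0), garbage=2.0, workload=1.0)],
)
```

An instance like this would be the argument passed to the trained model in step S2.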
Optionally, the deep reinforcement learning model for path planning of the cleaning robot is constructed in the following manner:
establishing a mathematical model of a path planning problem of the cleaning robot;
establishing a Markov decision process model of the cleaning robot path planning problem according to a mathematical model of the cleaning robot path planning problem;
establishing an initial deep reinforcement learning model for path planning of the cleaning robot according to a Markov decision process model of the path planning problem of the cleaning robot;
and training an initial deep reinforcement learning model for path planning of the cleaning robot through a preset training set to obtain the deep reinforcement learning model for path planning of the cleaning robot.
Optionally, the mathematical model of the cleaning robot path planning problem includes optimization variables, optimization objectives, and constraint conditions;
wherein the optimization variables comprise a first optimization variable Y and a second optimization variable Z:

Y = { y_{i,j}^r | i ∈ P, j ∈ P, r ∈ R },  Z = { z_{i,j} | i ∈ P, j ∈ P }

wherein P = {p_0, p_1, …, p_n} is the node set formed by the robot library and the points to be cleaned, n is the number of points to be cleaned, and p_0 denotes the robot library node; R = {r_1, r_2, …, r_k} is the set of cleaning robots, and k is the number of cleaning robots; y_{i,j}^r is an indicator variable denoting whether cleaning robot r travels from p_i to p_j: if robot r departs from p_i and arrives at p_j then y_{i,j}^r = 1, otherwise y_{i,j}^r = 0; z_{i,j} is the total amount of garbage carried from coordinate x_i of p_i to coordinate x_j of p_j;
the optimization objective is shown as follows:
wherein, c j Is the point p to be cleaned j Cleaning workload of c 0 =0;v r Is the running speed of the cleaning robot r;
the constraint conditions comprise an optimization variable value range constraint, a region access frequency constraint, a robot path continuity constraint, a constraint on the total amount of garbage a robot can carry, and a garbage transportation constraint;
the value range constraint of the optimization variables is shown as the following formulas:

y_{i,j}^r ∈ {0, 1}, i ∈ P, j ∈ P, r ∈ R
z_{i,j} ≥ 0, i ∈ P, j ∈ P
the region access times constraint (each point to be cleaned is visited exactly once) is as follows:

Σ_{r∈R} Σ_{i∈P} y_{i,j}^r = 1, j ∈ P′
the robot path continuity constraint (a robot that enters a node also leaves it) is given by:

Σ_{i∈P} y_{i,j}^r = Σ_{i∈P} y_{j,i}^r, j ∈ P, r ∈ R
the constraint on the total amount of garbage a robot can carry is as follows:

Σ_{i∈P} Σ_{j∈P′} g_j · y_{i,j}^r ≤ b_r, r ∈ R
wherein b_r is the garbage bin capacity of cleaning robot r;
the refuse transport constraint is as follows:

Σ_{i∈P} z_{j,i} − Σ_{i∈P} z_{i,j} = g_j, j ∈ P′
z_{i,j} ≤ M · Σ_{r∈R} y_{i,j}^r, i ∈ P, j ∈ P

wherein P′ = P − {p_0} is the set of the n points to be cleaned, g_j is the garbage amount of the point to be cleaned p_j, g_0 = 0, and M is a preset large constant.
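The capacity constraint and the travel-plus-cleaning objective above can be sketched in a few lines; this is a minimal illustrative check using Euclidean distance, and the function names and the exact form of the per-leg cost (distance plus workload, divided by speed) are our reading of the text rather than the patent's verbatim formulas.

```python
import math

def route_cost(route, coords, workload, speed):
    """Travel-plus-cleaning time of one robot's route, given as a list of
    node indices that starts and ends at the robot library (node 0)."""
    total = 0.0
    for a, b in zip(route, route[1:]):
        dist = math.dist(coords[a], coords[b])   # Euclidean leg length
        total += (dist + workload[b]) / speed
    return total

def capacity_ok(route, garbage, bin_capacity):
    """Carried-garbage constraint: garbage collected along the route must
    not exceed the robot's bin capacity b_r."""
    return sum(garbage[i] for i in route) <= bin_capacity

coords   = [(0, 0), (3, 4), (3, 0)]   # node 0 is the robot library
workload = [0.0, 1.0, 1.0]            # c_0 = 0
garbage  = [0.0, 2.0, 3.0]            # g_0 = 0
route = [0, 1, 2, 0]

cost = route_cost(route, coords, workload, speed=2.0)
```

For this toy route the cost is (5+1)/2 + (4+1)/2 + (3+0)/2 = 7.0, and the route is feasible for a bin capacity of 10 but not for a bin capacity of 4.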
Optionally, the markov decision process model of the cleaning robot path planning problem includes an environmental state, an action, a state transition rule, and a cost;
wherein the environmental state S_t = (D_t, E_t), where t is the step number; D_t records, for each cleaning robot r, the remaining capacity of its garbage bin at step t, the node at which it is located at step t, and the set of nodes it has visited up to step t; E_t records, for each node p_i, an access state u_t^i at step t, where u_t^i = 1 if node p_i has been visited and u_t^i = 0 otherwise;
Action A_t is shown in the following formula:

A_t = (d_t, p_t)

wherein d_t is the node decoder activated at step t, and p_t ∈ P is the node selected at step t;
the state transition rule ST transfers the environmental state from S_t to S_{t+1} according to action A_t; wherein r_t is the cleaning robot corresponding to node decoder d_t: the selected node p_t is spliced onto the end of r_t's visited-node sequence, robot r_t moves to p_t, its remaining garbage bin capacity is reduced by the garbage amount of p_t, and p_t is marked as visited;
the cost F is shown as follows:

F = Σ_{t=1}^{T} f_t^{r_t}

wherein T is the total number of steps and f_t^{r_t} is the cost incurred by cleaning robot r_t at step t, obtained by the following formula:

f_t^{r_t} = (‖x_{p_t} − x_q‖ + c_{p_t}) / v_{r_t}

wherein ‖x_{p_t} − x_q‖ denotes the distance between x_{p_t} and x_q, x_{p_t} is the coordinate of the selected node p_t, and x_q is the coordinate of the node at which robot r_t was located before the transition.
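A toy version of the state transition rule ST can be written as below. The dictionary field names are illustrative, and for simplicity we equate the decoder index d_t with the robot index; the patent's own notation was lost in extraction.

```python
def step(state, action, garbage):
    """Apply action A_t = (d_t, p_t): the robot whose decoder d_t was
    activated moves to node p_t, appends it to its route, loses bin
    capacity equal to the node's garbage amount, and the node is marked
    as visited (u_i = 1)."""
    d_t, p_t = action
    robot = state["robots"][d_t]
    robot["remaining"] -= garbage[p_t]   # remaining bin capacity shrinks
    robot["node"] = p_t                  # robot's current node
    robot["route"].append(p_t)           # splice p_t onto the route end
    state["visited"][p_t] = 1            # node access state
    return state

state = {
    "robots": [{"remaining": 10.0, "node": 0, "route": [0]}],
    "visited": [1, 0, 0],                # node 0 (robot library) counts as visited
}
garbage = [0.0, 2.0, 3.0]
state = step(state, (0, 1), garbage)     # decoder 0 selects node 1
```

After the step, robot 0 sits at node 1 with 8.0 capacity left and node 1 is marked visited.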
Optionally, the deep reinforcement learning model for cleaning robot path planning includes: an encoder and a decoder; the encoder comprises a node encoder and a robot encoder, and the decoder comprises a decoder selector and k node decoders; the output ends of the node encoder and the robot encoder are connected with the input end of a decoder selector, and the output end of the decoder selector is connected with the input ends of the k node decoders;
the node encoder comprises a linear mapping layer and L1 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_node be the index of a graph encoding module of the node encoder: when 1 ≤ l_node < L1, the output end of graph encoding module l_node is connected with the input end of graph encoding module l_node + 1, and when l_node = L1, the output end of graph encoding module l_node is connected with the input end of the decoder selector; the robot encoder comprises a linear mapping layer and L2 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_robot be the index of a graph encoding module of the robot encoder: when 1 ≤ l_robot < L2, the output end of graph encoding module l_robot is connected with the input end of graph encoding module l_robot + 1, and when l_robot = L2, the output end of graph encoding module l_robot is connected with the input end of the decoder selector; the decoder selector comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer; each node decoder likewise comprises a multi-head attention layer and a fitness layer, with the output end of the multi-head attention layer connected with the input end of the fitness layer.
Optionally, the linear mapping layer is shown as follows:
Linear(x)=Wx+B
wherein x ∈ R^{d_in} is the input, W ∈ R^{d_out × d_in} and B ∈ R^{d_out} are learnable parameters, d_in is the dimension of the data input, and d_out is the output dimension of the linear mapping layer;
the fitness layer maps its input scores to a selection probability distribution, as represented by the following formula:

Fitness(x) = softmax(x)

wherein softmax() is the normalized exponential function;
the multi-head attention layer is represented by the formula:

MHA(X) = Concat(head_1, head_2, …, head_h) W^O

wherein X ∈ R^{n×d_x} is the input of the multi-head attention layer, n × d_x is the dimension of the input data, Concat is the matrix splicing operation, W^O ∈ R^{h·d_v × d_x} is a trainable parameter, h is the number of attention heads, d_v is the dimension of the value vectors, and head_i is the output of the i-th attention head; head_i is calculated as follows:

head_i = softmax(Q_i K_i^T / √d_k) V_i

wherein Q_i = X W_i^Q, K_i = X W_i^K, V_i = X W_i^V; W_i^Q, W_i^K ∈ R^{d_x × d_k} and W_i^V ∈ R^{d_x × d_v} are learnable parameters, and d_k is the dimension of the key vectors;
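The attention-head formula softmax(Q K^T / √d_k) V can be illustrated with a minimal pure-Python sketch (for clarity only; the model's actual implementation would be vectorized):

```python
import math

def matmul(A, B):
    """Plain list-of-lists matrix product A · B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """One attention head: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)                              # Q K^T
    weights = [softmax([s / math.sqrt(d_k) for s in row])
               for row in scores]
    return matmul(weights, V), weights

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1, and each query attends most strongly to the key it matches.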
the graph encoding module is represented by the following formula:
X l+1 =GraphEncoder(X l )
wherein the content of the first and second substances,is the input to the graph coding module and,is the output of the graph coding module and,wherein, the first and the second end of the pipe are connected with each other,the method comprises the steps that a graph coding module calculates a process vector, and FF is a forward propagation module and is formed by connecting a plurality of linear mapping layers and a ReLU function layer; BN () is a batch normalization layer;
the ReLU function layer is expressed as follows:
ReLU(x)=max(0,x)
the batch normalization layer is shown below:

BN(x) = γ · (x − E[x]) / √(Var[x] + ε) + β

where γ and β are learnable parameters, E[x] is the expectation of x, Var[x] is the variance of x, and ε is a constant that prevents the denominator from being zero;
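The ReLU and batch normalization formulas can be checked numerically with a small sketch (per-feature statistics over a batch of scalars, for illustration only):

```python
import math

def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """BN(x) = gamma * (x - E[x]) / sqrt(Var[x] + eps) + beta,
    applied over a batch of scalar values xs."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

normed = batch_norm([1.0, 3.0])   # mean 2, variance 1 -> roughly [-1, +1]
```

The normalized batch has (approximately) zero mean and unit variance, and ε only nudges the result slightly away from ±1.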
the input of the node encoder is I_P = {(x_i, c_i, g_i) | i ∈ P} and the output is H_P = {h_i^P | i ∈ P}, wherein h_i^P is the encoding of the i-th node;
the input of the robot encoder is I_R = {(v_r, b_r) | r ∈ R} and the output is H_R = {h_r^R | r ∈ R}, wherein h_r^R is the encoding of the r-th cleaning robot;
the decoder selector inputs at time step t asWherein, the first and the second end of the pipe are connected with each other, is the path taken by the cleaning robot r up to time step t-1,node decoder d with maximum probability output t ;
The node decoder has the input ofWhere r' is the node decoder d t Corresponding cleaning robot, h p Is the code of the node where the cleaning robot is located, h r′ Is the code of the cleaning robot r'; node p with the output of the maximum probability t 。
Optionally, when the initial deep reinforcement learning model for cleaning robot path planning is trained, its model parameters are optimized according to the following policy-gradient formula:

∇_θ L = E[(F_s − b(s)) · ∇_θ log p_θ(π|s)]

where θ denotes the model parameters, s is an output path planning scheme, F_s is the cost of path planning scheme s, b(s) is the evaluation of path planning scheme s by the reference (baseline) method, π is the policy of the reinforcement learning method, and p_θ(π|s) represents the probability of outputting path planning scheme s under parameters θ and policy π.
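The update above is the standard REINFORCE-with-baseline estimator. The following toy sketch computes the gradient estimate for a two-action policy; the policy form, the sampled costs, and the baseline value are all made up for illustration and are not from the patent.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_log_prob(theta, action):
    """d/dtheta of log p_theta(action) for p_theta(a=1) = sigmoid(theta)."""
    p = sigmoid(theta)
    return (1.0 - p) if action == 1 else -p

def reinforce_grad(theta, samples, baseline):
    """Mean over samples of (F_s - b) * grad log p_theta(a)."""
    return sum((cost - baseline) * grad_log_prob(theta, a)
               for a, cost in samples) / len(samples)

samples = [(1, 2.0), (0, 5.0)]     # (sampled action, route cost F_s)
g = reinforce_grad(0.0, samples, baseline=3.5)
```

At θ = 0 the estimate is −0.75: gradient descent on the cost therefore increases θ, raising the probability of the action that achieved the below-baseline cost.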
In a second aspect of the present invention, a cleaning robot path planning system includes:
the data acquisition module is used for acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload;
and the model calling module is used for calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, so as to obtain a path planning result of each cleaning robot.
In a third aspect of the invention, a computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the above cleaning robot path planning method when executing the computer program.
In a fourth aspect of the present invention, a computer-readable storage medium stores a computer program, which when executed by a processor implements the steps of the above cleaning robot path planning method.
Compared with the prior art, the invention has the following beneficial effects:
the cleaning robot path planning method is based on the calling of a deep reinforcement learning model for cleaning robot path planning, can realize the path planning of multiple cleaning robots only by acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of a robot library, the coordinates of points to be cleaned, the garbage amount and the cleaning workload, can solve the path planning problem of the cleaning robots with multiple robots and a large number of points to be cleaned, is more suitable for practical application scenes, fully utilizes cleaning task information and cleaning robot information, and the solved path planning scheme is superior to the traditional optimization method. Meanwhile, the deep reinforcement learning model for path planning of the cleaning robot is based on deep reinforcement learning, the operation speed can be greatly increased by using a graphic processor, and the operation time required for solving the path planning problem is far shorter than that of the traditional methods such as an ant colony algorithm and a dynamic planning algorithm.
Drawings
Fig. 1 is a flowchart of a cleaning robot path planning method according to an embodiment of the present invention.
FIG. 2 is a diagram of a deep reinforcement learning model architecture according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a detailed architecture of a deep reinforcement learning model according to an embodiment of the present invention.
Fig. 4 is a block diagram of a path planning system of a cleaning robot according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
referring to fig. 1, in an embodiment of the present invention, a cleaning robot path planning method is provided, and particularly, a cleaning robot path planning method based on deep reinforcement learning, which can implement path planning of multiple cleaning robots, and has a fast solving speed and high solving quality.
Specifically, the cleaning robot path planning method comprises the following steps:
s1: and acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload.
S2: and calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, and obtaining a path planning result of each cleaning robot.
The garbage bin capacity and the running speed of each cleaning robot can be obtained from a specification or a manufacturer of the cleaning robot, and the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload are set according to an actual working scene.
The cleaning robot path planning method is based on invoking a deep reinforcement learning model for cleaning robot path planning: path planning for multiple cleaning robots is realized from only the garbage bin capacity and running speed of each cleaning robot, the coordinates of the robot library, and the coordinates, garbage amount, and cleaning workload of each point to be cleaned. It can solve path planning problems with many robots and a large number of points to be cleaned, fits practical application scenarios better, makes full use of cleaning task information and cleaning robot information, and the resulting path planning scheme is superior to traditional optimization methods. Meanwhile, because the model is based on deep reinforcement learning, a graphics processor can greatly accelerate the computation, and the running time required to solve the path planning problem is far shorter than that of traditional methods such as the ant colony algorithm and dynamic programming.
In one possible embodiment, the deep reinforcement learning model for cleaning robot path planning is constructed by the following steps: establishing a mathematical model of a path planning problem of the cleaning robot; establishing a Markov decision process model of the cleaning robot path planning problem according to a mathematical model of the cleaning robot path planning problem; establishing an initial deep reinforcement learning model for path planning of the cleaning robot according to a Markov decision process model of the path planning problem of the cleaning robot; training an initial deep reinforcement learning model for path planning of the cleaning robot through a preset training set to obtain the deep reinforcement learning model for path planning of the cleaning robot.
Optionally, the mathematical model of the path planning problem of the cleaning robot includes an optimization variable, an optimization target, and constraint conditions, where the optimization variable includes a first optimization variable Y and a second optimization variable Z, and the constraint conditions include an optimization variable value range constraint, a region access frequency constraint, a robot path continuity constraint, a total garbage amount constraint that the robot can carry, and a garbage transportation constraint.
Let the robot library and the points to be cleaned form the node set P = {p_0, p_1, …, p_n}, where n is the number of points to be cleaned and p_0 denotes the robot library node, and let P′ = P − {p_0} be the set of the n points to be cleaned; let the coordinates of the robot library and the points to be cleaned form the set X = {x_i | i ∈ P}, wherein x_i is the coordinate of p_i; let all cleaning robots form the set R = {r_1, r_2, …, r_k}, where k is the number of cleaning robots.
The first optimization variable Y is shown as follows:

Y = { y_{i,j}^r | i ∈ P, j ∈ P, r ∈ R }

wherein y_{i,j}^r is an indicator variable denoting whether robot r travels from p_i to p_j: if robot r departs from p_i and arrives at p_j then y_{i,j}^r = 1, otherwise y_{i,j}^r = 0.
The second optimization variable Z is shown as follows:
Z = { z_{i,j} | i ∈ P, j ∈ P }
wherein z_{i,j} represents the total amount of garbage carried from x_i to x_j.
The optimization objective is shown below:

min Σ_{r∈R} Σ_{i∈P} Σ_{j∈P} y_{i,j}^r · (‖x_i − x_j‖ + c_j) / v_r
wherein c_j is the cleaning workload of the point to be cleaned p_j, c_0 = 0, and v_r is the running speed of cleaning robot r.
The value range constraint of the optimization variable is shown as the following formula:
z i,j ≥0,i∈P,j∈P
The region access times constraint (each point to be cleaned is visited exactly once) is as follows:

Σ_{r∈R} Σ_{i∈P} y_{i,j}^r = 1, j ∈ P′
The robot path continuity constraint (a robot that enters a node also leaves it) is given by:

Σ_{i∈P} y_{i,j}^r = Σ_{i∈P} y_{j,i}^r, j ∈ P, r ∈ R
The constraint on the total amount of garbage a robot can carry is as follows:

Σ_{i∈P} Σ_{j∈P′} g_j · y_{i,j}^r ≤ b_r, r ∈ R
wherein b_r is the garbage bin capacity of robot r;
The refuse transport constraint is as follows:

Σ_{i∈P} z_{j,i} − Σ_{i∈P} z_{i,j} = g_j, j ∈ P′
z_{i,j} ≤ M · Σ_{r∈R} y_{i,j}^r, i ∈ P, j ∈ P

wherein g_j is the garbage amount of the point to be cleaned p_j, g_0 = 0, and M is a large preset constant.
Optionally, the markov decision process model of the cleaning robot path planning problem includes environmental states, actions, state transition rules, and costs.
In particular, the environmental state S_t = (D_t, E_t), where t is the step number; D_t records, for each cleaning robot r, the remaining capacity of its garbage bin at step t, the node at which it is located at step t, and the set of nodes it has visited up to step t; E_t records, for each node p_i, an access state u_t^i at step t, where u_t^i = 1 if node p_i has been visited and u_t^i = 0 otherwise.
Action A_t = (d_t, p_t), wherein d_t is the node decoder activated at step t and p_t ∈ P is the node selected at step t.
The state transition rule ST transfers the environmental state from S_t to S_{t+1} according to action A_t; wherein r_t is the cleaning robot corresponding to node decoder d_t: the selected node p_t is spliced onto the end of r_t's visited-node sequence, robot r_t moves to p_t, its remaining garbage bin capacity is reduced by the garbage amount of p_t, and p_t is marked as visited.
The cost is F = Σ_{t=1}^{T} f_t^{r_t}, wherein T is the total number of steps and f_t^{r_t} is the cost of robot r_t at step t, calculated as f_t^{r_t} = (‖x_{p_t} − x_q‖ + c_{p_t}) / v_{r_t}, wherein ‖x_{p_t} − x_q‖ denotes the distance between the coordinate x_{p_t} of the selected node p_t and the coordinate x_q of the node at which robot r_t was located before the transition.
Referring to fig. 2, optionally, the deep reinforcement learning model for cleaning robot path planning includes: an encoder and a decoder; the encoder comprises a node encoder and a robot encoder, and the decoder comprises a decoder selector and k node decoders; the output ends of the node encoder and the robot encoder are connected with the input end of the decoder selector, and the output end of the decoder selector is connected with the input ends of the k node decoders.
Referring to fig. 3, the components constituting the node encoder, the robot encoder, the decoder selector, and the node decoders include a linear mapping layer, a ReLU function layer, a single-head attention layer, a multi-head attention layer, a batch normalization layer, and a graph encoding module. Specifically, the node encoder comprises a linear mapping layer and L1 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_node be the index of a graph encoding module of the node encoder: when 1 ≤ l_node < L1, the output end of graph encoding module l_node is connected with the input end of graph encoding module l_node + 1, and when l_node = L1, the output end of graph encoding module l_node is connected with the input end of the decoder selector. The robot encoder comprises a linear mapping layer and L2 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_robot be the index of a graph encoding module of the robot encoder: when 1 ≤ l_robot < L2, the output end of graph encoding module l_robot is connected with the input end of graph encoding module l_robot + 1, and when l_robot = L2, the output end of graph encoding module l_robot is connected with the input end of the decoder selector. The decoder selector comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer; each node decoder likewise comprises a multi-head attention layer and a fitness layer, with the output end of the multi-head attention layer connected with the input end of the fitness layer.
Specifically, the linear mapping layer is represented by the following formula:
Linear(x)=Wx+B
where x ∈ R^{d_in} is the input, W ∈ R^{d_out×d_in} and B ∈ R^{d_out} are learnable parameters, d_in is the dimension of the input data, and d_out is the output dimension of the linear mapping layer.
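As a minimal sketch of the linear mapping layer Linear(x) = Wx + B described above, using plain Python lists; the dimensions and weight values here are illustrative, not the patent's actual parameters:

```python
def linear(x, W, B):
    """Apply Linear(x) = Wx + B for a 1-D input vector x."""
    d_out, d_in = len(W), len(W[0])
    assert len(x) == d_in and len(B) == d_out
    return [sum(W[i][j] * x[j] for j in range(d_in)) + B[i]
            for i in range(d_out)]

# Example: map a 4-dimensional node feature to 2 dimensions.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 1.0, 1.0]]
B = [0.5, -0.5]
y = linear([1.0, 2.0, 3.0, 4.0], W, B)  # -> [1.5, 8.5]
```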
The fitness layer is represented by the following formula:
wherein softmax () is a normalized exponential function.
The multi-head attention layer is shown as follows:
MHA(X) = Concat(head_1, head_2, …, head_h)W^O
where X ∈ R^{n×d_x} is the input of the multi-head attention layer, n×d_x is the dimension of the input data, Concat is the matrix concatenation operation, W^O is a trainable parameter, h is the number of attention heads (when h = 1, the layer is a single-head attention layer), d_v is the dimension of the value vector, and head_i is the output of the i-th attention head; head_i is calculated as follows:
where Q_i = XW_i^Q, K_i = XW_i^K, V_i = XW_i^V; W_i^Q, W_i^K, and W_i^V are learnable parameters, and d_k is the dimension of the key vector.
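As a pure-Python sketch of the per-head attention computation described above, head_i = softmax(Q_i K_i^T / √d_k) V_i, shown for a single head on tiny matrices (the Q, K, V values are illustrative, not taken from the patent):

```python
import math

def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax_rows(M):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = len(K[0])
    K_T = [list(c) for c in zip(*K)]
    scores = [[v / math.sqrt(d_k) for v in row] for row in matmul(Q, K_T)]
    return matmul(softmax_rows(scores), V)

# Two positions, d_k = d_v = 2; each output row is a convex
# combination of the rows of V, weighted by the attention scores.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
```

With these inputs, position 0 attends more strongly to the first row of V and position 1 to the second, so the first output component increases from row 0 to row 1.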
The single-headed attention layer is represented by the following formula:
The graph encoding module is represented by the following formula:
X l+1 =GraphEncoder(X l )
where X_l is the input of the graph encoding module, X_{l+1} is its output, and X'_l denotes the intermediate vector of the module's calculation; FF is a forward propagation module formed by connecting several linear mapping layers and a ReLU function layer; BN() is the batch normalization layer.
The ReLU function layer is shown as follows:
ReLU(x)=max(0,x)
the batch normalization layer is shown below:
BN(x) = γ · (x − E[x]) / √(Var[x] + ε) + β
where γ and β are learnable parameters, E[x] is the expectation of x, Var[x] is the variance of x, and ε is a small constant that prevents the denominator from being zero.
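The ReLU and batch normalization formulas above can be sketched directly; γ, β, and ε here are illustrative defaults (γ = 1, β = 0), and the batch statistics are computed over a small example batch:

```python
import math

def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """BN over one feature across a batch: gamma*(x - E[x])/sqrt(Var[x]+eps) + beta."""
    mean = sum(xs) / len(xs)                          # E[x]
    var = sum((v - mean) ** 2 for v in xs) / len(xs)  # Var[x]
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in xs]

xs = [1.0, 2.0, 3.0, 4.0]
ys = batch_norm(xs)
# After normalization the batch has approximately zero mean.
```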
In this embodiment, the forward propagation module is formed by a linear mapping layer with an input dimension of 128 and an output dimension of 512, a ReLU activation function layer, and a linear mapping layer with an input dimension of 512 and an output dimension of 128.
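The forward propagation module just described (Linear 128→512, ReLU, Linear 512→128) can be sketched shape-for-shape; the constant weight values are placeholders, only the dimensions follow the text:

```python
def linear(x, W, B):
    """Linear(x) = Wx + B for a 1-D vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, B)]

def feed_forward(x):
    """Linear(128->512) -> ReLU -> Linear(512->128), with placeholder weights."""
    assert len(x) == 128
    W1 = [[0.01] * 128 for _ in range(512)]; B1 = [0.0] * 512
    W2 = [[0.01] * 512 for _ in range(128)]; B2 = [0.0] * 128
    h = [max(0.0, v) for v in linear(x, W1, B1)]  # ReLU activation
    return linear(h, W2, B2)

y = feed_forward([1.0] * 128)
# len(y) == 128: the module maps back into the 128-dimensional embedding space.
```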
In a possible implementation, the input of the node encoder is the node information I_P = {(x_i, c_i, g_i) | i ∈ P}, and its output is the set of node encodings, where h_i is the encoding of the i-th node. The input of the robot encoder is the robot information I_R = {(v_r, b_r) | r ∈ R}, and its output is the set of robot encodings, where h_i is the encoding of the i-th cleaning robot. The input of the decoder selector at time step t combines the encoder outputs with the traveled-path information, and its output is the node decoder d_t with the maximum probability. The input of the node decoder is (h_p, h_r′), where r′ is the cleaning robot corresponding to node decoder d_t, h_p is the encoding of the node where that cleaning robot is located, and h_r′ is the encoding of cleaning robot r′; its output is the node p_t with the maximum probability.
Specifically, the input of the node encoder is I_P; the node encoder first maps I_P to a high-dimensional feature space through a linear mapping layer:
where the input dimension of Linear_P is 4 and the output dimension is 128.
Features are then extracted through m graph encoding modules:
where k is the index of the graph encoding module; the output of the node encoder is the set of node encodings, where h_i is the encoding of the i-th node.
The input of the robot encoder is I_R; the robot encoder first maps I_R to a high-dimensional feature space through the linear mapping layer:
where the input dimension of Linear_R is 2 and the output dimension is 128.
Features are then extracted through m graph encoding modules:
The output of the robot encoder is the set of robot encodings, where h_i is the encoding of the i-th cleaning robot.
The input of the decoder selector at time step t includes Tour_{t−1}, the set of paths already traveled, where tour_r^{t−1} is the path that cleaning robot r has traveled up to time step t−1. The decoder selector first extracts the information in Tour_{t−1} by max pooling:
where FF_ST is formed by connecting a linear mapping layer with input dimension 5 and output dimension 128, a linear mapping layer with input dimension 128 and output dimension 512, a ReLU activation function layer, and a linear mapping layer with input dimension 512 and output dimension 128.
where FF_ST is formed by connecting a linear mapping layer with input dimension 640 and output dimension 128, a linear mapping layer with input dimension 128 and output dimension 512, a ReLU activation function layer, and a linear mapping layer with input dimension 512 and output dimension 128.
The two feature vectors are then concatenated and input into the linear layer to obtain the log-probabilities logits_S:
where the input dimension of Linear_S is 256 and the output dimension is 5.
logits_S is then input into the softmax function to obtain the probability prob_S of selecting each decoder:
prob_S = softmax(logits_S)
where the i-th component of prob_S represents the probability of selecting decoder i; finally, the node decoder d_t with the maximum probability is obtained:
The output of the decoder selector is d_t.
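The selection step above (softmax over the decoder logits, then taking the decoder with maximum probability) can be sketched as follows; the logit values are illustrative, with k = 5 node decoders as in this embodiment:

```python
import math

def softmax(xs):
    """Normalized exponential function over a vector."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits_S = [0.2, 1.5, -0.3, 0.9, 0.1]   # one logit per node decoder (k = 5)
prob_S = softmax(logits_S)
d_t = max(range(len(prob_S)), key=prob_S.__getitem__)  # selected decoder index
```

Since softmax is monotone, the argmax of prob_S is the same as the argmax of logits_S; the probabilities are still needed during training, where decoders are sampled rather than chosen greedily.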
The input of the node decoder is (h_p, h_r′), where r′ is the cleaning robot corresponding to node decoder d_t, h_p is the encoding of the node where the cleaning robot is located, and h_r′ is the encoding of cleaning robot r′; the output is the node p_t with the maximum probability.
where the input dimension of Linear_D is 257 and the output dimension is 128.
The mapped feature is then input into the multi-head attention layer to obtain:
The probability of selecting the i-th node is then calculated:
where d_key is the dimension of key_i; finally, the node p_t with the maximum probability is obtained:
The output of the node decoder is p_t.
In one possible embodiment, training an initial deep reinforcement learning model for cleaning robot path planning through a preset training set comprises:
S11: set the size of the training data set, the batch size, the number of training rounds E, and the learning rate. In this embodiment, the training data set size is 1280000, the batch size is 512, the number of training rounds is E = 50, and the learning rate is 0.0001.
S12: generate a training sample set and set the current training round number e = 1.
S13: input training samples into the network in batches according to the set batch size and calculate the path planning schemes; then optimize the model parameters according to the path planning schemes output by the network using the following formula:
∇_θ L(θ) = E[(F_s − b(s)) ∇_θ log p_θ(π|s)]
where θ is the model parameter, s is the output path planning scheme, F_s is the cost of path planning scheme s, b(s) is the evaluation of path planning scheme s by the reference method, π is the policy of the reinforcement learning method, and p_θ(π|s) represents the probability of outputting path planning scheme s under the parameter θ and the policy π.
S14: increment the training round number: e = e + 1.
S15: if e > E, end the training; otherwise, return to S12.
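A hedged sketch of the per-sample weight used in the parameter update of S13: each sampled plan's log-probability is scaled by the advantage (F_s − b(s)), so a plan cheaper than the baseline receives a negative weight, which increases its probability under gradient descent on the surrogate loss. The cost and baseline values below are illustrative:

```python
def reinforce_weights(costs, baselines):
    """Per-sample advantage weights (F_s - b(s)) for one batch."""
    return [f - b for f, b in zip(costs, baselines)]

costs = [6.5, 7.2, 6.9]       # F_s for three sampled plans
baselines = [7.0, 7.0, 7.0]   # b(s) from the reference (baseline) method
weights = reinforce_weights(costs, baselines)
# The surrogate loss would be mean(weight * log p_theta(pi|s)) over the batch;
# an autograd framework would then backpropagate through log p_theta.
```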
In one possible implementation, a test set containing 1280 samples is used to compare three baseline methods based on conventional optimization techniques (the ant colony algorithm, the genetic algorithm, and Gurobi), two baseline methods based on reinforcement learning (AM and DRL), and the cleaning robot path planning method of the invention; the results are shown in Table 1:
TABLE 1
Method | Optimization target value | Solving time (seconds)
Ant colony algorithm | 7.07 | 261097
Genetic algorithm | 8.85 | 175670
Gurobi | 7.38 | 129039
AM | 7.09 | 0.63
DRL | 6.69 | 1.21
The invention | 6.59 | 1.27
Therefore, in terms of the optimization target, the cleaning robot path planning method of the invention is superior to all five baseline methods; in terms of solving time, it is significantly superior to the three methods based on conventional optimization techniques.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details not disclosed in the device embodiments, reference is made to the method embodiments of the invention.
Referring to fig. 4, in a further embodiment of the present invention, a cleaning robot path planning system is provided, which can be used to implement the cleaning robot path planning method described above, and specifically, the cleaning robot path planning system includes a data obtaining module and a model invoking module.
The data acquisition module is used for acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload; the model calling module is used for calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, so as to obtain a path planning result of each cleaning robot.
In one possible embodiment, the deep reinforcement learning model for cleaning robot path planning is constructed by: establishing a mathematical model of a path planning problem of the cleaning robot; establishing a Markov decision process model of the cleaning robot path planning problem according to a mathematical model of the cleaning robot path planning problem; establishing an initial depth reinforcement learning model for path planning of the cleaning robot according to a Markov decision process model of the path planning problem of the cleaning robot; and training an initial deep reinforcement learning model for path planning of the cleaning robot through a preset training set to obtain the deep reinforcement learning model for path planning of the cleaning robot.
In one possible embodiment, the mathematical model of the cleaning robot path planning problem includes optimization variables, optimization objectives, and constraints; the optimization variables comprise a first optimization variable Y and a second optimization variable Z:
Z={z i,j |i∈P,j∈P}
where P is the node set formed by the robot library and the points to be cleaned, n is the number of points to be cleaned, and p_0 represents the robot library node; R is the set formed by the cleaning robots, and k is the number of cleaning robots. y_{i,j}^r is an indicator variable denoting whether cleaning robot r travels from p_i to p_j: if robot r travels from p_i to p_j, then y_{i,j}^r = 1, otherwise y_{i,j}^r = 0. z_{i,j} is the total amount of waste carried from p_i (coordinate x_i) to p_j (coordinate x_j).
The optimization objective is shown as follows:
where c_j is the cleaning workload of the point to be cleaned p_j, with c_0 = 0; v_r is the running speed of cleaning robot r.
The constraints include the optimization-variable value-range constraint, the region visit-count constraint, the robot path continuity constraint, the constraint on the total amount of garbage a robot can carry, and the garbage transportation constraint.
The value range constraint of the optimization variable is shown as the following formula:
z i,j ≥0,i∈P,j∈P
the region access times constraint is as follows:
the robot path continuity constraint is given by:
the total amount of garbage that the robot can carry is constrained as follows:
where b_r is the garbage bin capacity of cleaning robot r;
the refuse transport constraint is shown by the following formula:
where P′ = P − {p_0} is the set of the n points to be cleaned, g_j is the garbage amount of the point to be cleaned p_j, g_0 = 0, and m is a preset constant.
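The capacity constraint above can be illustrated with a simple feasibility check: along a robot's route, the accumulated garbage must never exceed the robot's bin capacity b_r. The route and garbage amounts below are hypothetical; this is a sketch of the constraint, not the patent's solver:

```python
def route_feasible(route, garbage, capacity):
    """Check the bin-capacity constraint along one robot's route.

    route: node indices visited after leaving the robot library (node 0);
    garbage: garbage amount g_j at each point to be cleaned;
    capacity: the robot's garbage bin capacity b_r.
    """
    load = 0.0
    for node in route:
        load += garbage[node]
        if load > capacity:   # bin would overflow at this point
            return False
    return True

garbage = {1: 2.0, 2: 3.5, 3: 1.0}
ok = route_feasible([1, 3], garbage, capacity=4.0)       # 2.0 + 1.0 <= 4.0
bad = route_feasible([1, 2], garbage, capacity=4.0)      # 2.0 + 3.5 > 4.0
```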
In one possible embodiment, the markov decision process model of the cleaning robot path planning problem includes environmental states, actions, state transition rules, and costs.
Wherein the environmental state S t As shown in the following formula:
where t is the step number; the state records, for each cleaning robot r, the remaining capacity of its waste bin at step t, the node where it is located at step t, and the set of nodes it has visited up to step t; it also records the visit state of each node p_i at step t: if node p_i has been visited, the visit state is 1, otherwise it is 0.
Action A t As shown in the following formula:
A t =(d t ,p t )
where d_t is the node decoder activated at step t, and p_t ∈ P is the node selected at step t.
The state transition rule ST, according to action A_t, transfers the environmental state from S_t to S_{t+1} by the following formula:
where r_t is the cleaning robot corresponding to node decoder d_t, and the splicing operation appends p_t to the end of the path of robot r_t.
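A hedged sketch of this transition rule: the action A_t = (d_t, p_t) appends p_t to the path of the robot r_t served by decoder d_t and marks p_t as visited. The remaining-capacity update is an assumption consistent with the capacity constraint, and the field layout is illustrative:

```python
def transition(state, r_t, p_t, garbage):
    """Apply action (d_t, p_t): robot r_t moves to node p_t."""
    tours, visited, remaining = state
    tours = {r: list(path) for r, path in tours.items()}  # copy, keep S_t intact
    visited = dict(visited)
    remaining = dict(remaining)
    tours[r_t].append(p_t)                    # splice p_t onto the end of tour_{r_t}
    visited[p_t] = 1                          # node p_t is now visited
    remaining[r_t] -= garbage.get(p_t, 0.0)   # assumed bin-capacity update
    return tours, visited, remaining

# Robot 'r1' starts at the robot library (node 0) with bin capacity 5.0.
state = ({'r1': [0]}, {0: 1, 1: 0, 2: 0}, {'r1': 5.0})
tours, visited, remaining = transition(state, 'r1', 2, {1: 1.0, 2: 2.0})
```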
The cost F is shown as follows:
where T is the total number of steps, and the cost of cleaning robot r at step t is obtained by the following formula:
where the distance term denotes the distance between the coordinate of p_t and the coordinate of the node where the robot currently is.
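A hedged sketch of the per-step cost just described: the distance between the selected node p_t and the robot's previous node, divided by the robot's running speed v_r (i.e., travel time). Euclidean distance is an assumption consistent with the coordinate-based formulation; the coordinates and speed are illustrative:

```python
import math

def step_cost(prev_xy, next_xy, v_r):
    """Travel-time cost of moving from prev_xy to next_xy at speed v_r."""
    dist = math.hypot(next_xy[0] - prev_xy[0], next_xy[1] - prev_xy[1])
    return dist / v_r

cost = step_cost((0.0, 0.0), (3.0, 4.0), 2.0)  # distance 5.0, speed 2.0
```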
In one possible embodiment, the deep reinforcement learning model for cleaning robot path planning includes: an encoder and a decoder; the encoder comprises a node encoder and a robot encoder, and the decoder comprises a decoder selector and k node decoders; the output ends of the node encoder and the robot encoder are connected with the input end of the decoder selector, and the output end of the decoder selector is connected with the input ends of the k node decoders. The node encoder comprises a linear mapping layer and L1 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module. Let l_node be the index of a graph encoding module of the node encoder: when 1 ≤ l_node < L1, the output end of graph encoding module l_node is connected with the input end of graph encoding module l_node+1; when l_node = L1, the output end of graph encoding module l_node is connected with the input end of the decoder selector. The robot encoder comprises a linear mapping layer and L2 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module. Let l_robot be the index of a graph encoding module of the robot encoder: when 1 ≤ l_robot < L2, the output end of graph encoding module l_robot is connected with the input end of graph encoding module l_robot+1; when l_robot = L2, the output end of graph encoding module l_robot is connected with the input end of the decoder selector. The decoder selector comprises a multi-head attention layer and a fitness layer, and the output end of the multi-head attention layer is connected with the input end of the fitness layer; the node decoder comprises a multi-head attention layer and a fitness layer, wherein the output end of the multi-head attention layer is connected with the input end of the fitness layer.
In one possible embodiment, the linear mapping layer is represented by the following formula:
Linear(x)=Wx+B
where x ∈ R^{d_in} is the input, W ∈ R^{d_out×d_in} and B ∈ R^{d_out} are learnable parameters, d_in is the dimension of the input data, and d_out is the output dimension of the linear mapping layer.
The fitness layer is represented by the following formula:
wherein softmax () is a normalized exponential function.
The multi-head attention layer is shown as follows:
MHA(X)=Concat(head 1 ,head 2 ,…,head h )W O
where X ∈ R^{n×d_x} is the input of the multi-head attention layer, n×d_x is the dimension of the input data, Concat is the matrix concatenation operation, W^O is a trainable parameter, h is the number of attention heads, d_v is the dimension of the value vector, and head_i is the output of the i-th attention head; head_i is calculated as follows:
where Q_i = XW_i^Q, K_i = XW_i^K, V_i = XW_i^V; W_i^Q, W_i^K, and W_i^V are learnable parameters, and d_k is the dimension of the key vector.
The graph encoding module is represented by the following formula:
X l+1 =GraphEncoder(X l )
where X_l is the input of the graph encoding module, X_{l+1} is its output, and X'_l is the intermediate vector of the graph encoding module's calculation; FF is a forward propagation module formed by connecting several linear mapping layers and a ReLU function layer; BN() is the batch normalization layer.
The ReLU function layer is expressed as follows:
ReLU(x)=max(0,x)
the batch normalization layer is shown as follows:
BN(x) = γ · (x − E[x]) / √(Var[x] + ε) + β
where γ and β are learnable parameters, E[x] is the expectation of x, Var[x] is the variance of x, and ε is a small constant that prevents the denominator from being zero.
The input of the node encoder is I_P = {(x_i, c_i, g_i) | i ∈ P}, and its output is the set of node encodings, where h_i is the encoding of the i-th node; the input of the robot encoder is I_R = {(v_r, b_r) | r ∈ R}, and its output is the set of robot encodings, where h_i is the encoding of the i-th cleaning robot; the input of the decoder selector at time step t includes tour_r^{t−1}, the path traveled by cleaning robot r up to time step t−1, and its output is the node decoder d_t with the maximum probability; the input of the node decoder is (h_p, h_r′), where r′ is the cleaning robot corresponding to node decoder d_t, h_p is the encoding of the node where the cleaning robot is located, and h_r′ is the encoding of cleaning robot r′; the output is the node p_t with the maximum probability.
In one possible embodiment, the training of the initial deep-reinforcement learning model for cleaning robot path planning optimizes model parameters of the initial deep-reinforcement learning model for cleaning robot path planning by:
∇_θ L(θ) = E[(F_s − b(s)) ∇_θ log p_θ(π|s)]
where θ is the model parameter, s is the output path planning scheme, F_s is the cost of path planning scheme s, b(s) is the evaluation of path planning scheme s by the reference method, π is the policy of the reinforcement learning method, and p_θ(π|s) represents the probability of outputting path planning scheme s under the parameter θ and the policy π.
For details of each step in the cleaning robot path planning method embodiments, reference may be made to the functional description of the corresponding functional module of the cleaning robot path planning system in the embodiments of the present invention, and the details are not repeated here.
The division of modules in the embodiments of the present invention is schematic and represents only one way of dividing logical functions; in actual implementation there may be other division manners. In addition, each functional module in the embodiments of the present invention may be integrated in one processor, may exist alone physically, or two or more modules may be integrated in one module. The integrated module may be implemented in hardware or as a software functional module.
In yet another embodiment of the present invention, a computer device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored by the computer storage medium. The Processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing core and control core of the terminal, and is specifically adapted to load and execute one or more instructions in a computer storage medium to implement a corresponding method flow or a corresponding function. The processor of the embodiment of the invention may be used for the operations of the cleaning robot path planning method.
In yet another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a computer device and is used for storing programs and data. It is understood that the computer readable storage medium herein can include both built-in storage media in the computer device and, of course, extended storage media supported by the computer device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory. One or more instructions stored in the computer-readable storage medium may be loaded and executed by a processor to perform the corresponding steps of the cleaning robot path planning method in the above-described embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
Claims (10)
1. A method for cleaning robot path planning, comprising:
acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of a robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload;
and calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, and obtaining a path planning result of each cleaning robot.
2. The cleaning robot path planning method according to claim 1, wherein the deep reinforcement learning model for cleaning robot path planning is constructed by:
establishing a mathematical model of a path planning problem of the cleaning robot;
establishing a Markov decision process model of the cleaning robot path planning problem according to a mathematical model of the cleaning robot path planning problem;
establishing an initial depth reinforcement learning model for path planning of the cleaning robot according to a Markov decision process model of the path planning problem of the cleaning robot;
and training an initial deep reinforcement learning model for path planning of the cleaning robot through a preset training set to obtain the deep reinforcement learning model for path planning of the cleaning robot.
3. The cleaning robot path planning method according to claim 2, wherein the mathematical model of the cleaning robot path planning problem includes optimization variables, optimization objectives, and constraints;
the optimization variables comprise a first optimization variable Y and a second optimization variable Z:
Z={z i,j |i∈P,j∈P}
where P is the node set formed by the robot library and the points to be cleaned, n is the number of points to be cleaned, and p_0 represents the robot library node; R is the set formed by the cleaning robots, and k is the number of cleaning robots; y_{i,j}^r is an indicator variable denoting whether cleaning robot r travels from p_i to p_j: if robot r travels from p_i to p_j, then y_{i,j}^r = 1, otherwise y_{i,j}^r = 0; z_{i,j} is the total amount of waste carried from p_i (coordinate x_i) to p_j (coordinate x_j);
the optimization objective is shown as follows:
where c_j is the cleaning workload of the point to be cleaned p_j, with c_0 = 0; v_r is the running speed of cleaning robot r;
the constraint conditions comprise an optimized variable value range constraint, a region access frequency constraint, a robot path continuity constraint, a robot carried garbage total amount constraint and a garbage transportation constraint;
the value range constraint of the optimization variable is shown as the following formula:
z i,j ≥0,i∈P,j∈P
the region access times constraint is as follows:
the robot path continuity constraint is given by:
the total amount of garbage that the robot can carry is constrained as follows:
where b_r is the garbage bin capacity of cleaning robot r;
the refuse transport constraint is as follows:
where P′ = P − {p_0} is the set of the n points to be cleaned, g_j is the garbage amount of the point to be cleaned p_j, g_0 = 0, and m is a preset constant.
4. The cleaning robot path planning method of claim 3, wherein the Markov decision process model of the cleaning robot path planning problem includes environmental states, actions, state transition rules, and costs;
wherein the environmental state S t As shown in the following formula:
where t is the step number; the state records, for each cleaning robot r, the remaining capacity of its waste bin at step t, the node where it is located at step t, and the set of nodes it has visited up to step t; it also records the visit state of each node p_i at step t: if node p_i has been visited, the visit state is 1, otherwise it is 0;
Action A t As shown in the following formula:
A t =(d t ,p t )
where d_t is the node decoder activated at step t, and p_t ∈ P is the node selected at step t;
the state transition rule ST, according to action A_t, transfers the environmental state from S_t to S_{t+1} by the following formula:
where r_t is the cleaning robot corresponding to node decoder d_t, and the splicing operation appends p_t to the end of the path of robot r_t;
the cost F is shown below:
where T is the total number of steps, and the cost of cleaning robot r at step t is obtained by the following formula:
5. The cleaning robot path planning method according to claim 4, wherein the deep reinforcement learning model for cleaning robot path planning includes: an encoder and a decoder; the encoder comprises a node encoder and a robot encoder, and the decoder comprises a decoder selector and k node decoders; the output ends of the node encoder and the robot encoder are connected with the input end of a decoder selector, and the output end of the decoder selector is connected with the input ends of the k node decoders;
the node encoder comprises a linear mapping layer and L1 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_node be the index of a graph encoding module of the node encoder: when 1 ≤ l_node < L1, the output end of graph encoding module l_node is connected with the input end of graph encoding module l_node+1; when l_node = L1, the output end of graph encoding module l_node is connected with the input end of the decoder selector; the robot encoder comprises a linear mapping layer and L2 graph encoding modules; the output end of the linear mapping layer is connected with the input end of the first graph encoding module; let l_robot be the index of a graph encoding module of the robot encoder: when 1 ≤ l_robot < L2, the output end of graph encoding module l_robot is connected with the input end of graph encoding module l_robot+1; when l_robot = L2, the output end of graph encoding module l_robot is connected with the input end of the decoder selector; the decoder selector comprises a multi-head attention layer and a fitness layer, wherein the output end of the multi-head attention layer is connected with the input end of the fitness layer; the node decoder comprises a multi-head attention layer and a fitness layer, wherein the output end of the multi-head attention layer is connected with the input end of the fitness layer.
6. The cleaning robot path planning method according to claim 5, wherein the linear mapping layer is represented by the following formula:
Linear(x)=Wx+B
where x ∈ R^{d_in} is the input, W ∈ R^{d_out×d_in} and B ∈ R^{d_out} are learnable parameters, d_in is the dimension of the input data, and d_out is the output dimension of the linear mapping layer;
the fitness layer is represented by the following formula:
wherein sofmtx () is a normalized exponential function;
the multi-head attention layer is shown as follows:
MHA(X)=Concat(head 1 ,head 2 ,…,head h )W O
where X ∈ R^{n×d_x} is the input of the multi-head attention layer, n×d_x is the dimension of the input data, Concat is the matrix concatenation operation, W^O is a trainable parameter, h is the number of attention heads, d_v is the dimension of the value vector, and head_i is the output of the i-th attention head; head_i is calculated as follows:
head_i = softmax(Q_i K_i^T / √d_k) V_i

wherein Q_i = XW_i^Q, K_i = XW_i^K, V_i = XW_i^V; W_i^Q ∈ R^{d_x×d_k}, W_i^K ∈ R^{d_x×d_k} and W_i^V ∈ R^{d_x×d_v} are learnable parameters, and d_k is the dimension of the key vector;
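As an illustrative sketch (not part of the claims), the multi-head attention layer can be written out directly from the formulas above; the per-head projections are kept as explicit weight tensors, and all dimensions (n = 5, d_x = 8, h = 2, d_k = d_v = 4) are arbitrary example values:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """MHA(X) = Concat(head_1, ..., head_h) W_O, where
    head_i = softmax(Q_i K_i^T / sqrt(d_k)) V_i.
    Wq, Wk: shape (h, d_x, d_k); Wv: shape (h, d_x, d_v); Wo: shape (h*d_v, d_x)."""
    d_k = Wk.shape[-1]
    heads = []
    for q_w, k_w, v_w in zip(Wq, Wk, Wv):
        Q, K, V = X @ q_w, X @ k_w, X @ v_w
        A = softmax(Q @ K.T / np.sqrt(d_k))  # attention weights, shape (n, n)
        heads.append(A @ V)                  # one head output, shape (n, d_v)
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
n, d_x, h, d_k, d_v = 5, 8, 2, 4, 4
X = rng.standard_normal((n, d_x))
Wq = rng.standard_normal((h, d_x, d_k))
Wk = rng.standard_normal((h, d_x, d_k))
Wv = rng.standard_normal((h, d_x, d_v))
Wo = rng.standard_normal((h * d_v, d_x))
out = multi_head_attention(X, Wq, Wk, Wv, Wo)  # shape (n, d_x)
```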
the graph encoding module is represented by the following formula:
X l+1 =GraphEncoder(X l )
wherein X^l ∈ R^{n×d} is the input to the graph coding module and X^{l+1} ∈ R^{n×d} is the output; GraphEncoder() is computed as

X̂ = BN(X^l + MHA(X^l))

X^{l+1} = BN(X̂ + FF(X̂))

wherein FF is a forward propagation module formed by connecting a plurality of linear mapping layers and ReLU function layers; BN() is a batch normalization layer;
the ReLU function layer is expressed as follows:
ReLU(x)=max(0,x)
the batch normalization layer is shown below:

BN(x) = γ · (x − E[x]) / √(Var[x] + ε) + β

where γ and β are learnable parameters, E[x] is the expectation of x, Var[x] is the variance of x, and ε is a small constant that prevents the denominator from being zero;
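As an illustrative sketch (not part of the claims), the ReLU function layer and the batch normalization layer follow directly from the formulas above; statistics are computed per feature over the batch dimension:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def batch_norm(x, gamma, beta, eps=1e-5):
    """BN(x) = gamma * (x - E[x]) / sqrt(Var[x] + eps) + beta,
    with mean and variance taken per feature over the batch axis."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])  # batch of 2 samples, 2 features each
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
```

With γ = 1 and β = 0, each feature of the output has zero mean over the batch.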
the input of the node encoder is I_P = {(x_i, c_i, g_i) | i ∈ P}, and the output is {h_i^P | i ∈ P}, wherein h_i^P is the code of the i-th node;
the input of the robot encoder is I_R = {(v_r, b_r) | r ∈ R}, and the output is {h_r^R | r ∈ R}, wherein h_r^R is the code of the r-th cleaning robot;
the input of the decoder selector at time step t comprises the node codes, the robot codes, and the path taken by each cleaning robot r up to time step t−1; the output of the decoder selector is the node decoder d_t with the maximum output probability;
7. The cleaning robot path planning method according to claim 2, wherein the training of the initial deep reinforcement learning model for cleaning robot path planning optimizes model parameters of the initial deep reinforcement learning model for cleaning robot path planning by:
∇L(θ) = E_s[(F_s − b(s)) ∇_θ log p_θ(π|s)]

where θ is the model parameter, s is the output path planning scheme, F_s is the cost of the path planning scheme s, b(s) is the evaluation of the reference method on the path planning scheme s, π is the policy of the reinforcement learning method, and p_θ(π|s) represents the probability of outputting the path planning scheme s under the parameter θ and the policy π.
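As an illustrative sketch (not part of the claims), the parameter update in claim 7 is a policy-gradient (REINFORCE-with-baseline) estimate: each sampled plan's cost is compared against the reference-method baseline, and the advantage weights the gradient of the log-probability. The sample values below are arbitrary and the gradients of log p_θ are assumed to be precomputed:

```python
import numpy as np

def reinforce_grad(costs, baselines, grad_log_probs):
    """Policy-gradient estimate: mean over sampled plans s of
    (F_s - b(s)) * grad_theta log p_theta(pi|s)."""
    adv = costs - baselines                    # advantage of each sampled plan
    return (adv[:, None] * grad_log_probs).mean(axis=0)

costs = np.array([10.0, 7.0, 12.0])    # F_s: cost of each sampled plan
baselines = np.array([9.0, 9.0, 9.0])  # b(s): reference-method baseline cost
glp = np.array([[0.2, -0.1],
                [0.5,  0.3],
                [-0.4, 0.1]])          # grad log p_theta, one row per sample
g = reinforce_grad(costs, baselines, glp)  # gradient for a 2-parameter model
```

Plans cheaper than the baseline (negative advantage) get their log-probability pushed up when g is used for gradient descent on the loss; costlier plans get pushed down.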
8. A cleaning robot path planning system, comprising:
the data acquisition module is used for acquiring the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload;
and the model calling module is used for calling a preset deep reinforcement learning model for path planning of the cleaning robot according to the garbage bin capacity and the running speed of each cleaning robot, the coordinates of the robot library, the coordinates of each point to be cleaned, the garbage amount and the cleaning workload, so as to obtain a path planning result of each cleaning robot.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, carries out the steps of the cleaning robot path planning method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, carries out the steps of the cleaning robot path planning method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211147813.4A CN115421494A (en) | 2022-09-19 | 2022-09-19 | Cleaning robot path planning method, system, computer device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115421494A true CN115421494A (en) | 2022-12-02 |
Family
ID=84204837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211147813.4A Pending CN115421494A (en) | 2022-09-19 | 2022-09-19 | Cleaning robot path planning method, system, computer device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115421494A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115648255A (en) * | 2022-12-15 | 2023-01-31 | 深圳市思傲拓科技有限公司 | Clean path planning management system and method for swimming pool decontamination robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||