CN115235476B - Full-coverage path planning method and device, storage medium and electronic equipment

Full-coverage path planning method and device, storage medium and electronic equipment

Info

Publication number
CN115235476B
CN115235476B (granted publication of application CN202211169283.3A)
Authority
CN
China
Prior art keywords
agent
neural network
network model
grid point
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211169283.3A
Other languages
Chinese (zh)
Other versions
CN115235476A (en)
Inventor
娄君杰
郑鑫宇
章航嘉
郑习羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Junsheng Intelligent Automobile Technology Research Institute Co ltd
Original Assignee
Ningbo Junsheng Intelligent Automobile Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Junsheng Intelligent Automobile Technology Research Institute Co ltd filed Critical Ningbo Junsheng Intelligent Automobile Technology Research Institute Co ltd
Priority to CN202211169283.3A
Publication of CN115235476A
Application granted
Publication of CN115235476B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20: Instruments for performing navigational calculations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Automation & Control Theory (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a full-coverage path planning method, a full-coverage path planning device, a storage medium and electronic equipment. The invention improves the traditional grid modeling approach by representing the discrete environment with grid points, and designs a convolutional neural network model and a state input matrix. A reward-and-punishment function is designed for the model, and the convolutional neural network model is trained with a mainstream reinforcement learning algorithm; actions can then be output in a continuous action space, finally forming an optimal full-coverage path.

Description

Full-coverage path planning method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of intelligent control, in particular to a full-coverage path planning method and device, a storage medium and electronic equipment.
Background
The problems to be solved by complete coverage path planning (CCPP) include traversing all regions of a working area except obstacles, effectively avoiding every obstacle during the traversal, minimizing path repetition, and shortening the travel distance.
The traditional full-coverage path planning method needs to rasterize the environment, i.e., divide the task area into a finite number of equally sized grids. The agent's movement is then discretized into 9 actions: up-left, up-right, down-left, down-right, up, down, left, right, and staying still. For most mobile-platform agents, such discrete actions do not satisfy their kinematic constraints, and the resulting path planning is inefficient and wasteful of space. Moreover, the grid size is generally designed according to the size of the agent; in a large-scale reconnaissance task the task area is large and the agent is effectively a particle within it, so rasterizing the task area produces a very large number of grids and increases the computational burden on the device.
Disclosure of Invention
In order to solve the above problems, the present invention provides a full coverage path planning method based on deep reinforcement learning, which includes:
S10: dividing the task area where the agent is located into n1 × n2 grid points arranged in a matrix;
S20: according to the environment attribute of each of the grid points at the current moment, assigning a value to each grid point to obtain a first environment state matrix representing the environment state of the task area;
S30: according to the distance between the agent and each grid point at the current moment, assigning a value to each grid point to obtain a first position state matrix representing the position state of the agent;
S40: according to the distances between the agent and each grid point at N previous moments, assigning values to each grid point to obtain N heading information matrices representing the heading information of the agent;
S50: splicing the first environment state matrix, the first position state matrix and the N heading information matrices into N+2 state input matrices;
S60: constructing a convolutional neural network model and inputting the N+2 state input matrices into it, so that the convolutional neural network model outputs, according to the N+2 state input matrices, an output value representing the agent's next-step execution information;
S70: training the convolutional neural network model with a deep reinforcement learning algorithm;
S80: planning a path for the agent with the trained convolutional neural network model;
wherein the N previous moments are adjacent to the current moment and occur before the current moment, and N is greater than or equal to 2; n1 is an integer from 1 to 1000; n2 is an integer from 1 to 1000.
The benefits of this scheme are as follows. First, compared with the traditional rasterized modeling approach that uses discrete actions, this scheme can output actions in a continuous action space and finally form an optimal full-coverage path. Second, the amount of computation is reduced: the convolutional neural network model extracts features from the state input, whose data volume is far smaller than that of an image matrix. Third, the designed high-dimensional state input has more distinctive features and can represent richer environment and agent state information. Fourth, resource utilization is improved: the convolutional neural network model is trained with a reinforcement learning algorithm and does not depend on a pre-collected data set. Fifth, the method is more general: with the proposed path planning method, the initial position of the agent can be any position in the task area, and other task areas can use the same neural network model.
Further, the element m(i, j) in the first environment state matrix is any one of {-1, 0, 1} and is assigned according to the following rules:
when the environment attribute indicates that the grid point is an obstacle, m(i, j) = -1;
when the environment attribute indicates that the grid point has been detected, m(i, j) = 0;
when the environment attribute indicates that the grid point has not been detected, m(i, j) = 1.
The benefit of this scheme is that assigning values according to the environment attributes digitizes the current environment state into a first environment state matrix, which can represent both the environment type and the coverage condition of the current area.
Further, the element dis(i, j) in the first position state matrix is assigned according to the following principle:

dis(i, j) = sqrt( (X_agent - X_(i,j))^2 + (Y_agent - Y_(i,j))^2 ) / dis_max

wherein dis(i, j) is the Euclidean distance between the agent and the grid point in the i-th row and j-th column of the first position state matrix, X_agent and Y_agent are the X and Y coordinates of the agent in the two-dimensional planar rectangular coordinate system corresponding to the task area, X_(i,j) and Y_(i,j) are the X and Y coordinates of that grid point in the same coordinate system, and dis_max is the longest distance in the task area (used here for normalization).
Further, the method further comprises: using a tanh activation function at the output layer of the convolutional neural network model to limit the output value to the range [-1, 1], and multiplying the limited output value by the maximum steering limit of the agent to obtain a steering action output value representing the agent's steering action.
The benefit of this scheme is that the output values are standardized, which facilitates subsequent calculation and execution of the agent's actions.
Further, training the convolutional neural network model with a deep reinforcement learning algorithm comprises: constructing a reward-and-punishment function according to the detection process of the agent in the task area, and training the convolutional neural network model with the deep reinforcement learning algorithm based on the reward-and-punishment function.
The benefit of this scheme is that the trained convolutional neural network model can effectively eliminate the adverse effects of factors such as noise, so that it better matches the actual situation encountered in path planning and remains reasonable and effective.
Further, the reward-and-punishment function is constructed as follows: r = r_dot + r_full + r_fail + r_close,
where r is the reward-and-punishment function; r_dot is the average distance difference between the agent and the undetected grid points at the current moment and at the next moment relative to the current moment: when the agent moves toward the undetected points, r_dot is a reward, and when it does not, r_dot is a penalty; r_full is the reward for the agent completing the full-coverage task; r_fail is the penalty for the agent colliding with an obstacle or leaving the task area; r_close is the penalty for the agent's distance to an obstacle or to the task area boundary being smaller than the target distance.
The benefit of this scheme is that the proposed reward-and-punishment function avoids sparse rewards: every executed step has a corresponding reward or penalty value, which accelerates model training, and the trained convolutional neural network model generalizes better.
Further, the method further comprises:
assigning values to each grid point according to the environment attribute of each grid point in the plurality of grid points at the next moment to obtain a second environment state matrix for representing the environment state of the task area;
assigning values to each grid point according to the distance between the intelligent agent and each grid point at the next moment to obtain a second position state matrix for representing the position state of the intelligent agent;
the next moment is a moment adjacent to the current moment and occurring after the current moment;
The average distance difference r_dot is obtained by the following formula:

r_dot = ( Σ_{S_cur(i,j) > 0} dis_cur(i,j) - Σ_{S_next(i,j) > 0} dis_next(i,j) ) / n

wherein S_cur is the first environment state matrix, dis_cur is the first position state matrix, S_next is the second environment state matrix, dis_next is the second position state matrix, the sums run over the elements whose environment state value is greater than 0 (i.e. the undetected points), and n is the number of undetected points at the next moment.
The invention also provides a full-coverage path planning device based on deep reinforcement learning, which comprises:
the first determining module, used for determining a plurality of grid points according to the task area where the agent is located;
the second determining module is used for determining a first environment state matrix according to the environment attribute of each grid point in the plurality of grid points;
the third determining module is used for determining a first position state matrix according to the distance between the intelligent agent and each grid point at the current moment;
the fourth determining module is used for determining N heading information matrixes according to the distance between the intelligent agent and each grid point at N previous moments; the N previous moments are moments which are adjacent to the current moment and occur before the current moment;
the building module is used for building a convolutional neural network model, splicing the first environment state matrix, the first position state matrix and the N heading information matrices into N +2 state input matrices to be input into the convolutional neural network model, and outputting an output value representing the next-step execution information of the intelligent agent;
the training module is used for training the convolutional neural network model according to a deep reinforcement learning algorithm;
and the planning module is used for planning the path of the intelligent agent according to the trained convolutional neural network model.
The invention also provides an electronic device comprising a processor, a memory and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the method as in any of the above aspects.
The invention also provides a readable storage medium on which is stored a program or instructions which, when executed by a processor, performs a method as in any of the above.
Drawings
Fig. 1 is a flowchart of a full coverage path planning method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of grid points provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first environment state matrix according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a first position state matrix according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1 to 4, the present embodiment provides a full coverage path planning method based on deep reinforcement learning, including:
S10: dividing the task area where the agent is located into n1 × n2 grid points arranged in a matrix;
S20: according to the environment attribute of each of the grid points at the current moment, assigning a value to each grid point to obtain a first environment state matrix representing the environment state of the task area;
S30: according to the distance between the agent and each grid point at the current moment, assigning a value to each grid point to obtain a first position state matrix representing the position state of the agent;
S40: according to the distances between the agent and each grid point at N previous moments, assigning values to each grid point to obtain N heading information matrices representing the heading information of the agent;
S50: splicing the first environment state matrix, the first position state matrix and the N heading information matrices into N+2 state input matrices;
S60: constructing a convolutional neural network model and inputting the N+2 state input matrices into it, so that the convolutional neural network model outputs, according to the N+2 state input matrices, an output value representing the agent's next-step execution information;
S70: training the convolutional neural network model with a deep reinforcement learning algorithm;
S80: planning a path for the agent with the trained convolutional neural network model;
wherein the N previous moments are adjacent to the current moment and occur before the current moment, and N is greater than or equal to 2; n1 is an integer from 1 to 1000; n2 is an integer from 1 to 1000.
In the prior art, full-coverage path planning requires rasterizing the environment, which generates a large number of grids and increases the computational burden on the device.
To address this problem, the present embodiment provides a new path planning method. Using an approach similar to grid-map modeling, the grid blocks are replaced by points, and each point is assigned a value according to its environment attribute to obtain a first environment state matrix that represents the environment type and the coverage condition of the environment. The distances between the agent and the grid points at the current moment and at the N previous moments are then obtained, yielding a first position state matrix and N heading information matrices respectively. The first environment state matrix, the first position state matrix and the N heading information matrices are spliced into N+2 state input matrices, which serve as the input of the convolutional neural network model. The constructed convolutional neural network model is trained and then used for path planning of the agent, including but not limited to determining the agent's speed, steering angle, and the like. A deep reinforcement learning algorithm, such as DDPG or SAC, is used to train the convolutional neural network model; the trained model can then be used for full-coverage path planning of the agent.
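For concreteness, the following Python sketch shows one way the N+2 state input matrices could be assembled into an input tensor; the function name build_state_input and the use of the position matrices of previous moments as heading information matrices are assumptions made only for this example, not the claimed implementation.

```python
import numpy as np
from collections import deque

# Minimal sketch (assumed helper name): splice the first environment state matrix,
# the first position state matrix and the N heading information matrices (here
# taken as the position matrices of the N previous moments) into an
# (N+2, n1, n2) state input tensor for the convolutional neural network.
def build_state_input(env_state, pos_state, heading_history):
    """heading_history: deque of the N most recent previous position matrices."""
    return np.stack([env_state, pos_state, *heading_history], axis=0)

# Example with N = 2 previous moments on a 10 x 10 grid of points.
N = 2
history = deque(maxlen=N)
history.append(np.zeros((10, 10), dtype=np.float32))  # position matrix at t-2
history.append(np.zeros((10, 10), dtype=np.float32))  # position matrix at t-1
state = build_state_input(np.ones((10, 10), dtype=np.float32),
                          np.zeros((10, 10), dtype=np.float32), history)
print(state.shape)  # (4, 10, 10) -> N+2 input channels
```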
The benefits of this scheme are as follows. First, compared with the traditional rasterized modeling approach that uses discrete actions, this scheme can output actions in a continuous action space and finally form an optimal full-coverage path. Second, the amount of computation is reduced: the convolutional neural network model extracts features from the state input, whose data volume is far smaller than that of an image matrix. Third, the designed high-dimensional state input has more distinctive features and can represent richer environment and agent state information. Fourth, resource utilization is improved: the convolutional neural network model is trained with a reinforcement learning algorithm and does not depend on a pre-collected data set. Fifth, the method is more general: with the proposed path planning method, the initial position of the agent can be any position in the task area, and other task areas can use the same neural network model.
Further, the element m(i, j) in the first environment state matrix is any one of {-1, 0, 1} and is assigned according to the following rules:
when the environment attribute indicates that the grid point is an obstacle, m(i, j) = -1;
when the environment attribute indicates that the grid point has been detected, m(i, j) = 0;
when the environment attribute indicates that the grid point has not been detected, m(i, j) = 1.
This embodiment abandons the traditional grid-map modeling approach and instead assigns values to grid points: when a grid point is an obstacle it is assigned -1; when a grid point is a passable area that has not yet been detected it is assigned 1; and when a grid point is a passable area that has already been covered, i.e. detected, it is assigned 0. This forms a first environment state matrix of dimension n1 × n2, which can represent the environment type and the coverage condition of the current area.
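For illustration, a minimal NumPy sketch of this assignment rule is given below; the function name build_env_state and the boolean obstacle/detected masks are assumptions made for the example and are not part of the claimed method.

```python
import numpy as np

# Minimal sketch (not the patented implementation): build the first environment
# state matrix for an n1 x n2 grid of points. Values follow the assignment rule
# above: -1 for obstacles, 0 for already-detected points, 1 for undetected points.
def build_env_state(n1, n2, obstacle_mask, detected_mask):
    """obstacle_mask / detected_mask are boolean arrays of shape (n1, n2)."""
    env = np.ones((n1, n2), dtype=np.float32)   # default: undetected -> 1
    env[detected_mask] = 0.0                    # detected, passable -> 0
    env[obstacle_mask] = -1.0                   # obstacle -> -1
    return env

# Example: a 10 x 10 task area with one obstacle cell and one detected cell.
obstacles = np.zeros((10, 10), dtype=bool); obstacles[4, 5] = True
detected  = np.zeros((10, 10), dtype=bool); detected[0, 0] = True
S_cur = build_env_state(10, 10, obstacles, detected)
```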
Further, the element dis(i, j) in the first position state matrix is assigned according to the following principle:

dis(i, j) = sqrt( (X_agent - X_(i,j))^2 + (Y_agent - Y_(i,j))^2 ) / dis_max

wherein dis(i, j) is the Euclidean distance between the agent and the grid point in the i-th row and j-th column of the first position state matrix, X_agent and Y_agent are the X and Y coordinates of the agent in the two-dimensional planar rectangular coordinate system corresponding to the task area, X_(i,j) and Y_(i,j) are the X and Y coordinates of that grid point in the same coordinate system, and dis_max is the longest distance in the task area (used here for normalization).
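A minimal NumPy sketch of this normalized distance computation follows, assuming the grid point coordinates are available as coordinate arrays; the helper name build_position_state is chosen only for this example.

```python
import numpy as np

# Minimal sketch (assumed helper, not the patented code): compute the first
# position state matrix. Each element is the Euclidean distance from the agent
# to the corresponding grid point, normalized by the longest distance dis_max.
def build_position_state(agent_xy, grid_x, grid_y, dis_max):
    """agent_xy: (X_agent, Y_agent); grid_x, grid_y: (n1, n2) coordinate arrays."""
    dx = agent_xy[0] - grid_x
    dy = agent_xy[1] - grid_y
    return np.sqrt(dx * dx + dy * dy) / dis_max

# Example: grid point coordinates laid out on a 10 x 10 unit lattice.
grid_y, grid_x = np.mgrid[0:10, 0:10].astype(np.float32)
dis_max = np.sqrt(9**2 + 9**2)                 # diagonal of the task area
dis_cur = build_position_state((2.5, 7.0), grid_x, grid_y, dis_max)
```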
Further, the method further comprises: using a tanh activation function at the output layer of the convolutional neural network model to limit the output value to the range [-1, 1], and multiplying the limited output value by the maximum steering limit of the agent to obtain a steering action output value representing the agent's steering action.
In the present embodiment, the output value is limited to the range [-1, 1] with the tanh activation function. The tanh activation function converges quickly and requires few iterations; the output values are standardized, which facilitates subsequent calculation and execution of the agent's actions.
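For illustration, the following PyTorch sketch shows one possible convolutional neural network with N+2 input channels and a tanh-limited output scaled by the maximum steering limit; the layer sizes, the assumption of a square grid, and the value of the maximum steering limit are assumptions for this example, not the specific network claimed here.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (layer sizes are assumptions): a small convolutional
# network that takes the N+2 spliced state matrices as input channels and outputs
# one value in [-1, 1] via tanh, which is then scaled by the agent's maximum
# steering limit to obtain the steering action.
class CoveragePolicy(nn.Module):
    def __init__(self, n_channels, grid_size, max_steer_rad=0.6):
        super().__init__()
        self.max_steer = max_steer_rad
        self.features = nn.Sequential(
            nn.Conv2d(n_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 * grid_size * grid_size, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Tanh(),            # restrict output to [-1, 1]
        )

    def forward(self, state):                         # state: (batch, N+2, n1, n2)
        return self.head(self.features(state)) * self.max_steer

# Example: N = 2 heading matrices -> 4 input channels on a 10 x 10 grid.
policy = CoveragePolicy(n_channels=4, grid_size=10)
steer = policy(torch.randn(1, 4, 10, 10))             # scaled steering action
```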
Further, training the convolutional neural network model with a deep reinforcement learning algorithm comprises: constructing a reward-and-punishment function according to the detection process of the agent in the task area, and training the convolutional neural network model with the deep reinforcement learning algorithm based on the reward-and-punishment function.
In this embodiment, the convolutional neural network model is trained with a reinforcement learning algorithm and does not depend on a pre-collected data set; the trained model can effectively eliminate the adverse effects of factors such as noise, so that it better matches the actual situation encountered in path planning and remains reasonable and effective.
Further, the reward-and-punishment function is constructed as follows: r = r_dot + r_full + r_fail + r_close,
where r is the reward-and-punishment function; r_dot is the average distance difference between the agent and the undetected grid points at the current moment and at the next moment relative to the current moment: when the agent moves toward the undetected points, r_dot is a reward, and when it does not, r_dot is a penalty; r_full is the reward for the agent completing the full-coverage task; r_fail is the penalty for the agent colliding with an obstacle or leaving the task area; r_close is the penalty for the agent's distance to an obstacle or to the task area boundary being smaller than the target distance.
In the related art, sparse rewards are mostly used: the agent is only scored at the end, with no feedback during the execution of actions, so the training results are poor. To address this, the present embodiment constructs a reward-and-punishment function with four terms, which respects the objectivity of the data, evaluates the agent's actions comprehensively, and provides a corresponding reward or penalty value at every executed step, thereby accelerating model training.
In the present embodiment, r_dot evaluates the agent's movement tendency: when the agent moves toward an undetected point it is rewarded, otherwise it is penalized. r_full is an outcome reward granted when the agent completes the full-coverage task. r_fail is a penalty for agent errors, such as hitting an obstacle or driving out of the task area. r_close constrains the agent's motion behavior: a target distance is set, and a penalty is applied when the distance between the agent and an obstacle is smaller than the target distance. Through these four terms, the convolutional neural network model can be further optimized to perform more appropriate path planning for the agent.
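A hedged sketch of this four-term reward-and-punishment structure is shown below; the numeric values of r_full, r_fail and r_close and the boolean flags are assumptions chosen only for illustration.

```python
# Hedged sketch of the reward-and-punishment structure described above; the
# constants and helper flags are assumptions chosen only for illustration.
def reward(r_dot, full_coverage, failed, too_close,
           r_full=100.0, r_fail=-100.0, r_close=-1.0):
    """r_dot: average distance difference toward undetected points (signed).
    full_coverage: True when every passable grid point has been detected.
    failed: True when the agent hits an obstacle or leaves the task area.
    too_close: True when the distance to an obstacle or the area boundary
    is smaller than the target distance."""
    r = r_dot
    if full_coverage:
        r += r_full        # terminal reward for completing full coverage
    if failed:
        r += r_fail        # penalty for collision or leaving the task area
    if too_close:
        r += r_close       # penalty for getting closer than the target distance
    return r
```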
Further, the method further comprises:
respectively assigning values to each grid point according to the environment attribute of each grid point in the plurality of grid points at the next moment to obtain a second environment state matrix for representing the environment state of the task area;
assigning values to each grid point according to the distance between the intelligent agent and each grid point at the next moment to obtain a second position state matrix for representing the position state of the intelligent agent;
the next moment is a moment which is adjacent to the current moment and occurs after the current moment;
The average distance difference r_dot is obtained by the following formula:

r_dot = ( Σ_{S_cur(i,j) > 0} dis_cur(i,j) - Σ_{S_next(i,j) > 0} dis_next(i,j) ) / n

wherein S_cur is the first environment state matrix, dis_cur is the first position state matrix, S_next is the second environment state matrix, dis_next is the second position state matrix, the sums run over the elements whose environment state value is greater than 0 (i.e. the undetected points), and n is the number of undetected points at the next moment.
In the present embodiment, r_dot is used to calculate the agent's movement tendency, judged jointly along the two dimensions of time and distance: specifically, the movement tendency is judged from the distances between the agent and each grid point at the current moment and at the next moment, and from the change between the two. In one embodiment, the distance is the Euclidean distance.
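The following NumPy sketch computes the average distance difference under the reconstruction of the formula given above; the helper name and the handling of the case with no remaining undetected points are assumptions for the example.

```python
import numpy as np

# Sketch of the average distance difference r_dot under the reconstruction given
# above: distances to undetected points (environment value > 0) at the current
# and next moments, averaged over the n points still undetected at the next moment.
def average_distance_difference(S_cur, dis_cur, S_next, dis_next):
    n = np.count_nonzero(S_next > 0)
    if n == 0:
        return 0.0                       # nothing left undetected
    return (dis_cur[S_cur > 0].sum() - dis_next[S_next > 0].sum()) / n
```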
Example 2
The embodiment provides a full coverage path planning device based on deep reinforcement learning, which comprises:
the first determination module is used for determining a plurality of grid points according to the task area where the agent is located;
the second determining module is used for determining the first environment state matrix according to the environment attribute of each grid point in the plurality of grid points;
the third determining module is used for determining a first position state matrix according to the distance between the intelligent agent and each grid point at the current moment;
a fourth determining module, configured to determine N heading information matrices according to distances between the intelligent agent and each grid point at N previous moments; the N previous moments are moments which are adjacent to the current moment and occur before the current moment;
the building module is used for building a convolutional neural network model, splicing the first environment state matrix, the first position state matrix and the N heading information matrices into N +2 state input matrices to be input into the convolutional neural network model, and outputting an output value representing the next-step execution information of the intelligent agent;
the training module is used for training the convolutional neural network model according to a deep reinforcement learning algorithm;
and the planning module is used for planning the path of the intelligent agent according to the trained convolutional neural network model.
Example 3
The present embodiment provides an electronic device, which includes a processor, a memory, and a program or an instruction stored in the memory and executable on the processor, wherein the program or the instruction implements the steps of the method of the above embodiment when executed by the processor.
Example 4
The present embodiment provides a readable storage medium on which a program or instructions are stored, which when executed by a processor implement the steps of the method of the above embodiment.
The processor is the processor in the electronic device of the above embodiment. Readable storage media include computer-readable storage media such as Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical disks, and the like.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A full coverage path planning method based on deep reinforcement learning is characterized by comprising the following steps:
dividing the task area where the agent is located into n1 × n2 grid points arranged in a matrix;
according to the environment attribute of each grid point in the plurality of grid points at the current moment, respectively assigning values to each grid point to obtain a first environment state matrix used for representing the environment state of the task area;
according to the distance between the intelligent agent and each grid point at the current moment, respectively assigning values to each grid point to obtain a first position state matrix for representing the position state of the intelligent agent;
according to the distances between the intelligent agent and each grid point at N previous moments, respectively assigning values to each grid point to obtain N heading information matrixes for representing heading information of the intelligent agent;
splicing the first environment state matrix, the first position state matrix and the N heading information matrixes into N +2 state input matrixes;
constructing a convolutional neural network model, and inputting the N +2 state input matrixes into the convolutional neural network model, so that the convolutional neural network model outputs an output value representing the next-step execution information of the agent according to the N +2 state input matrixes;
training the convolutional neural network model by adopting a deep reinforcement learning algorithm;
adopting the trained convolutional neural network model to plan a path of the agent;
wherein the N previous moments are adjacent to the current moment and occur before the current moment, and N is greater than or equal to 2; n1 is an integer from 1 to 1000; n2 is an integer from 1 to 1000;
the element m(i, j) in the first environment state matrix is any one of {-1, 0, 1} and is assigned according to the following rules:
when the environment attribute indicates that the grid point is an obstacle, m(i, j) = -1;
when the environment attribute indicates that the grid point has been detected, m(i, j) = 0;
when the environment attribute indicates that the grid point has not been detected, m(i, j) = 1;
the element dis(i, j) in the first position state matrix is assigned according to the following principle:

dis(i, j) = sqrt( (X_agent - X_(i,j))^2 + (Y_agent - Y_(i,j))^2 ) / dis_max

wherein dis(i, j) is the Euclidean distance between the agent and the grid point in the i-th row and j-th column of the first position state matrix, X_agent and Y_agent are the X and Y coordinates of the agent in the two-dimensional planar rectangular coordinate system corresponding to the task area, X_(i,j) and Y_(i,j) are the X and Y coordinates of the grid point corresponding to element dis(i, j) in the same coordinate system, and dis_max is the longest distance in the task area.
2. The method of claim 1, further comprising:
and limiting the output value within the range of [ -1,1] by using a tanh activation function at an output layer of the convolutional neural network model, and multiplying the limited output value by the maximum steering limit of the intelligent agent to obtain a steering action output value representing the steering action of the intelligent agent.
3. The method according to claim 1 or 2, wherein the training the convolutional neural network model by using a deep reinforcement learning algorithm comprises:
constructing a reward and punishment function according to the detection process of the agent in the task area;
and training the convolution neural network model by adopting a depth reinforcement learning algorithm based on the reward and punishment function.
4. The method of claim 3, wherein the reward-and-punishment function is constructed as follows:
r = r_dot + r_full + r_fail + r_close,
wherein r is the reward-and-punishment function;
r_dot is the average distance difference between the agent and the undetected grid points at the current moment and at the next moment relative to the current moment, the next moment being adjacent to the current moment and occurring after it;
when the agent moves toward the undetected points, r_dot is a reward, and when it does not, r_dot is a penalty;
r_full is the reward for the agent completing the full-coverage task;
r_fail is the penalty for the agent colliding with an obstacle or leaving the task area;
r_close is the penalty for the agent's distance to an obstacle or to the task area boundary being smaller than the target distance.
5. The method of claim 4, further comprising:
respectively assigning values to each grid point according to the environment attribute of each grid point in the plurality of grid points at the next moment to obtain a second environment state matrix for representing the environment state of the task area;
assigning values to each grid point according to the distance between the intelligent agent and each grid point at the next moment to obtain a second position state matrix for representing the position state of the intelligent agent;
wherein the average distance difference r_dot is obtained by the following formula:

r_dot = ( Σ_{S_cur(i,j) > 0} dis_cur(i,j) - Σ_{S_next(i,j) > 0} dis_next(i,j) ) / n

wherein S_cur is the first environment state matrix, dis_cur is the first position state matrix, S_next is the second environment state matrix, dis_next is the second position state matrix, the sums run over the elements whose environment state value is greater than 0 (i.e. the undetected points), and n is the number of undetected points at the next moment.
6. A full coverage path planning device based on deep reinforcement learning is characterized by comprising:
the first determination module is used for determining a plurality of grid points according to the task area where the agent is located;
a second determining module, configured to determine a first environment state matrix according to an environment attribute of each of the plurality of grid points;
a third determining module, configured to determine a first position state matrix according to distances between the agent and each grid point at the current time;
a fourth determining module, configured to determine N heading information matrices according to distances between the agent and each grid point at N previous times, respectively; the N previous moments are moments which are adjacent to the current moment and occur before the current moment;
the building module is used for building a convolutional neural network model, splicing the first environment state matrix, the first position state matrix and the N heading information matrices into N +2 state input matrices, inputting the state input matrices into the convolutional neural network model, and outputting an output value representing the next step execution information of the intelligent agent;
the training module is used for training the convolutional neural network model according to a deep reinforcement learning algorithm;
and the planning module is used for planning the path of the intelligent agent according to the trained convolutional neural network model.
7. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, which when executed by the processor, implements the full coverage path planning method of any of claims 1 to 5.
8. A readable storage medium, on which a program or instructions are stored, which when executed by a processor, implement a full coverage path planning method according to any one of claims 1 to 5.
CN202211169283.3A 2022-09-26 2022-09-26 Full-coverage path planning method and device, storage medium and electronic equipment Active CN115235476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211169283.3A CN115235476B (en) 2022-09-26 2022-09-26 Full-coverage path planning method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211169283.3A CN115235476B (en) 2022-09-26 2022-09-26 Full-coverage path planning method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN115235476A CN115235476A (en) 2022-10-25
CN115235476B true CN115235476B (en) 2023-01-17

Family

ID=83667276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211169283.3A Active CN115235476B (en) 2022-09-26 2022-09-26 Full-coverage path planning method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115235476B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110977967A (en) * 2019-11-29 2020-04-10 天津博诺智创机器人技术有限公司 Robot path planning method based on deep reinforcement learning
CN111290398A (en) * 2020-03-13 2020-06-16 东南大学 Unmanned ship path planning method based on biological heuristic neural network and reinforcement learning
CN113110509A (en) * 2021-05-17 2021-07-13 哈尔滨工业大学(深圳) Warehousing system multi-robot path planning method based on deep reinforcement learning
CN113390412A (en) * 2020-03-11 2021-09-14 宁波方太厨具有限公司 Full-coverage path planning method and system for robot, electronic equipment and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109540151B (en) * 2018-03-25 2020-01-17 哈尔滨工程大学 AUV three-dimensional path planning method based on reinforcement learning
US11074480B2 (en) * 2019-01-31 2021-07-27 StradVision, Inc. Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning
CN109813328B (en) * 2019-02-22 2021-04-30 百度在线网络技术(北京)有限公司 Driving path planning method and device and vehicle
US20210103286A1 (en) * 2019-10-04 2021-04-08 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Systems and methods for adaptive path planning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110977967A (en) * 2019-11-29 2020-04-10 天津博诺智创机器人技术有限公司 Robot path planning method based on deep reinforcement learning
CN113390412A (en) * 2020-03-11 2021-09-14 宁波方太厨具有限公司 Full-coverage path planning method and system for robot, electronic equipment and medium
CN111290398A (en) * 2020-03-13 2020-06-16 东南大学 Unmanned ship path planning method based on biological heuristic neural network and reinforcement learning
CN113110509A (en) * 2021-05-17 2021-07-13 哈尔滨工业大学(深圳) Warehousing system multi-robot path planning method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-AUV complete coverage path planning based on an improved neural network; Zhu Daqi et al.; Journal of System Simulation; August 2020 (No. 08); full text *
Research on air-ground heterogeneous multi-agent cooperative coverage based on reinforcement learning; Zhang Wenxu et al.; CAAI Transactions on Intelligent Systems; June 2017 (No. 02); full text *

Also Published As

Publication number Publication date
CN115235476A (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN112325897B (en) Path planning method based on heuristic deep reinforcement learning
CN113110509B (en) Warehousing system multi-robot path planning method based on deep reinforcement learning
Jin et al. A framework for evolutionary optimization with approximate fitness functions
Buniyamin et al. Robot global path planning overview and a variation of ant colony system algorithm
CN112356830A (en) Intelligent parking method based on model reinforcement learning
CN112433525A (en) Mobile robot navigation method based on simulation learning and deep reinforcement learning
CN109726676B (en) Planning method for automatic driving system
CN105045260A (en) Mobile robot path planning method in unknown dynamic environment
CN110243373B (en) Path planning method, device and system for dynamic storage automatic guided vehicle
CN104317297A (en) Robot obstacle avoidance method under unknown environment
CN113879339A (en) Decision planning method for automatic driving, electronic device and computer storage medium
CN115235476B (en) Full-coverage path planning method and device, storage medium and electronic equipment
CN114036631A (en) Spacecraft autonomous rendezvous and docking guidance strategy generation method based on reinforcement learning
Klimesch et al. Simulating liquids with graph networks
CN117471919A (en) Robot path planning method based on improved pelican optimization algorithm
US20230162539A1 (en) Driving decision-making method and apparatus and chip
CN111240318A (en) Robot personnel discovery algorithm
US20240202393A1 (en) Motion planning
US20220198225A1 (en) Method and system for determining action of device for given state using model trained based on risk-measure parameter
CN113627646B (en) Path planning method, device, equipment and medium based on neural network
Huang et al. Simulation of pedestrian evacuation with reinforcement learning based on a dynamic scanning algorithm
CN115562258A (en) Robot social self-adaptive path planning method and system based on neural network
Gross et al. Probabilistic model checking of stochastic reinforcement learning policies
Ha et al. Vehicle control with prediction model based Monte-Carlo tree search
Yin et al. Random Network Distillation Based Deep Reinforcement Learning for AGV Path Planning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant