CN116382304B - DQN model-based multi-inspection robot collaborative path planning method and system - Google Patents

DQN model-based multi-inspection robot collaborative path planning method and system

Info

Publication number
CN116382304B
CN116382304B (Application CN202310604238.4A)
Authority
CN
China
Prior art keywords
state
inspection
robots
action
inspection robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310604238.4A
Other languages
Chinese (zh)
Other versions
CN116382304A (en)
Inventor
陈昊
方国权
钱其隆
戚满顺
蔡彪
张海华
韩祥政
张锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd filed Critical Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202310604238.4A priority Critical patent/CN116382304B/en
Publication of CN116382304A publication Critical patent/CN116382304A/en
Application granted granted Critical
Publication of CN116382304B publication Critical patent/CN116382304B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0219Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Manipulator (AREA)

Abstract

The DQN model-based multi-inspection-robot collaborative path planning method and system acquire the position coordinates of all inspection robots and the arrival state of every task point to construct the collaborative state space of the multiple inspection robots, and acquire the movement direction selected by each inspection robot to construct the collaborative action space. States are classified with the triggering of the anti-collision mechanism (between an inspection robot and an obstacle, and between inspection robots) as the constraint condition, and a reward value is defined for each class of state. From the states, actions and reward values, the DQN model calculates the expected return obtained by executing a set action in a set state, optimizes this expectation by training the parameters of a deep neural network, and forms the multi-robot collaborative path from the states and actions corresponding to the maximum expectation. The application avoids obstacles without triggering the anti-collision mechanism between the inspection robots, thereby reducing energy loss and improving the efficiency of multi-robot collaborative inspection.

Description

DQN model-based multi-inspection robot collaborative path planning method and system
Technical Field
The application belongs to the technical field of substation inspection, and particularly relates to a multi-inspection robot collaborative path planning method and system based on a DQN model.
Background
Inspection work is the basis for guaranteeing the safe operation of a transformer substation. Traditional manual inspection suffers from high labor intensity, low working efficiency and difficulty in guaranteeing inspection quality; under extreme weather conditions such as thunderstorms and typhoons, it also carries safety risks.
By means of modern information and communication technology, artificial intelligence and high-performance computing, the inspection robot realizes state inspection, infrared temperature measurement, partial discharge detection, data transmission and other functions for the primary and secondary equipment in the station. At present, inspection robots are widely applied in the actual inspection work of transformer substations.
When the outdoor site of a transformer substation is inspected, completing the inspection task through the cooperation of multiple inspection robots can further improve the utilization of inspection resources in the station, shorten the inspection time and improve the inspection efficiency. However, in the prior art, several constraints still hinder the cooperation of multiple inspection robots. Most importantly, the moving paths of inspection robots from different manufacturers are generally based on templates preset by those manufacturers, so the paths are relatively fixed; if such robots are directly applied to a multi-robot cooperation scenario, problems such as repeated tasks, overlapping paths and long time consumption arise.
Among prior-art methods for planning the inspection path of a substation inspection robot, one realizes optimal path search for a single inspection robot based on an optimization method combining the ant colony optimization algorithm with the artificial potential field algorithm. Another, based on an improved ant colony-simulated annealing algorithm, addresses the slow convergence of path planning and its tendency to fall into local optima in complex working environments. However, these methods all take a single inspection robot as the subject and do not consider the application scenario in which the inspection task is completed cooperatively by multiple robots, so their effect on the inspection efficiency of intelligent inspection robots in substations is limited. In a practical environment, an inspection robot can trigger an anti-collision mechanism through laser and visual navigation technology to avoid colliding with an obstacle, but doing so causes additional energy and time loss; considering only the constraint that a robot must avoid obstacle areas on its path is insufficient to improve the inspection efficiency of intelligent inspection robots in substations.
Disclosure of Invention
In order to overcome the defects in the prior art, the application provides a multi-inspection robot collaborative path planning method and system based on the DQN model for planning the paths of cooperatively inspecting robots, so that not only is the obstacle area avoided on each path and the efficiency of multi-robot collaborative inspection of the substation improved, but the anti-collision mechanism is also never triggered between the inspection robots, thereby reducing energy consumption.
The application adopts the following technical scheme.
A multi-inspection robot collaborative path planning method based on a DQN model, wherein each inspection robot has defined its corresponding task points and traversal order, comprises the following steps:
step 1, acquiring the position coordinates of all inspection robots and the arrival state of each task point, and constructing the collaborative state space of the multiple inspection robots;
step 2, acquiring the movement directions selected by all inspection robots, and constructing the collaborative action space of the multiple inspection robots;
step 3, classifying states with the triggering of the anti-collision mechanism between an inspection robot and an obstacle, and between inspection robots, as constraint conditions, and defining the reward value corresponding to each class of state;
step 4, calculating, by the DQN model, the expectation of the return value obtained after executing a set action in a set state, according to the collaborative states and actions of the multiple inspection robots and the reward values corresponding to the classes of states;
step 5, optimizing the expectation of the return value by the DQN model through parameter training of the deep neural network, and forming the multi-inspection-robot collaborative path from the states and actions corresponding to the maximum expectation.
The state space S is as follows:
S = {(x_1, y_1), …, (x_N, y_N), u_1, u_2, …, u_M}    (1)
where:
u_j characterizes the arrival status of the j-th task point: u_j = 0 indicates that the j-th task point has not yet been reached by its corresponding inspection robot, and u_j = 1 indicates that it has been reached;
(x_i, y_i) characterizes the position coordinates of the i-th inspection robot;
i = 1, 2, …, N, where N is the total number of inspection robots;
j = 1, 2, …, M, where M is the total number of task points.
The position coordinates of the i-th inspection robot are as follows:
(x_i, y_i), x_i ∈ {1, 2, …, L}, y_i ∈ {1, 2, …, W}    (2)
where:
x_i and y_i respectively denote the abscissa and ordinate of the i-th inspection robot in the grid map;
L and W respectively denote the total length and total width of the grid map generated from the planar arrangement of the electrical equipment within the substation.
The action space A is as follows:
A = {a_1, a_2, …, a_N}    (3)
where:
a_i denotes the movement direction selected by the i-th inspection robot, or staying in place; the movement directions comprise north N, northeast NE, east E, southeast SE, south S, southwest SW, west W and northwest NW, and each inspection robot moves one grid unit in its selected movement direction.
The states comprise a free state, a semi-successful state, a failure state and a successful state, specifically:
1) Free State (FS): no inspection robot has triggered its anti-collision mechanism, and no inspection robot has yet reached the first task point it is required to reach;
2) Semi-Successful State (SS): some task points have been reached by their corresponding inspection robots, but task points not yet reached still exist in the environment, or some inspection robots have not yet returned to the charging room;
3) Failure State (DS): an anti-collision mechanism has been triggered between an inspection robot and an obstacle, or between inspection robots;
4) Successful State (CS): all task points have been reached by their corresponding inspection robots, and all inspection robots have returned to the charging room.
The reward values r corresponding to the four classes of states constitute the reward-and-punishment function:
r(s) = r_F,  s = FS;  r_S · Σ_{i=1}^{N} k_i,  s = SS;  −r_D,  s = DS;  r_C,  s = CS    (4)
where k_i is the number of task points already reached by the i-th inspection robot, s is the state, and r_F, r_S, r_D and r_C are the reward and penalty magnitudes assigned to the four classes (a penalty is deducted in the failure state, and the terminal reward of the successful state is the largest).
The expectation Q(s, a) of the return value is as follows:
Q(s, a) = E[G_t | S_t = s, A_t = a]    (5)
where:
G_t denotes the return function at time t,
S_t denotes the state space at time t,
A_t denotes the action space at time t,
E[·] denotes the expectation function.
The return function is the sum of the reward values of the subsequent states after the set action is executed in the set state, and satisfies the following relation:
G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + … = Σ_{k=0}^{∞} γ^k R_{t+k+1}    (6)
where:
G_t denotes the return function at time t,
R_{t+1} denotes the reward value at time t+1, reflecting the reward obtained by entering the subsequent state after executing the set action in the set state,
R_{t+k+1} denotes the reward value at time t+k+1, reflecting the reward obtained k steps after executing the set action in the set state,
γ^k denotes the discount factor corresponding to the k-th subsequent reward, γ^k ∈ [0, 1],
γ is the discount factor, γ ∈ [0, 1].
The DQN model builds an estimation network and a target network with identical structures, and performs parameter optimization on the corresponding network parameters θ and θ⁻.
The inputs of both the estimation network and the target network are state-action pairs (s, a). The output of the estimation network is Q(s, a; θ), used for estimating the Q value corresponding to the state-action pair at the current moment; the output of the target network is Q(s′, a′; θ⁻), used for storing the optimal Q value during parameter training, the optimal Q value being the maximum Q value in the target network.
The estimation network takes the optimal Q value stored by the target network as its learning target, updates the parameters θ, and uses the updated parameters θ in the calculation of the estimation network loss function:
L(θ) = E[(y − Q(s, a; θ))²]    (7)
where:
L(θ) is the estimation network loss function,
y is the optimization target,
and the Q value corresponding to the maximum of the optimization target is taken as the optimal Q value.
The application also provides a multi-inspection-robot collaborative path planning system, wherein each inspection robot has defined its corresponding task points and traversal order, comprising:
a state module, an action module, a state-action pair evaluation module and a collaborative planning module;
the state module is used for acquiring the position coordinates of all inspection robots and the arrival state of each task point, and constructing the collaborative state space of the multiple inspection robots;
the action module is used for acquiring the movement directions selected by all inspection robots, and constructing the collaborative action space of the multiple inspection robots;
the state-action pair evaluation module is used for classifying states with the triggering of the anti-collision mechanism between an inspection robot and an obstacle, and between inspection robots, as constraint conditions, and defining the reward value corresponding to each class of state; according to the collaborative states and actions of the multiple inspection robots and the reward values corresponding to the classes of states, the DQN model calculates the expectation of the return value obtained after executing a set action in a set state;
the collaborative planning module is used for optimizing the expectation of the return value by the DQN model through parameter training of the deep neural network, and forming the multi-inspection-robot collaborative path from the states and actions corresponding to the maximum expectation.
A terminal comprises a processor and a storage medium; the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the steps of the method.
A computer readable storage medium has a computer program stored thereon which, when executed by a processor, implements the steps of the method.
Compared with the prior art, the application is advantageously oriented to the scenario in which multiple inspection robots cooperate, realizing path planning for multiple robots and multiple inspection targets and improving the efficiency of collaborative inspection.
The application not only considers the constraint that each inspection robot must avoid obstacle areas on its path, but additionally requires that the anti-collision mechanism is never triggered between the inspection robots, thereby significantly reducing the energy consumption of the robots and better supporting collaborative inspection.
The application adopts the DQN (Deep Q-Network) model to realize collaborative path planning for multiple inspection robots on large-scale, complex maps.
Drawings
FIG. 1 is a flow chart of a multi-inspection robot collaborative path planning method based on a DQN model;
FIG. 2 is a plan view of a 500kV substation device in an embodiment of the application;
fig. 3 is a schematic diagram of substation equipment area division and task point distribution in an embodiment of the present application;
FIG. 4 is a rasterized map in an embodiment of the application;
FIG. 5 is a graph of the reward value variation during model training in an embodiment of the application;
fig. 6 is a schematic diagram of a cooperative path of a multi-inspection robot in an embodiment of the application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application are described clearly and completely below with reference to the accompanying drawings of the embodiments. The described embodiments are only some, not all, embodiments of the application; all other embodiments obtained by those skilled in the art without inventive effort on the basis of these embodiments fall within the scope of the application.
The application provides a multi-inspection robot collaborative path planning method based on a DQN model. Given that the multiple inspection robots have defined their corresponding task points and traversal orders, a Deep Q-Network (DQN) model is adopted to plan the multi-robot collaborative path, as shown in fig. 1, comprising the following steps:
Step 1: the position coordinates of all inspection robots and the arrival state of each task point are obtained, and the collaborative state space of the multiple inspection robots is constructed.
Specifically, a collaborative state space S is defined, representing the current position coordinates of the N inspection robots and the state of each task point being reached by its corresponding inspection robot. The state space S is as follows:
S = {(x_1, y_1), …, (x_N, y_N), u_1, u_2, …, u_M}    (1)
where:
u_j characterizes the arrival status of the j-th task point: u_j = 0 indicates that the j-th task point has not yet been reached by its corresponding inspection robot, and u_j = 1 indicates that it has been reached;
(x_i, y_i) characterizes the position coordinates of the i-th inspection robot, as follows:
(x_i, y_i), x_i ∈ {1, 2, …, L}, y_i ∈ {1, 2, …, W}    (2)
where:
x_i and y_i respectively denote the abscissa and ordinate of the i-th inspection robot in the grid map;
L and W respectively denote the total length and total width of the grid map generated from the planar arrangement of the electrical equipment within the substation;
i = 1, 2, …, N, where N is the total number of inspection robots; j = 1, 2, …, M, where M is the total number of task points.
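As a concrete illustration, the collaborative state of equations (1) and (2) must be flattened into a vector before it can be fed to the DQN. The sketch below is a minimal encoding under assumed placeholder dimensions (an L x W grid, N robots, M task points); it is not asserted to be the patent's own encoding.

    import numpy as np

    L, W = 40, 30    # grid length and width (placeholder values)
    N, M = 3, 56     # robots and task points, as in the embodiment below

    def encode_state(robot_xy, task_reached):
        """Flatten the collaborative state of eq. (1) into a DQN input vector.

        robot_xy     : list of N (x_i, y_i) grid coordinates, 1 <= x_i <= L, 1 <= y_i <= W
        task_reached : list of M flags u_j, 0 = not yet reached, 1 = reached
        """
        coords = np.array(robot_xy, dtype=np.float32).reshape(-1)
        coords[0::2] /= L    # normalize abscissas into [0, 1]
        coords[1::2] /= W    # normalize ordinates into [0, 1]
        flags = np.array(task_reached, dtype=np.float32)
        return np.concatenate([coords, flags])    # length 2N + M

    # Example: three robots near the charging room, no task point reached yet.
    s0 = encode_state([(1, 1), (2, 1), (3, 1)], [0] * M)
    assert s0.shape == (2 * N + M,)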
Step 2: the movement directions selected by all inspection robots are acquired, and the collaborative action space of the multiple inspection robots is constructed.
Specifically, a collaborative action space is defined, representing the movement directions selected by the inspection robots. The action space A is as follows:
A = {a_1, a_2, …, a_N}    (3)
where:
a_i denotes the movement direction selected by the i-th inspection robot, or staying in place; the movement directions comprise north N, northeast NE, east E, southeast SE, south S, southwest SW, west W and northwest NW, and each inspection robot moves one grid unit in its selected movement direction.
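The action component a_i of equation (3) can likewise be illustrated as a unit displacement on the grid, with staying in place as the ninth option; a sketch under the same placeholder grid dimensions:

    # Eight compass moves of one grid unit, plus staying in place.
    ACTIONS = {
        "N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
        "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1),
        "STAY": (0, 0),
    }

    def step_position(xy, action, L=40, W=30):
        """Apply one joint-action component a_i of eq. (3) to a robot position."""
        dx, dy = ACTIONS[action]
        x, y = xy[0] + dx, xy[1] + dy
        # Clamp to the grid so a robot cannot step off the map.
        return (min(max(x, 1), L), min(max(y, 1), W))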
Step 3: the states are classified with the triggering of the anti-collision mechanism between an inspection robot and an obstacle, and between inspection robots, as constraint conditions, and the reward value corresponding to each class of state is defined.
Specifically, based on actual conditions, the states comprise a free state, a semi-successful state, a failure state and a successful state, specifically:
1) Free State (FS): no inspection robot has triggered its anti-collision mechanism, and no inspection robot has yet reached the first task point it is required to reach;
2) Semi-Successful State (SS): some task points have been reached by their corresponding inspection robots, but task points not yet reached still exist in the environment, or some inspection robots have not yet returned to the charging room;
3) Failure State (DS): an anti-collision mechanism has been triggered between an inspection robot and an obstacle, or between inspection robots;
4) Successful State (CS): all task points have been reached by their corresponding inspection robots, and all inspection robots have returned to the charging room.
The reward values r corresponding to the four classes of states constitute the reward-and-punishment function:
r(s) = r_F,  s = FS;  r_S · Σ_{i=1}^{N} k_i,  s = SS;  −r_D,  s = DS;  r_C,  s = CS    (4)
where k_i is the number of task points already reached by the i-th inspection robot, s is the state, and r_F, r_S, r_D and r_C are the reward and penalty magnitudes assigned to the four classes.
The application improves the state space at its definition stage: triggering of the anti-collision mechanism between different inspection robots (that is, several inspection robots occupying the same coordinates of the grid map at the same time) and collision of an inspection robot with a static obstacle are both defined as the failure state, and the reward-and-punishment function deducts a penalty for the failure state. Because the subsequent DQN model continuously seeks to increase the reward value during training, it learns through trial and error to avoid entering the failure state. The application therefore considers not only the constraint that each inspection robot must avoid obstacle areas on its path, but also the additional constraint that the anti-collision mechanism is never triggered between the inspection robots, which significantly reduces the energy consumption of the robots and better supports collaborative inspection.
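The sign pattern of equation (4) (a penalty on any triggered anti-collision mechanism, a progress reward that grows with the number of reached task points, and a large terminal bonus) can be sketched as follows; the numeric magnitudes are illustrative assumptions, since the patent fixes its own values:

    def reward(state_class, tasks_reached_per_robot):
        """Piecewise reward of eq. (4); the constants are assumed, not the patent's values."""
        k_total = sum(tasks_reached_per_robot)    # sum of k_i over all robots
        if state_class == "DS":    # failure: an anti-collision mechanism was triggered
            return -100.0
        if state_class == "CS":    # success: all task points reached, all robots home
            return 200.0
        if state_class == "SS":    # semi-success: reward grows with reached task points
            return 10.0 * k_total
        return -1.0                # free state: small step cost discourages wandering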
Step 4: the DQN model calculates the expectation of the return value obtained after executing a set action in a set state, according to the collaborative state space and action space of the multiple inspection robots and the reward values corresponding to the classes of states.
Specifically, the expectation Q(s, a) of the return value is as follows:
Q(s, a) = E[G_t | S_t = s, A_t = a]    (5)
where:
G_t denotes the return function at time t,
S_t denotes the state space at time t,
A_t denotes the action space at time t,
E[·] denotes the expectation function.
Since the different states of the inspection robots depend on time, the time step t can be used to index the different states.
According to equation (5), the return value generated by executing the set action in the set state is calculated using the reward-and-punishment function, and this return value is used to evaluate the state-action pair.
The return function is the sum of the reward values of the subsequent states after the set action is executed in the set state, and satisfies the following relation:
G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + … = Σ_{k=0}^{∞} γ^k R_{t+k+1}    (6)
where:
R_{t+1} denotes the reward value at time t+1, reflecting the reward obtained by entering the subsequent state after executing the set action in the set state,
R_{t+k+1} denotes the reward value at time t+k+1, reflecting the reward obtained k steps after executing the set action in the set state,
γ^k denotes the discount factor corresponding to the k-th subsequent reward, γ^k ∈ [0, 1].
The derivation of formula (6) is as follows:
G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + … = R_{t+1} + γ(R_{t+2} + γR_{t+3} + …) = R_{t+1} + γG_{t+1}
From the above, the return function G_t at time t is an iterative function of the reward value: each time the moment t increases by 1, the corresponding reward value is multiplied by a further discount factor, so the larger the interval between a subsequent state and the current state, the smaller the influence of that state's reward value on the current return function.
Since the expected return value obtained after executing a set action in a set state is not unique when the multiple inspection robots cooperate, a Q(s, a) value table is constructed.
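As a worked illustration of equation (6) and of its iterative form G_t = R_{t+1} + γG_{t+1}, the return of every step of an episode can be computed by a single backward pass over the recorded rewards; the reward sequence below is fictitious:

    def discounted_returns(rewards, gamma=0.9):
        """Compute G_t = R_{t+1} + gamma * G_{t+1} for every step of an episode.

        rewards[t] is the reward received after the t-th action.
        """
        G = [0.0] * (len(rewards) + 1)
        for t in reversed(range(len(rewards))):
            G[t] = rewards[t] + gamma * G[t + 1]
        return G[:-1]

    # A short fictitious episode: two step costs, a task-point reward, a terminal bonus.
    print(discounted_returns([-1.0, -1.0, 10.0, 200.0]))    # [152.0, 170.0, 190.0, 200.0]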
Step 5: the DQN model optimizes the expectation of the return value through parameter training of the deep neural network, and the states and actions corresponding to the maximum expectation are taken as the multi-inspection-robot collaborative path.
Specifically, the DQN model builds an estimation network and a target network with identical structures, and performs parameter optimization on the corresponding network parameters θ and θ⁻.
The inputs of both the estimation network and the target network are state-action pairs (s, a). The output of the estimation network is Q(s, a; θ), used for estimating the Q value corresponding to the state-action pair at the current moment; the output of the target network is Q(s′, a′; θ⁻), used for storing the optimal Q value during parameter training, the optimal Q value being the maximum Q value in the target network. The estimation network takes the optimal Q value stored by the target network as its learning target, updates the parameters θ, and uses the updated parameters θ in the calculation of the estimation network loss function:
L(θ) = E[(y − Q(s, a; θ))²]    (7)
where:
L(θ) is the estimation network loss function; as can be seen from equation (7), it is a continuous function of the estimation network parameters θ;
y is the optimization target, calculated as follows:
y = r + γ max_{a′} Q(s′, a′; θ⁻)    (8)
where r is the reward value obtained by entering the subsequent state after executing the set action in the set state, γ is the discount factor, γ ∈ [0, 1], and s′ is the subsequent state.
The Q value corresponding to the maximum of the optimization target is taken as the optimal Q value.
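As a concrete sketch of the estimation/target network pair and of equations (7) and (8), the following PyTorch code is illustrative only: the framework, layer sizes and the joint-action encoding (9 moves per robot for 3 robots) are assumptions, not the patent's implementation.

    import torch
    import torch.nn as nn

    STATE_DIM = 2 * 3 + 56    # 2N coordinates + M task flags (the assumed encoding above)
    N_ACTIONS = 9 ** 3        # one joint action = a move for each of 3 robots (assumed)

    class QNet(nn.Module):
        """Maps a collaborative state to one Q value per joint action."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, N_ACTIONS),
            )

        def forward(self, s):
            return self.net(s)

    q_net, target_net = QNet(), QNet()               # estimation network, target network
    target_net.load_state_dict(q_net.state_dict())   # theta^- starts as a copy of theta

    def td_loss(batch, gamma=0.9):
        """Loss of eq. (7), with the optimization target y of eq. (8)."""
        s, a, r, s_next, done = batch    # a must be a LongTensor of action indices
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a; theta)
        with torch.no_grad():                                # target network is frozen here
            y = r + gamma * target_net(s_next).max(1).values * (1.0 - done)
        return nn.functional.mse_loss(q, y)

Periodically copying θ into θ⁻ (target_net.load_state_dict(q_net.state_dict())) keeps the learning target stable, which is the role the text assigns to the target network.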
The application adopts the DQN (Deep Q-Network) model to realize collaborative path planning for multiple inspection robots on large-scale, complex maps.
Further, the DQN model introduces an experience storage mechanism: as the multiple inspection robots interact with the environment, transition samples (s, a, r, s′) characterizing the state, action and reward after each interaction are stored in an experience pool; when the experience pool is full, a portion of the samples is drawn at random from it to calculate the loss function, and the network parameters are updated with a stochastic gradient descent algorithm, thereby breaking the correlation between the data.
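The experience storage mechanism described above can be sketched as follows; the capacity and batch size are placeholder values, and q_net/td_loss refer to the previous snippet (states are assumed to be stored as tensors):

    import random
    from collections import deque

    import torch

    class ReplayPool:
        """Fixed-capacity pool of transition samples (s, a, r, s', done)."""
        def __init__(self, capacity=10_000):
            self.pool = deque(maxlen=capacity)

        def __len__(self):
            return len(self.pool)

        def push(self, s, a, r, s_next, done):
            self.pool.append((s, a, r, s_next, done))

        def sample(self, batch_size=64):
            """A random draw breaks the correlation between successive samples."""
            s, a, r, s2, d = zip(*random.sample(list(self.pool), batch_size))
            return (torch.stack(s), torch.tensor(a), torch.tensor(r),
                    torch.stack(s2), torch.tensor(d, dtype=torch.float32))

    # One stochastic-gradient-descent update once the pool is sufficiently full:
    # optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)
    # if len(pool) >= 64:
    #     loss = td_loss(pool.sample())
    #     optimizer.zero_grad(); loss.backward(); optimizer.step()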
The application is further described below based on the actual situation of a certain 500kV substation.
The equipment plane layout of the substation is shown in fig. 2. First, the inspection area is divided by equipment areas of different voltage levels, and the inspection task points are calibrated, as shown by the black dots in fig. 3. Then, the area is rasterized in units of 10 meters and the task points are numbered, 56 task points in total; the geographic coordinates of the 56 task points can be represented by the grid point serial numbers of the grid map in the horizontal and vertical directions. As shown in fig. 4, the pentagon represents the charging room, the black squares represent the numbered task points, the white squares represent movable areas, and the gray rectangles represent immovable areas.
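For illustration, the rasterized map of fig. 4 can be held as a small occupancy grid; the obstacle block and the single marked task point below are placeholders rather than the actual substation layout:

    import numpy as np

    L, W = 40, 30                            # grid size in 10 m cells (assumed)
    FREE, OBSTACLE, TASK, CHARGER = 0, 1, 2, 3

    grid = np.zeros((W, L), dtype=np.int8)   # white squares: movable area
    grid[10:14, 5:20] = OBSTACLE             # gray rectangles: immovable areas (placeholder)
    grid[3, 7] = TASK                        # black squares: numbered task points (placeholder)
    grid[0, 0] = CHARGER                     # pentagon: the charging room

    def is_movable(x, y):
        """A robot may only enter free cells, task points, or the charging room."""
        return grid[y - 1, x - 1] in (FREE, TASK, CHARGER)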
The task point division given for 3 inspection robots and the inspection sequence for each inspection robot are shown in table 1.
TABLE 1 cooperative inspection scheme for multiple inspection robots
A DQN model for cooperative obstacle-avoidance path planning of the multiple inspection robots is constructed and trained; the relevant parameter settings are shown in table 2:
TABLE 2 DQN model and training parameter settings
The change of the reward value during training is shown in fig. 5. As can be seen from fig. 5, the experience pool holds no records when training starts, so the model keeps exploring in the initial stage and the reward value is low and oscillating. After the first 200 training episodes, records are drawn at random from the experience pool and network parameter optimization begins, after which the reward value rises slowly. The model temporarily falls into a local optimum around iterations 500 to 700. When the number of iterations approaches 1700, the model converges and the reward value stabilizes, indicating that the useless actions consumed by the multiple inspection robots in completing the path planning have gradually been eliminated.
The resulting collaborative inspection paths are shown in fig. 6. As can be seen from fig. 6, the three inspection robots cooperate with each other, each independent inspection path traverses its assigned inspection task points in sequence, and no anti-collision mechanism is triggered between an inspection robot and an obstacle or between the inspection robots. The inspection routes of the three inspection robots are listed in table 1.
The application also provides a multi-inspection-robot collaborative path planning system, wherein each inspection robot has defined its corresponding task points and traversal order, comprising:
a state module, an action module, a state-action pair evaluation module and a collaborative planning module;
the state module is used for acquiring the position coordinates of all inspection robots and the arrival state of each task point, and constructing the collaborative state space of the multiple inspection robots;
the action module is used for acquiring the movement directions selected by all inspection robots, and constructing the collaborative action space of the multiple inspection robots;
the state-action pair evaluation module is used for classifying states with the triggering of the anti-collision mechanism between an inspection robot and an obstacle, and between inspection robots, as constraint conditions, and defining the reward value corresponding to each class of state; according to the collaborative states and actions of the multiple inspection robots and the reward values corresponding to the classes of states, the DQN model calculates the expectation of the return value obtained after executing a set action in a set state;
the collaborative planning module is used for optimizing the expectation of the return value by the DQN model through parameter training of the deep neural network, and forming the multi-inspection-robot collaborative path from the states and actions corresponding to the maximum expectation.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGA), or programmable logic arrays (PLA), with state information of the computer readable program instructions; the electronic circuitry can then execute the computer readable program instructions.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application and not to limit it. Although the present application has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalents may be made to the specific embodiments of the application without departing from its spirit and scope, and all such modifications and equivalents are intended to be covered by the claims.

Claims (14)

1. A multi-inspection robot collaborative path planning method based on a DQN model, wherein each inspection robot has defined its corresponding task points and traversal order, characterized by comprising the following steps:
step 1, acquiring the position coordinates of all inspection robots and the arrival state of each task point, and constructing the collaborative state space of the multiple inspection robots;
step 2, acquiring the movement directions selected by all inspection robots, and constructing the collaborative action space of the multiple inspection robots;
step 3, classifying states with the triggering of the anti-collision mechanism between an inspection robot and an obstacle, and between inspection robots, as constraint conditions, and defining the reward value corresponding to each class of state;
step 4, calculating, by the DQN model, the expectation of the return value obtained after executing a set action in a set state, according to the collaborative states and actions of the multiple inspection robots and the reward values corresponding to the classes of states;
step 5, optimizing the expectation of the return value by the DQN model through parameter training of the deep neural network, and forming the multi-inspection-robot collaborative path from the states and actions corresponding to the maximum expectation.
2. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 1, wherein
the state space S is as follows:
S = {(x_1, y_1), …, (x_N, y_N), u_1, u_2, …, u_M}    (1)
where:
u_j characterizes the arrival status of the j-th task point: u_j = 0 indicates that the j-th task point has not yet been reached by its corresponding inspection robot, and u_j = 1 indicates that it has been reached;
(x_i, y_i) characterizes the position coordinates of the i-th inspection robot;
i = 1, 2, …, N, where N is the total number of inspection robots;
j = 1, 2, …, M, where M is the total number of task points.
3. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 2, wherein
the position coordinates of the i-th inspection robot are as follows:
(x_i, y_i), x_i ∈ {1, 2, …, L}, y_i ∈ {1, 2, …, W}    (2)
where:
x_i and y_i respectively denote the abscissa and ordinate of the i-th inspection robot in the grid map;
L and W respectively denote the total length and total width of the grid map generated from the planar arrangement of the electrical equipment within the substation.
4. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 1, wherein
the action space A is as follows:
A = {a_1, a_2, …, a_N}    (3)
where:
a_i denotes the movement direction selected by the i-th inspection robot, or staying in place; the movement directions comprise north N, northeast NE, east E, southeast SE, south S, southwest SW, west W and northwest NW, and each inspection robot moves one grid unit in its selected movement direction.
5. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 1, wherein
the states comprise a free state, a semi-successful state, a failure state and a successful state, specifically:
1) Free state: no inspection robot has triggered its anti-collision mechanism, and no inspection robot has yet reached the first task point it is required to reach;
2) Semi-successful state: some task points have been reached by their corresponding inspection robots, but task points not yet reached still exist in the environment, or some inspection robots have not yet returned to the charging room;
3) Failure state: an anti-collision mechanism has been triggered between an inspection robot and an obstacle, or between inspection robots;
4) Successful state: all task points have been reached by their corresponding inspection robots, and all inspection robots have returned to the charging room.
6. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 5, wherein
the reward values r corresponding to the four classes of states constitute the reward-and-punishment function:
r(s) = r_F,  s = FS;  r_S · Σ_{i=1}^{N} k_i,  s = SS;  −r_D,  s = DS;  r_C,  s = CS    (4)
where k_i is the number of task points already reached by the i-th inspection robot, s is the state, FS denotes the free state, SS the semi-successful state, DS the failure state and CS the successful state, and r_F, r_S, r_D and r_C are the reward and penalty magnitudes assigned to the four classes.
7. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 1, wherein
the expectation Q(s, a) of the return value is as follows:
Q(s, a) = E[G_t | S_t = s, A_t = a]    (5)
where:
G_t denotes the return function at time t,
S_t denotes the state space at time t,
A_t denotes the action space at time t,
E[·] denotes the expectation function.
8. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 7, wherein
the return function is the sum of the reward values of the subsequent states after the set action is executed in the set state, and satisfies the following relation:
G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + … = Σ_{k=0}^{∞} γ^k R_{t+k+1}    (6)
where:
G_t denotes the return function at time t,
R_{t+1} denotes the reward value at time t+1, reflecting the reward obtained by entering the subsequent state after executing the set action in the set state,
R_{t+k+1} denotes the reward value at time t+k+1, reflecting the reward obtained k steps after executing the set action in the set state,
γ^k denotes the discount factor corresponding to the k-th subsequent reward, γ^k ∈ [0, 1],
γ is the discount factor, γ ∈ [0, 1].
9. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 1, wherein
the DQN model builds an estimation network and a target network with identical structures, and performs parameter optimization on the corresponding network parameters θ and θ⁻.
10. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 9, wherein
the inputs of both the estimation network and the target network are state-action pairs (s, a), the output of the estimation network is Q(s, a; θ), used for estimating the Q value corresponding to the state-action pair at the current moment, and the output of the target network is Q(s′, a′; θ⁻), used for storing the optimal Q value during parameter training, the optimal Q value being the maximum Q value in the target network.
11. The method for collaborative path planning for a multi-inspection robot based on a DQN model as claimed in claim 10, wherein
the estimation network takes the optimal Q value stored by the target network as its learning target, updates the parameters θ, and uses the updated parameters θ in the calculation of the estimation network loss function:
L(θ) = E[(y − Q(s, a; θ))²]    (7)
where:
L(θ) is the estimation network loss function,
y is the optimization target,
E[·] denotes the expectation function,
and the Q value corresponding to the maximum of the optimization target is taken as the optimal Q value.
12. A multi-inspection robot collaborative path planning system for implementing the method of any one of claims 1-11, wherein each inspection robot has defined its corresponding task points and traversal order, characterized by comprising:
a state module, an action module, a state-action pair evaluation module and a collaborative planning module;
the state module is used for acquiring the position coordinates of all inspection robots and the arrival state of each task point, and constructing the collaborative state space of the multiple inspection robots;
the action module is used for acquiring the movement directions selected by all inspection robots, and constructing the collaborative action space of the multiple inspection robots;
the state-action pair evaluation module is used for classifying states with the triggering of the anti-collision mechanism between an inspection robot and an obstacle, and between inspection robots, as constraint conditions, and defining the reward value corresponding to each class of state; according to the collaborative states and actions of the multiple inspection robots and the reward values corresponding to the classes of states, the DQN model calculates the expectation of the return value obtained after executing a set action in a set state;
the collaborative planning module is used for optimizing the expectation of the return value by the DQN model through parameter training of the deep neural network, and forming the multi-inspection-robot collaborative path from the states and actions corresponding to the maximum expectation.
13. A terminal comprising a processor and a storage medium, characterized in that:
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the steps of the method according to any one of claims 1-11.
14. A computer readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-11.
CN202310604238.4A 2023-05-26 2023-05-26 DQN model-based multi-inspection robot collaborative path planning method and system Active CN116382304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310604238.4A CN116382304B (en) 2023-05-26 2023-05-26 DQN model-based multi-inspection robot collaborative path planning method and system

Publications (2)

Publication Number Publication Date
CN116382304A (en) 2023-07-04
CN116382304B (en) 2023-09-15

Family

ID=86969689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310604238.4A Active CN116382304B (en) 2023-05-26 2023-05-26 DQN model-based multi-inspection robot collaborative path planning method and system

Country Status (1)

Country Link
CN (1) CN116382304B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117270393B (en) * 2023-10-07 2024-05-17 重庆大学 Intelligent robot cluster cooperative control system
CN117933673B (en) * 2024-03-22 2024-06-21 广东电网有限责任公司湛江供电局 Line patrol planning method and device and line patrol planning system
CN117970932B (en) * 2024-04-01 2024-06-07 中数智科(杭州)科技有限公司 Task allocation method for collaborative inspection of multiple robots of rail train

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019068236A1 (en) * 2017-10-04 2019-04-11 Huawei Technologies Co., Ltd. Method of selection of an action for an object using a neural network
CN110321666A (en) * 2019-08-09 2019-10-11 重庆理工大学 Multi-robots Path Planning Method based on priori knowledge Yu DQN algorithm
CN112214791A (en) * 2020-09-24 2021-01-12 广州大学 Privacy policy optimization method and system based on reinforcement learning and readable storage medium
CN112286203A (en) * 2020-11-11 2021-01-29 大连理工大学 Multi-agent reinforcement learning path planning method based on ant colony algorithm
CN112362066A (en) * 2020-11-20 2021-02-12 西北工业大学 Path planning method based on improved deep reinforcement learning
CN113110509A (en) * 2021-05-17 2021-07-13 哈尔滨工业大学(深圳) Warehousing system multi-robot path planning method based on deep reinforcement learning
CN113326872A (en) * 2021-05-19 2021-08-31 广州中国科学院先进技术研究所 Multi-robot trajectory planning method
CN114069838A (en) * 2021-10-05 2022-02-18 国网辽宁省电力有限公司电力科学研究院 Transformer substation robot intelligent inspection system and method with intelligent sensor actively cooperated
CN114355973A (en) * 2021-12-28 2022-04-15 哈尔滨工程大学 Multi-agent hierarchical reinforcement learning-based unmanned cluster cooperation method under weak observation condition
GB202209423D0 (en) * 2021-08-12 2022-08-10 Univ Xidian Method for multi-agent dynamic path planning
CN114895673A (en) * 2022-04-26 2022-08-12 武汉理工大学 Ship collision avoidance decision method based on deep reinforcement learning under rule constraint
CN115047878A (en) * 2022-06-13 2022-09-13 常州大学 DM-DQN-based mobile robot path planning method
CN115563527A (en) * 2022-09-27 2023-01-03 西南交通大学 Multi-Agent deep reinforcement learning framework and method based on state classification and assignment
CN115826581A (en) * 2022-12-28 2023-03-21 大连大学 Mobile robot path planning algorithm combining fuzzy control and reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11934191B2 (en) * 2019-07-05 2024-03-19 Huawei Technologies Co., Ltd. Method and system for predictive control of vehicle using digital images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhang, R. et al., "Ship Collision Avoidance Using Constrained Deep Reinforcement Learning", 5th International Conference on Behavioral, Economic, and Socio-Cultural Computing (full text) *
陈云飞, "Research on Multi-Robot Task Allocation Methods Based on Reinforcement Learning", China Master's Theses Full-text Database, Information Science and Technology, No. 03 (full text) *
郝秀召, "Research on Improved Motion Coordination Algorithms Based on Reinforcement Learning in Specific Road Network Environments", China Master's Theses Full-text Database, Information Science and Technology, No. 03 (full text) *

Also Published As

Publication number Publication date
CN116382304A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN116382304B (en) DQN model-based multi-inspection robot collaborative path planning method and system
US20230037632A1 (en) Reinforcement learning method and apparatus
Yue et al. Review and empirical analysis of sparrow search algorithm
US20220092418A1 (en) Training method for air quality prediction model, prediction method and apparatus, device, program, and medium
CN112947591A (en) Path planning method, device, medium and unmanned aerial vehicle based on improved ant colony algorithm
Liu et al. Robot search path planning method based on prioritized deep reinforcement learning
CN117039894B (en) Photovoltaic power short-term prediction method and system based on improved dung beetle optimization algorithm
Su et al. Robot path planning based on random coding particle swarm optimization
CN117213497A (en) AGV global path planning method based on deep reinforcement learning
CN115293623A (en) Training method and device for production scheduling model, electronic equipment and medium
CN115145311A (en) Routing inspection path planning method, device, equipment and storage medium
CN112613608A (en) Reinforced learning method and related device
Tusi et al. Using ABC and RRT algorithms to improve mobile robot path planning with danger degree
CN117806340B (en) Airspace training flight path automatic planning method and device based on reinforcement learning
CN114815801A (en) Adaptive environment path planning method based on strategy-value network and MCTS
CN117032247B (en) Marine rescue search path planning method, device and equipment
CN113595798A (en) Network flow prediction method and system for improving lightning connection process optimization algorithm
Petrovic et al. Adopting linear optimization to support autonomous vehicles in smart city
CN117132069A (en) Unmanned aerial vehicle cluster material delivery task distribution method and system
Li et al. Research on path planning of cloud robot in dynamic environment based on improved ddpg algorithm
CN114971024A (en) Fan state prediction method and device
CN107196328A (en) A kind of power distribution network light stores up association system access point vulnerability index forecasting method
Wang et al. SRM: An efficient framework for autonomous robotic exploration in indoor environments
Fengchun et al. Research on power grid inspection path based on edge computing
CN113910221A (en) Mechanical arm autonomous motion planning method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant