CN116643587A - Multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic - Google Patents


Info

Publication number
CN116643587A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
pheromone
function
coverage
Prior art date
Legal status
Pending
Application number
CN202310623947.7A
Other languages
Chinese (zh)
Inventor
何斌
程徐
蒋荣
李刚
程斌
陆萍
张朋朋
Current Assignee
Tongji University
Original Assignee
Tongji University
Priority date
Filing date
Publication date
Application filed by Tongji University
Priority to CN202310623947.7A
Publication of CN116643587A
Pending (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104 Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to a multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic, comprising the following steps: step S1, constructing a coverage scene graph in which a plurality of unmanned aerial vehicles autonomously collect data in a natural disaster area; step S2, constructing a multi-unmanned aerial vehicle path pheromone map based on a pheromone aggregation model and marking the paths of the unmanned aerial vehicles; step S3, defining a state function, an action function and a reward function according to the multi-unmanned aerial vehicle autonomous coverage task, and establishing and training a neural-network-based multi-unmanned aerial vehicle autonomous coverage model; and step S4, performing multi-unmanned aerial vehicle autonomous coverage with the trained model. Compared with the prior art, the invention has the advantages of high efficiency, energy conservation, high coverage rate and wide applicability.

Description

Multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic
Technical Field
The invention relates to the technical field of unmanned aerial vehicle autonomous control, in particular to a multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic.
Background
In traditional static area coverage, once the static sensor nodes have been deployed, they are difficult to redeploy if coverage holes appear or nodes fail. Dynamic area coverage is more flexible by comparison and is therefore widely used in target detection, structural health monitoring and disaster relief.
In dynamic area coverage, the maneuverability, flexibility and robustness of unmanned aerial vehicles allow them to adjust their positions as the environment changes, which can greatly improve the performance of a mobile sensor network. A dynamic coverage algorithm controls the unmanned aerial vehicles to collect data from Internet of Things devices or wireless sensors and to cover the target area accurately. In addition, owing to their low cost and rapid deployment, unmanned aerial vehicle wireless communication systems are well suited to restoring emergency communication when communication infrastructure is damaged or only available for a limited time, for example after natural disasters, thereby extending the range and improving the performance of the emergency communication network.
Control algorithms for unmanned aerial vehicle dynamic area coverage generally fall into two categories: non-self-organizing and self-organizing control algorithms. Non-self-organizing dynamic coverage algorithms reduce repeated coverage of the target area by optimizing a configuration-space model, thereby achieving efficient coverage. However, such methods are less robust: in an emergency, a malfunctioning unmanned aerial vehicle or an excessive distance between vehicles leads to coverage holes. Self-organizing control algorithms give the unmanned aerial vehicles stronger robustness and autonomy, and can resolve uneven coverage of the target area in emergencies through local information exchange. Deep reinforcement learning (RL) is considered a key technology for unmanned aerial vehicle autonomous control because it handles complex state spaces and environments well. However, conventional RL methods such as Q-learning or policy gradient methods are not suitable for multi-unmanned aerial vehicle dynamic coverage: because the policy of every unmanned aerial vehicle keeps changing during training, each vehicle faces a non-stationary environment and its policy has difficulty converging.
Providing effective dynamic area coverage over the long term is quite challenging. UAVs are typically small to medium-sized battery-powered devices with limited communication range, flight time and payload capacity, so the dynamic coverage process must be energy-efficient in order to extend the network lifetime. Moreover, unmanned aerial vehicles operate in dynamic, open and unpredictable environments, where sharing resources can effectively reduce repeated coverage and unnecessary energy consumption.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic that offers high efficiency, energy conservation, high coverage rate and wide applicability.
The aim of the invention can be achieved by the following technical scheme:
According to a first aspect of the present invention, there is provided a multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics, the method comprising the steps of:
step S1, constructing a coverage scene graph in which a plurality of unmanned aerial vehicles autonomously collect natural disaster area data;
step S2, constructing a multi-unmanned aerial vehicle path pheromone map based on the pheromone aggregation model, and marking the paths of the unmanned aerial vehicles;
step S3, defining a state function, an action function and a reward function according to the multi-unmanned aerial vehicle autonomous coverage task, and establishing and training a neural-network-based multi-unmanned aerial vehicle autonomous coverage model;
and step S4, performing multi-unmanned aerial vehicle autonomous coverage with the trained multi-unmanned aerial vehicle autonomous coverage model.
Preferably, the coverage scene graph in step S1, in which the multiple unmanned aerial vehicles autonomously collect natural disaster area data, includes the unmanned aerial vehicles, the coverage targets and the obstacle positions, specifically:
n = 1, 2, …, N denotes the index of each unmanned aerial vehicle, N is the total number of unmanned aerial vehicles, and the set of all unmanned aerial vehicles is expressed as $\{1, 2, \dots, N\}$;
k = 1, 2, …, K denotes the index of each disaster source in the task area, K is the total number of disaster sources, and the set of all disaster sources in the natural disaster area is expressed as $\{1, 2, \dots, K\}$;
the coverage area is defined as a two-dimensional bounded area of length L and width W, the two-dimensional area being divided into cells, and the area containing randomly generated obstacles.
Preferably, said step S2 comprises the following sub-steps:
step S21, initializing a pheromone map;
step S22, an aggregation process of the pheromone map.
Preferably, the pheromone initialization function of the starting position of the unmanned aerial vehicle is:
$$P_{e,0} = P((x, y), 0) \qquad (1)$$
where (x, y) denotes the starting position of the unmanned aerial vehicle and $P_{e,0}$ denotes the initial pheromone concentration of the unmanned aerial vehicle in the task area.
Preferably, the aggregation process of the pheromone map specifically comprises the following steps:
When an unmanned aerial vehicle moves to cell i, the total amount of pheromone at that cell is expressed as:
where the summed term is the amount of pheromone released at cell i by the k-th unmanned aerial vehicle, and n is the total number of unmanned aerial vehicles within the sensing range of radius r around cell i during the interval (t-1, t);
the evaporation amount of the trace pheromone of the i-th cell over the interval (t, t+1) is expressed as:
where $\eta \in (0, 1)$ is the pheromone evaporation coefficient;
the total amount of pheromone of the i-th cell at time t+1 is expressed as:
where $\eta \in (0, 1)$ is the pheromone evaporation coefficient and the increment term is the amount by which the pheromone of cell i increases at time t+1;
the trace pheromone value is limited to $[P_{\min}, P_{\max}]$, and the local pheromone update rule and the global pheromone update rule are combined as follows:
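The deposit, evaporation and clamping steps above can be sketched as follows. The patent's own equations did not survive extraction, so this is an assumption-laden illustration: the deposit at a cell is taken as a plain sum of per-UAV amounts over the UAVs within radius r, the evaporation amount as $\eta P_i(t)$, and the update as the Max-Min Ant System style rule $P_i(t+1) = (1-\eta)P_i(t) + \Delta P_i(t+1)$ clipped to $[P_{\min}, P_{\max}]$. The function names and the dict-of-cells map are hypothetical.

```python
import math

def deposited_pheromone(cell, uav_positions, deposits, r):
    """Total pheromone released at `cell` during (t-1, t).

    Assumed form: sum of deposits[k] over every UAV k within sensing radius r.
    """
    cx, cy = cell
    return sum(
        deposits[k]
        for k, (x, y) in uav_positions.items()
        if math.hypot(x - cx, y - cy) <= r
    )

def update_pheromone(pheromone, increments, eta, p_min, p_max):
    """Evaporate, add new deposits, then clamp every cell to [p_min, p_max].

    Assumed rule: P_i(t+1) = (1 - eta) * P_i(t) + dP_i(t+1), then clipped.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    updated = {}
    for cell, value in pheromone.items():
        evaporated = eta * value  # amount lost over (t, t+1)
        new_value = value - evaporated + increments.get(cell, 0.0)
        updated[cell] = min(p_max, max(p_min, new_value))
    return updated
```

Bounding the trail to $[P_{\min}, P_{\max}]$ is the device the Max-Min Ant System uses to avoid search stagnation, which matches the stated motivation of escaping local optima.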
Preferably, in step S3, the state function, the action function and the reward function are respectively:
State function: comprises the trace pheromone concentration of the task-area cells, the remaining energy of the n-th UAV and the coverage status of the disaster sources; at time t, the state function is expressed as:
Action function: comprises the flight direction and the flight distance of the unmanned aerial vehicle; at time t, the action function is expressed as:
Reward function: comprises the reward obtained by covering disaster sources, the reward for hitting obstacles and the reward for overlapping coverage of an area; at time t, the reward function is expressed as:
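A minimal sketch of how the state and action of one UAV could be encoded. The symbols of these functions were lost in extraction, so the layout is an assumption: the state is the full L x W pheromone map flattened row by row, then the remaining energy, then a 1/0 flag per disaster source; the action is a heading in radians plus a distance, executed under a hypothetical energy cost that is linear in distance. A distance of 0 is treated as hovering.

```python
import math

def build_state(pheromone, L, W, energy, covered):
    """State of one UAV: pheromone map, remaining energy, coverage flags.

    Assumptions: the whole map is observed (the patent may use a local window)
    and coverage status is encoded as 1.0 = covered, 0.0 = uncovered.
    """
    grid = [pheromone[(x, y)] for y in range(W) for x in range(L)]
    flags = [1.0 if covered[k] else 0.0 for k in sorted(covered)]
    return grid + [float(energy)] + flags

def apply_action(pos, direction, distance, energy, cost_per_unit, d_cap):
    """Execute an action (direction, distance); return (pos, energy, hovering).

    Hypothetical energy model: cost is proportional to distance, so the largest
    feasible distance is min(d_cap, energy / cost_per_unit).
    """
    d_max = min(d_cap, energy / cost_per_unit)
    d = max(0.0, min(distance, d_max))  # clip the requested distance
    x, y = pos
    new_pos = (x + d * math.cos(direction), y + d * math.sin(direction))
    return new_pos, energy - d * cost_per_unit, d == 0.0
```

Clipping the distance to what the remaining energy allows is one way to realize the "maximum distance under the energy allowed" case without letting the policy request infeasible moves.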
Preferably, in step S3, the neural-network-based multi-unmanned aerial vehicle autonomous coverage model is specifically as follows: the neural network comprises an action network and an evaluation network; the input of the action network is the state function of each unmanned aerial vehicle and its output is the action function of each unmanned aerial vehicle; the inputs of the evaluation network are the joint state s(t) and the joint action a(t), and its output is the Q value, denoted Q(s, a).
Preferably, the training process of the multi-unmanned aerial vehicle autonomous coverage model based on the neural network specifically comprises the following steps:
1) Randomly initialize the evaluation network $Q(s, a \mid \theta^{Q})$ and the action network $\pi(s \mid \theta^{\pi})$ with the corresponding weight parameters $\theta^{Q}$ and $\theta^{\pi}$;
2) Randomly draw a mini-batch of samples from the experience replay buffer, each of the form $(s_j, a_j, r_j, s'_j)$, where j is the sample index;
3) Update the evaluation network of each unmanned aerial vehicle by minimizing the loss function:
where $y_j$ is the target value computed by the target Q network for the j-th sample, and $Q^{\pi}(s, a_1, \ldots, a_N)$ denotes the expected discounted reward of the joint action $(a_1, \ldots, a_N)$ taken under policy $\pi$ in the joint state s;
4) Update the target network parameters corresponding to the action network and the evaluation network:
where the parameter $\tau$ denotes the learning (soft-update) rate and the primed parameters denote the target network parameters.
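Steps 3) and 4) can be illustrated with a framework-free sketch. The loss equation is missing from this text, so the form below is the standard MADDPG centralised-critic loss, which fits the surrounding description of $y_j$ and $Q^{\pi}(s, a_1, \ldots, a_N)$; the discount factor `gamma`, the callable interfaces and the flat weight lists are assumptions for illustration, and terminal-state masking is omitted.

```python
def critic_loss(batch, q, q_target, pi_target, gamma):
    """Mean-squared TD error over a mini-batch (MADDPG-style, assumed form).

    y_j = r_j + gamma * Q'(s'_j, pi'(s'_j));  loss = mean_j (y_j - Q(s_j, a_j))^2
    q, q_target: callables (joint_state, joint_action) -> float
    pi_target:   callable joint_state -> joint action of the target Actors
    Terminal-state masking is omitted for brevity.
    """
    targets, sq_errors = [], []
    for s, a, r, s_next in batch:
        y = r + gamma * q_target(s_next, pi_target(s_next))
        targets.append(y)
        sq_errors.append((y - q(s, a)) ** 2)
    return targets, sum(sq_errors) / len(sq_errors)

def soft_update(target, source, tau):
    """Soft target update: theta' <- tau * theta + (1 - tau) * theta'."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(source, target)]
```

A small $\tau$ makes the target networks trail the learned networks slowly, which keeps the targets $y_j$ from shifting at every step.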
According to a second aspect of the present invention there is provided an electronic device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method of any one of the above when executing the program.
According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of any one of the above.
Compared with the prior art, the invention has the following advantages:
1) According to the invention, the multi-unmanned aerial vehicle path pheromone map is utilized, so that overlapping coverage of a task area in a short time is effectively avoided, and energy consumption of unmanned aerial vehicles is minimized;
2) The reward function designed for the multi-unmanned aerial vehicle autonomous coverage task effectively guides the unmanned aerial vehicles to cooperate in collecting data in natural disaster areas, and addresses the dynamic instability (non-stationarity) problem in multi-unmanned aerial vehicle reinforcement learning;
3) The multi-unmanned aerial vehicle autonomous coverage method provided by the invention has the advantages of high efficiency, energy conservation, high coverage rate, wide applicability and the like, and further improves the disaster rescue efficiency of the multi-unmanned aerial vehicle.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a scene graph of the invention for autonomous collection of data by multiple unmanned aerial vehicles in a natural disaster area;
FIG. 3 is a schematic diagram of a pheromone map of the present invention;
FIG. 4 is a schematic diagram of a network architecture according to the present invention;
FIG. 5 is a schematic diagram of an Actor network in a natural disaster application scenario according to the present invention;
FIG. 6 is a schematic diagram of the Critic network in a natural disaster application scenario according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Examples
As shown in FIG. 1, this embodiment takes the collection of natural disaster area data by multiple unmanned aerial vehicles as its application background and provides a multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic, comprising the following steps:
Step S1, establishing a schematic diagram of multi-unmanned aerial vehicle autonomous coverage that includes the coverage targets and the obstacle positions, and from it constructing the coverage scene graph in which multiple unmanned aerial vehicles autonomously collect natural disaster area data, as shown in FIG. 2, specifically:
1) n = 1, 2, …, N denotes the index of each unmanned aerial vehicle and N is the total number of unmanned aerial vehicles; the set of all unmanned aerial vehicles can be expressed as $\{1, 2, \dots, N\}$.
2) k = 1, 2, …, K denotes the index of each disaster source in the task area and K is the total number of disaster sources; the set of all disaster sources in the natural disaster area can be expressed as $\{1, 2, \dots, K\}$.
3) The coverage area is defined as a two-dimensional bounded area of length L and width W, divided into smaller cells and containing a number of randomly generated obstacles.
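The three definitions above can be sketched as a small scene builder. Placing obstacles on whole cells, the number of obstacles, and the use of a seeded `random.Random` are assumptions for illustration; the text only states that the L x W area is divided into cells and contains randomly generated obstacles. `CoverageScene` and `build_scene` are hypothetical names.

```python
import random
from dataclasses import dataclass, field

@dataclass
class CoverageScene:
    """Hypothetical container for the step S1 coverage scene."""
    length: int                                   # L: cells along x
    width: int                                    # W: cells along y
    obstacles: set = field(default_factory=set)   # blocked cells (x, y)
    uavs: dict = field(default_factory=dict)      # n -> start cell (x, y)
    sources: dict = field(default_factory=dict)   # k -> disaster-source cell (x, y)

def build_scene(L, W, N, K, n_obstacles, seed=0):
    """Place obstacles, N UAVs and K disaster sources on distinct random cells."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(L) for y in range(W)]
    # Sampling without replacement guarantees that nothing shares a cell.
    picked = rng.sample(cells, n_obstacles + N + K)
    obstacles = set(picked[:n_obstacles])
    uavs = {n: picked[n_obstacles + n - 1] for n in range(1, N + 1)}
    sources = {k: picked[n_obstacles + N + k - 1] for k in range(1, K + 1)}
    return CoverageScene(L, W, obstacles, uavs, sources)
```

UAVs are indexed n = 1..N and sources k = 1..K, matching the index sets defined above.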
Step S2, designing a pheromone aggregation model based on the pheromone mechanism, constructing the multi-unmanned aerial vehicle path pheromone map and marking the unmanned aerial vehicle paths, which specifically comprises the following sub-steps:
1) Initialization process of pheromone map:
Ants use pheromones to guide the foraging routes of other ants. Analogously, trace pheromones are defined to record the flight paths of the unmanned aerial vehicles. This mechanism effectively prevents an unmanned aerial vehicle from revisiting an area within a short time and, compared with an exhaustive search algorithm, reduces the search difficulty and improves the global coverage rate.
The pheromone initialization function of the starting position of the unmanned aerial vehicle is as follows:
$$P_{e,0} = P((x, y), 0) \qquad (1)$$
where (x, y) denotes the starting position of the unmanned aerial vehicle and $P_{e,0}$ denotes the initial pheromone concentration of the unmanned aerial vehicle in the task area.
2) Aggregation process of pheromone map:
First, each unmanned aerial vehicle can sense the trace pheromone concentration within a certain range.
Any cell containing trace pheromone exerts a repulsive force in the local environment that pushes nearby unmanned aerial vehicles away, which lets each vehicle observe its local state space effectively. The unmanned aerial vehicle therefore selects the area with the lowest trace pheromone concentration for its next action, since that area has not been visited or has rarely been visited. When the unmanned aerial vehicle moves to cell i, the total amount of pheromone at that cell can be expressed as:
where the summed term is the amount of pheromone released at cell i by the k-th unmanned aerial vehicle, and n is the total number of unmanned aerial vehicles within the sensing range of radius r around cell i during the interval (t-1, t).
Next, the aggregation of trace pheromone involves natural evaporation: deposited pheromone disappears over time, which also allows an unmanned aerial vehicle to return and collect disaster source data again after a period of time. The evaporation amount of the trace pheromone of the i-th cell over the interval (t, t+1) can be expressed as:
where $\eta \in (0, 1)$ is the pheromone evaporation coefficient.
Finally, the total amount of pheromone of the i-th cell at time t+1 can be expressed as:
where $\eta \in (0, 1)$ is the pheromone evaporation coefficient and the increment term is the amount by which the pheromone of cell i increases at time t+1.
To keep the algorithm from falling into a local optimum and to prevent the trace pheromone from becoming negative, excessively large or strongly fluctuating, its value is limited to $[P_{\min}, P_{\max}]$. This enlarges the search space while preserving fast convergence of the algorithm. The local pheromone update rule and the global pheromone update rule are combined as follows:
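Equation (1) and the repulsion rule of sub-step 2) can be sketched together: every cell starts at a baseline value, each UAV's start cell receives an initial concentration, and a UAV then moves to the free neighbouring cell with the lowest trace pheromone. The baseline `p_base`, the initial value `p_init`, the 8-neighbourhood and the tie-breaking order are hypothetical choices, not details taken from the patent.

```python
def init_pheromone_map(L, W, starts, p_init=1.0, p_base=0.0):
    """P((x, y), 0) for every cell: p_base everywhere, p_init at UAV start cells."""
    grid = {(x, y): p_base for x in range(L) for y in range(W)}
    for cell in starts:
        grid[cell] = p_init  # the start cell counts as visited at t = 0
    return grid

def pick_least_visited(pos, pheromone, obstacles):
    """Return the free 8-neighbour with the lowest trace pheromone.

    Cells outside the map or occupied by obstacles are skipped; ties go to the
    first candidate in scan order. If no neighbour is free the UAV stays put.
    """
    x, y = pos
    candidates = [
        (x + dx, y + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
        and (x + dx, y + dy) in pheromone
        and (x + dx, y + dy) not in obstacles
    ]
    if not candidates:
        return pos
    return min(candidates, key=lambda c: pheromone[c])
```

In a full loop, each move would be followed by a deposit at the new cell and an evaporation step over the whole map, so recently visited cells repel the UAVs until their trail has decayed.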
Step S3, defining a state function, an action function and a reward function according to the multi-unmanned aerial vehicle autonomous coverage task, and establishing and training a neural-network-based multi-unmanned aerial vehicle autonomous coverage model.
The state function, action function and reward function are respectively:
(1) State function: comprises the trace pheromone concentration of the task-area cells, the remaining energy of the n-th UAV and the coverage status of the disaster sources. At time t, the state function can be expressed as:
where the coverage status of disaster source k takes one value when the source has been covered and another when it is still uncovered.
(2) Action function: comprises the flight direction and the flight distance of the unmanned aerial vehicle.
At time t, the action function is expressed as:
where the first component is the flight direction of unmanned aerial vehicle n and the second is its flight distance. If the flight distance is zero, the unmanned aerial vehicle is hovering; if the flight distance reaches its upper limit, the unmanned aerial vehicle has flown the maximum distance its energy allows.
(3) Reward function: consists of three parts, namely the reward obtained by covering disaster sources, the reward for hitting obstacles and the reward for overlapping coverage of an area.
When a disaster source lies within the coverage range r of the unmanned aerial vehicle, the n-th UAV obtains a coverage reward, where $V_k$ denotes the value of disaster source k within the coverage range r;
when the unmanned aerial vehicle hits an obstacle or the boundary, the n-th UAV receives a collision reward, where R is the total number of times it has hit an obstacle or the boundary;
when the unmanned aerial vehicle selects a direction and chooses the area with the highest pheromone concentration, which indicates that the area has already been visited by several unmanned aerial vehicles, the n-th UAV receives an overlap reward.
Thus, at time t, the reward function can be expressed as:
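The three reward parts can be combined as in the sketch below. Their exact expressions were lost in extraction, so the linear forms are assumptions: the coverage term sums $V_k$ over the disaster sources within radius r, the collision term is `-w_hit * R`, and the overlap term is a fixed penalty when the UAV picks the highest-pheromone area. `w_hit` and `w_overlap` are hypothetical weights, not values from the patent.

```python
import math

def reward(pos, sources, values, r, hits, chose_max_pheromone,
           w_hit=1.0, w_overlap=1.0):
    """Sum of the three reward parts for one UAV at time t (assumed forms).

    - coverage:  sum of values[k] for every source within radius r of pos;
    - collision: -w_hit * hits, hits = R = number of obstacle/boundary hits;
    - overlap:   -w_overlap if the UAV chose the highest-pheromone area.
    """
    x, y = pos
    r_cover = sum(values[k] for k, (sx, sy) in sources.items()
                  if math.hypot(sx - x, sy - y) <= r)
    r_hit = -w_hit * hits
    r_overlap = -w_overlap if chose_max_pheromone else 0.0
    return r_cover + r_hit + r_overlap
```

Making the collision and overlap terms negative is what turns the pheromone map into a learning signal: revisiting well-trodden cells or hitting obstacles lowers the return, while reaching uncovered disaster sources raises it.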
The neural network is then built and its parameters updated to obtain the neural-network-based multi-unmanned aerial vehicle autonomous coverage model, specifically:
As shown in FIG. 4, the network structure comprises two parts: an Actor network and a Critic network. As shown in FIG. 5, the input of the Actor network is the state function of each unmanned aerial vehicle and the output is the action function of each unmanned aerial vehicle. As shown in FIG. 6, the inputs of the Critic network are the joint state s(t) and the joint action a(t), and the output is the Q value, denoted Q(s, a).
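The split between per-UAV Actors and a joint Critic comes down to what each network receives. The sketch below only assembles the input vectors, since the layer sizes of FIG. 5 and FIG. 6 are not given in this text; concatenating UAVs in index order is an assumption, and the function names are hypothetical.

```python
def actor_input(states, n):
    """The Actor of UAV n sees only its own state s_n(t)."""
    return list(states[n])

def critic_input(states, actions):
    """The Critic sees the joint state s(t) followed by the joint action a(t).

    UAVs are concatenated in index order so every sample uses the same layout.
    """
    order = sorted(states)
    joint_state = [v for n in order for v in states[n]]
    joint_action = [v for n in order for v in actions[n]]
    return joint_state + joint_action
```

With N UAVs, a per-UAV state of size d_s and an action of size 2 (direction and distance), the Critic input has N * (d_s + 2) entries, while each Actor input has only d_s entries, so running the trained Actors needs only each UAV's own state.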
The training process of the neural network specifically comprises the following steps:
1) Randomly initialize the Critic network $Q(s, a \mid \theta^{Q})$ and the Actor network $\pi(s \mid \theta^{\pi})$ with weight parameters $\theta^{Q}$ and $\theta^{\pi}$, and initialize the experience replay buffer;
2) Compute the state s(t), the action a(t) and the reward r(t) of the N unmanned aerial vehicles at time t, compute the next state s′, and store the transition in the experience replay buffer;
3) Randomly draw a mini-batch of samples from the experience replay buffer, each of the form $(s_j, a_j, r_j, s'_j)$, where j is the sample index;
4) Update the Critic network of each unmanned aerial vehicle by minimizing the loss function:
where $y_j$ is the target value computed by the target Q network for the j-th sample, and $Q^{\pi}(s, a_1, \ldots, a_N)$ denotes the expected discounted reward of the joint action $(a_1, \ldots, a_N)$ taken under policy $\pi$ in the joint state s.
5) Update the target network parameters corresponding to the Actor network and the Critic network:
where the parameter $\tau$ denotes the learning (soft-update) rate, and the primed parameters denote the target Critic (Q) network parameters and the target Actor (policy) network parameters, respectively.
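The experience replay buffer used in steps 1) to 3) can be sketched with a bounded `collections.deque`. The capacity, first-in-first-out eviction of old transitions and uniform sampling without replacement are assumptions; the text does not specify them. `ReplayBuffer` is a hypothetical name.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay buffer for joint transitions."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        """Step 2): append one joint transition (s(t), a(t), r(t), s')."""
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Step 3): draw a mini-batch uniformly, without replacement."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling past transitions at random breaks the temporal correlation between consecutive steps, which is the usual reason for replaying experience in DDPG-family training.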
And S4, performing multi-unmanned aerial vehicle autonomous coverage by adopting a trained multi-unmanned aerial vehicle autonomous coverage model.
The electronic device of the present invention includes a Central Processing Unit (CPU) that can perform various appropriate actions and processes according to computer program instructions stored in a Read Only Memory (ROM) or computer program instructions loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device can also be stored. The CPU, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
A plurality of components in a device are connected to an I/O interface, comprising: an input unit such as a keyboard, a mouse, etc.; an output unit such as various types of displays, speakers, and the like; a storage unit such as a magnetic disk, an optical disk, or the like; and communication units such as network cards, modems, wireless communication transceivers, and the like. The communication unit allows the device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processing unit performs the respective methods and processes described above, for example, the methods S1 to S4. For example, in some embodiments, methods S1-S4 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or the communication unit. When the computer program is loaded into RAM and executed by the CPU, one or more steps of the methods S1 to S4 described above may be performed. Alternatively, in other embodiments, the CPU may be configured to perform methods S1-S4 by any other suitable means (e.g., by means of firmware).
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a complex programmable logic device (CPLD), etc.
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic is characterized by comprising the following steps of:
step S1, constructing a coverage scene graph for autonomously collecting natural disaster area data by a plurality of unmanned aerial vehicles;
step S2, constructing a multi-unmanned aerial vehicle path pheromone map based on the pheromone aggregation model, and marking the paths of the unmanned aerial vehicles;
step S3, defining a state function, an action function and a reward function according to the multi-unmanned aerial vehicle autonomous coverage task, and establishing and training a neural-network-based multi-unmanned aerial vehicle autonomous coverage model;
and step S4, performing multi-unmanned aerial vehicle autonomous coverage with the trained multi-unmanned aerial vehicle autonomous coverage model.
2. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic according to claim 1, wherein the coverage scene graph in which the multiple unmanned aerial vehicles autonomously collect natural disaster area data in step S1 comprises the unmanned aerial vehicles, the coverage targets and the obstacle positions, specifically:
n = 1, 2, …, N denotes the index of each unmanned aerial vehicle, N is the total number of unmanned aerial vehicles, and the set of all unmanned aerial vehicles is expressed as $\{1, 2, \dots, N\}$;
k = 1, 2, …, K denotes the index of each disaster source in the task area, K is the total number of disaster sources, and the set of all disaster sources in the natural disaster area is expressed as $\{1, 2, \dots, K\}$;
the coverage area is defined as a two-dimensional bounded area of length L and width W, the two-dimensional area being divided into cells, and the area containing randomly generated obstacles.
3. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics according to claim 1, wherein the step S2 comprises the following sub-steps:
step S21, initializing a pheromone map;
step S22, an aggregation process of the pheromone map.
4. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics of claim 3, wherein the pheromone initialization function of the starting position of the unmanned aerial vehicle is:
$$P_{e,0} = P((x, y), 0) \qquad (1)$$
wherein (x, y) represents the starting position of the unmanned aerial vehicle, and $P_{e,0}$ represents the initial pheromone concentration of the unmanned aerial vehicle in the task area.
5. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics of claim 4, wherein the aggregation process of the pheromone map is specifically as follows:
when the drone moves to cell i, the total amount of pheromones at this point is expressed as:
$\Delta P_i(t) = \sum_{k=1}^{n} \Delta P_i^k(t)$
where $\Delta P_i^k(t)$ is the amount of pheromone released at cell i by the kth unmanned aerial vehicle, and n is the total number of unmanned aerial vehicles within the perception radius r of cell i during the time interval (t−1, t);
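The sum above can be illustrated with a minimal sketch, assuming that "within the perception range r" means a Euclidean distance of at most r cell units between the UAV and cell i, and that each UAV k deposits a known amount `deposits[k]` during the interval. The function and argument names are hypothetical.

```python
import math
from typing import Dict, Tuple

def deposited_pheromone(cell: Tuple[int, int],
                        uav_positions: Dict[int, Tuple[float, float]],
                        deposits: Dict[int, float],
                        r: float) -> float:
    """Sum of the pheromone released at `cell` by UAVs within radius r."""
    cx, cy = cell
    total = 0.0
    for k, (ux, uy) in uav_positions.items():
        if math.hypot(ux - cx, uy - cy) <= r:  # UAV k counts toward n
            total += deposits[k]
    return total
```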
the trace pheromone of the ith cell during the time interval (t, t+1) is expressed as:
where η is the pheromone evaporation coefficient, η ∈ (0, 1);
the total amount of pheromone of the ith cell at time t+1 is expressed as:
where η is the pheromone evaporation coefficient, η ∈ (0, 1), and $\Delta P_i(t+1)$ denotes the increase in the pheromone value of cell i at time t+1;
the trace pheromone value is limited to $[P_{\min}, P_{\max}]$, and the local pheromone updating rule and the global pheromone updating rule are integrated as follows:
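The integrated local/global update rule itself is not recoverable from this text. A minimal sketch of one evaporate-deposit-clamp step, assuming the common MAX-MIN Ant System form $P_i(t+1) = (1-\eta)P_i(t) + \Delta P_i(t+1)$ followed by clamping to $[P_{\min}, P_{\max}]$, could look like the following. This is a standard ant-colony form chosen for illustration, not necessarily the patent's exact rule; `deposits` is a hypothetical map from cell (x, y) to the increase $\Delta P_i(t+1)$.

```python
from typing import Dict, List, Tuple

def update_pheromone(grid: List[List[float]],
                     deposits: Dict[Tuple[int, int], float],
                     eta: float, p_min: float, p_max: float) -> List[List[float]]:
    """One evaporation + deposit step, clamped to [p_min, p_max]."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    new_grid = []
    for y, row in enumerate(grid):
        new_row = []
        for x, p in enumerate(row):
            p_next = (1.0 - eta) * p + deposits.get((x, y), 0.0)  # evaporate, then add
            new_row.append(min(max(p_next, p_min), p_max))        # keep within bounds
        new_grid.append(new_row)
    return new_grid
```

Clamping to a lower bound keeps every cell slightly attractive, so UAVs are not permanently repelled from visited regions. The upper bound stops a single heavily visited cell from dominating.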
6. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic according to claim 1, wherein the state function, the action function and the reward function in step S3 are respectively:
State function: comprises the trace pheromone concentration of the task-area cells, the remaining energy of the nth unmanned aerial vehicle, and the coverage status of the disaster sources. At time t, the state function is expressed as:
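One way to turn these three components into a flat network input is sketched below. The sketch assumes that the pheromone grid is flattened row by row, that the remaining energy is a single scalar, and that each disaster source's coverage status is encoded as 1.0 (covered) or 0.0 (not covered). The encoding and the name `build_state` are hypothetical.

```python
from typing import List, Sequence

def build_state(pheromone_grid: Sequence[Sequence[float]],
                remaining_energy: float,
                covered: Sequence[bool]) -> List[float]:
    """Concatenate pheromone map, remaining energy and coverage flags."""
    state = [p for row in pheromone_grid for p in row]   # trace pheromone per cell
    state.append(float(remaining_energy))                # nth UAV's remaining energy
    state.extend(1.0 if c else 0.0 for c in covered)     # coverage status per source
    return state
```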
Action function: comprises the flight direction and the flight distance of the unmanned aerial vehicle. At time t, the action function is expressed as:
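A (direction, distance) action could be mapped to the next grid cell as sketched below. The sketch assumes the direction is an angle in radians measured from the +x axis, the distance is in cell units, and the result is snapped to the nearest cell and kept inside the L × W area. Obstacle handling is left to the reward function. `apply_action` is a hypothetical name.

```python
import math
from typing import Tuple

def apply_action(pos: Tuple[int, int], direction: float, distance: float,
                 L: int, W: int) -> Tuple[int, int]:
    """Move from `pos` by `distance` cells along angle `direction`."""
    x, y = pos
    nx = x + distance * math.cos(direction)
    ny = y + distance * math.sin(direction)
    # Snap to the nearest cell and keep the UAV inside the bounded area.
    cx = min(max(int(round(nx)), 0), L - 1)
    cy = min(max(int(round(ny)), 0), W - 1)
    return (cx, cy)
```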
Reward function: comprises the reward obtained when an unmanned aerial vehicle covers a disaster source, the reward for colliding with an obstacle, and the reward for overlapping coverage of an area. At time t, the reward function is expressed as:
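A minimal sketch of the three reward terms follows. It assumes that they are simply added together, that covering a source earns a positive reward, and that a collision or an overlapping (repeatedly covered) cell earns a negative one. The weights `r_cover`, `r_collide` and `r_overlap` and their default values are hypothetical placeholders, not values from the patent.

```python
def reward(newly_covered_sources: int, collided: bool, overlapped_cells: int,
           r_cover: float = 10.0, r_collide: float = -5.0,
           r_overlap: float = -0.5) -> float:
    """Additive reward: coverage bonus + collision term + overlap term."""
    return (r_cover * newly_covered_sources
            + (r_collide if collided else 0.0)
            + r_overlap * overlapped_cells)
```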
7. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic according to claim 6, wherein the neural-network-based multi-unmanned aerial vehicle autonomous coverage model in step S3 is specifically as follows: the neural network comprises an action (actor) network and an evaluation (critic) network; the input of the action network is the state function of each unmanned aerial vehicle, and its output is the action function of that unmanned aerial vehicle; the inputs of the evaluation network are the joint state s(t) and the joint action a(t), and its output is the Q value, denoted Q(s, a).
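The two networks can be pictured with a dependency-free sketch that uses one hidden tanh layer each. The layer sizes, the tanh activations and the squashing of the actor output into a direction in (−π, π) and a distance in (0, `max_dist`) are all assumptions. In practice a deep-learning framework would be used, and the names `dense`, `forward`, `Actor` and `Critic` are hypothetical.

```python
import math
import random

def dense(in_dim, out_dim, rng):
    """Randomly initialised fully connected layer as (weights, biases)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim

def forward(layer, x, act=None):
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [act(v) for v in out] if act else out

class Actor:
    """Action network: one UAV's local state -> [direction, distance]."""
    def __init__(self, state_dim, hidden, max_dist, rng):
        self.l1 = dense(state_dim, hidden, rng)
        self.l2 = dense(hidden, 2, rng)
        self.max_dist = max_dist

    def __call__(self, s):
        h = forward(self.l1, s, math.tanh)
        d_raw, r_raw = forward(self.l2, h, math.tanh)
        return [math.pi * d_raw,                      # direction in (-pi, pi)
                self.max_dist * (r_raw + 1.0) / 2.0]  # distance in (0, max_dist)

class Critic:
    """Evaluation network: joint state + joint action -> scalar Q(s, a)."""
    def __init__(self, joint_state_dim, joint_action_dim, hidden, rng):
        self.l1 = dense(joint_state_dim + joint_action_dim, hidden, rng)
        self.l2 = dense(hidden, 1, rng)

    def __call__(self, joint_state, joint_action):
        h = forward(self.l1, list(joint_state) + list(joint_action), math.tanh)
        return forward(self.l2, h)[0]
```

Each UAV's actor sees only its own state, while the critic sees the joint state and joint action. This matches the split described in claim 7 and is the usual centralized-critic, decentralized-actor arrangement.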
8. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics of claim 7, wherein the training process of the multi-unmanned aerial vehicle autonomous coverage model based on the neural network is specifically as follows:
1) Randomly initialize the evaluation network $Q(s, a \mid \theta^Q)$ and the action network $\pi(s \mid \theta^\pi)$, together with their weight parameters $\theta^Q$ and $\theta^\pi$;
2) Randomly sample a mini-batch of samples from the experience replay buffer, where each sample has the form $(s_j, a_j, r_j, s'_j)$ and j is the sample index;
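A minimal experience replay buffer for step 2) is sketched below. It assumes a fixed capacity in which the oldest samples are discarded first, and uniform sampling without replacement. The class name and the `capacity` and `seed` parameters are hypothetical.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["s", "a", "r", "s_next"])

class ReplayBuffer:
    """Fixed-size store of (s_j, a_j, r_j, s'_j) samples."""
    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest samples are dropped first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next):
        self.buffer.append(Transition(s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random mini-batch, drawn without replacement.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```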
3) Update the evaluation network of the unmanned aerial vehicle by minimizing the loss function:
where $y_j$ is computed by the target Q network of the j-th unmanned aerial vehicle, and $Q^{\pi}(s, a_1, \dots, a_N)$ denotes the expected discounted reward of taking the actions $(a_1, \dots, a_N)$ under policy π in the joint state s;
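Because the loss equation is not preserved here, the following is a minimal sketch of step 3) that assumes the usual MADDPG-style form. The target is $y_j = r_j + \gamma\, Q'(s'_j, a'_1, \dots, a'_N)$ with $a'_n = \pi'_n(s'_{j,n})$, and the loss is the mean squared error between $y_j$ and $Q(s_j, a_j)$. The discount factor γ, the concatenation of per-UAV states and actions into joint vectors, and the omission of a terminal-state flag are all assumptions.

```python
def td_targets(batch, target_critic, target_actors, gamma):
    """y_j = r_j + gamma * Q'(s'_j, a'_1..a'_N), one value per sample."""
    ys = []
    for (s, a, r, s_next) in batch:
        # s_next holds one local state per UAV; each target actor picks a'_n.
        a_next = [target_actors[n](s_next[n]) for n in range(len(target_actors))]
        joint_s_next = [v for s_n in s_next for v in s_n]
        joint_a_next = [v for a_n in a_next for v in a_n]
        ys.append(r + gamma * target_critic(joint_s_next, joint_a_next))
    return ys

def critic_loss(batch, critic, ys):
    """Mean squared error between y_j and Q(s_j, a_j)."""
    errs = []
    for (s, a, r, s_next), y in zip(batch, ys):
        joint_s = [v for s_n in s for v in s_n]
        joint_a = [v for a_n in a for v in a_n]
        errs.append((y - critic(joint_s, joint_a)) ** 2)
    return sum(errs) / len(errs)
```

Any callables with the shown signatures can stand in for the networks, for example the actor/critic objects from a framework. Gradient computation is left to that framework.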
4) Update the target network parameters corresponding to the action network and the evaluation network:
where the parameter τ denotes the learning rate, and $\theta^{Q'}$ and $\theta^{\pi'}$ are the target network parameters.
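The update equation in step 4) is likewise not preserved. A common choice, assumed here, is the DDPG-style soft update $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$, applied separately to the evaluation and action networks. The sketch treats each network's parameters as a flat list of floats, which is a simplification. `soft_update` is a hypothetical name.

```python
from typing import List

def soft_update(target_params: List[float], params: List[float],
                tau: float) -> List[float]:
    """Return tau * theta + (1 - tau) * theta' for each parameter."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, params)]
```

A small τ makes the target networks trail the trained networks slowly, which stabilizes the targets $y_j$. With τ = 1 the target parameters are simply copied.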
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, characterized in that the processor, when executing the program, implements the method according to any of claims 1-8.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-8.
CN202310623947.7A 2023-05-30 2023-05-30 Multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic Pending CN116643587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310623947.7A CN116643587A (en) 2023-05-30 2023-05-30 Multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic

Publications (1)

Publication Number Publication Date
CN116643587A true CN116643587A (en) 2023-08-25

Family

ID=87624247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310623947.7A Pending CN116643587A (en) 2023-05-30 2023-05-30 Multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic

Country Status (1)

Country Link
CN (1) CN116643587A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination