CN116643587A

CN116643587A - Multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic

Info

Publication number: CN116643587A
Application number: CN202310623947.7A
Authority: CN
Inventors: 何斌; 程徐; 蒋荣; 李刚; 程斌; 陆萍; 张朋朋
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2023-05-30
Filing date: 2023-05-30
Publication date: 2023-08-25

Abstract

The invention relates to a multi-unmanned aerial vehicle (UAV) autonomous coverage method based on a pheromone heuristic, which comprises the following steps: step S1, constructing a coverage scene graph in which multiple UAVs autonomously collect data in a natural disaster area; step S2, constructing a multi-UAV path pheromone map based on a pheromone aggregation model and marking the paths of the UAVs; step S3, defining a state function, an action function and a reward function according to the multi-UAV autonomous coverage task, and establishing and training a neural-network-based multi-UAV autonomous coverage model; and step S4, performing multi-UAV autonomous coverage with the trained model. Compared with the prior art, the invention has the advantages of high efficiency, energy conservation, high coverage rate and wide applicability.

Description

Multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic

Technical Field

The invention relates to the technical field of unmanned aerial vehicle autonomous control, in particular to a multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic.

Background

In traditional static area coverage, once the static sensor nodes have been deployed they are difficult to redeploy if coverage holes appear or nodes fail. Dynamic area coverage is more flexible and is therefore widely used in target detection, structural health monitoring and disaster relief.

In dynamic area coverage, unmanned aerial vehicles (UAVs) offer operability, flexibility and robustness: by dynamically adjusting their positions as the environment changes, they can greatly improve the performance of a mobile sensor network. Controlled by a dynamic coverage algorithm, UAVs collect data from Internet of Things devices or wireless sensors and can cover the target area accurately. In addition, owing to its cost advantage and rapid deployment capability, a UAV wireless communication system is well suited to restoring emergency communication when communication infrastructure is damaged by an emergency such as a natural disaster, or when the available time is limited, thereby extending the coverage and improving the performance of the communication network in an emergency.

Control algorithms for UAV dynamic area coverage generally fall into two categories: non-self-organizing and self-organizing. Non-self-organizing dynamic coverage algorithms reduce repeated coverage of the target area through configuration-space modeling and optimization, thereby achieving efficient coverage. However, such methods are less robust: in an emergency, UAVs may malfunction or end up too far apart, leaving coverage holes. Self-organizing control algorithms give UAVs stronger robustness and good autonomy, and can resolve uneven coverage of the target area in an emergency through local information exchange. Deep reinforcement learning is regarded as a key technology for autonomous UAV control because it handles complex state spaces and environments well. However, conventional reinforcement learning (RL) methods, such as Q-learning or policy gradient methods, are not well suited to a multi-UAV dynamic coverage environment: because the policy of every UAV keeps changing during training, each UAV always faces a changing environment and its policy is difficult to converge.

Providing effective dynamic area coverage over the long term is challenging. UAVs are typically small to medium-sized battery-powered devices with limited communication range, flight time and payload capacity, so the dynamic coverage process must operate in an energy-efficient manner in order to extend the network lifetime. In addition, UAVs work in dynamic, open and unpredictable environments, where resource sharing can effectively reduce repeated coverage and unnecessary energy consumption.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide a multi-unmanned aerial vehicle autonomous coverage method based on a pheromone heuristic, which has the advantages of high efficiency, energy conservation, high coverage rate and wide applicability.

The aim of the invention can be achieved by the following technical scheme:

according to a first aspect of the present invention, there is provided a multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics, the method comprising the steps of:

step S1, constructing a coverage scene graph for autonomously collecting natural disaster area data by a plurality of unmanned aerial vehicles;

step S2, constructing a multi-unmanned aerial vehicle path pheromone map based on the pheromone aggregation model, and marking the multi-unmanned aerial vehicle path;

step S3, defining a state function, an action function and a reward function according to the multi-unmanned aerial vehicle autonomous coverage task, establishing a multi-unmanned aerial vehicle autonomous coverage model based on a neural network and training;

and step S4, performing multi-unmanned aerial vehicle autonomous coverage by adopting the trained multi-unmanned aerial vehicle autonomous coverage model.

Preferably, the coverage scene graph in step S1 in which the multiple unmanned aerial vehicles autonomously collect natural disaster area data includes unmanned aerial vehicles, coverage targets and obstacle positions, specifically:

let n = 1, 2, …, N denote the index of each unmanned aerial vehicle, where N is the total number of unmanned aerial vehicles; the set of all unmanned aerial vehicles is expressed as {1, 2, …, N};

let k = 1, 2, …, K denote the index of each disaster source in the task area, where K is the total number of disaster sources; the set of all disaster sources in the natural disaster area is expressed as {1, 2, …, K};

the coverage area is defined as a two-dimensional bounded area of length L and width W, the two-dimensional area being divided into cells, and the area containing randomly generated obstacles.

Preferably, said step S2 comprises the following sub-steps:

step S21, initializing a pheromone map;

step S22, an aggregation process of the pheromone map.

Preferably, the pheromone initialization function of the starting position of the unmanned aerial vehicle is:

$P_{e,0} = P_{(x,y),0}$ (1)

where (x, y) is the starting position of the unmanned aerial vehicle and $P_{e,0}$ is the initial pheromone concentration of the unmanned aerial vehicle in the task area.
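As a minimal sketch of this initialization, assuming the pheromone map is stored as a 2-D grid of cells and that only the starting cells receive a nonzero initial concentration. The function name `init_pheromone_map` and the value `p_init` are illustrative assumptions and do not come from the source.

```python
def init_pheromone_map(length, width, start_positions, p_init=1.0):
    """Create a length x width grid and mark each UAV starting cell.

    p_init stands in for the initial concentration P_{e,0}; its value
    here is an assumption chosen only for illustration.
    """
    pmap = [[0.0] * width for _ in range(length)]
    for x, y in start_positions:
        pmap[x][y] = p_init
    return pmap


# Two UAVs starting at cells (0, 0) and (2, 3) of a 5 x 4 grid
pheromone = init_pheromone_map(5, 4, [(0, 0), (2, 3)])
```

Cells that no unmanned aerial vehicle has visited start at zero, so they look most attractive to a policy that prefers low pheromone concentration.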

Preferably, the aggregation process of the pheromone map specifically comprises the following steps:

when the drone moves to cell i, the total amount of pheromones at this point is expressed as:

where each summed term is the amount of pheromone released at cell i by the k-th unmanned aerial vehicle, and n is the total number of unmanned aerial vehicles within the perception radius r of cell i during the interval (t-1, t);

the amount of trace pheromone evaporated from the i-th cell during the interval (t, t+1) is expressed as:

where η is the pheromone evaporation coefficient, η ∈ (0, 1);

the total amount of pheromones of the ith cell at time t+1 is expressed as:

where η is the pheromone evaporation coefficient, η ∈ (0, 1), and the added term is the pheromone increment of cell i at time t+1;

the value of the trace pheromone is limited to the interval $[P_{\min}, P_{\max}]$, and the local pheromone update rule and the global pheromone update rule are integrated as follows:

Preferably, in step S3, the state function, the action function and the reward function are respectively:

state function: comprising the trace pheromone concentration of the task-area cells, the remaining energy of the n-th UAV, and the coverage status of the disaster sources; at time t, the state function is expressed as:

action function: comprising the flight direction and the flight distance of the unmanned aerial vehicle; at time t, the action function is expressed as:

reward function: comprising the reward obtained when the unmanned aerial vehicle covers a disaster source, the reward for hitting an obstacle, and the reward for overlapping an area; at time t, the reward function is expressed as:
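The state and action definitions above can be sketched as follows. This is only an illustration under stated assumptions: the state is flattened into one list, coverage status is encoded as 1.0 (covered) or 0.0 (uncovered), the flight direction is an angle in radians, and the flight distance is clipped to a hypothetical maximum `d_max`. None of these encodings, nor the names `build_state` and `apply_action`, are specified by the source.

```python
import math


def build_state(pmap, remaining_energy, covered):
    """Concatenate cell pheromone values, remaining energy and coverage flags."""
    flat = [value for row in pmap for value in row]
    flags = [1.0 if c else 0.0 for c in covered]  # assumed 1/0 encoding
    return flat + [remaining_energy] + flags


def apply_action(position, direction, distance, d_max):
    """Move a UAV by `distance` along `direction` (radians), clipped to [0, d_max]."""
    d = min(max(distance, 0.0), d_max)
    x, y = position
    return (x + d * math.cos(direction), y + d * math.sin(direction))


state = build_state([[0.1, 0.2], [0.3, 0.4]], 7.5, [True, False])
hover = apply_action((1.0, 2.0), 0.0, 0.0, 3.0)    # zero distance: position unchanged
capped = apply_action((0.0, 0.0), 0.0, 5.0, 3.0)   # request exceeds d_max, so it is clipped
```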

Preferably, the neural-network-based multi-unmanned aerial vehicle autonomous coverage model in step S3 is specifically as follows: the neural network comprises an action network and an evaluation network; the input of the action network is the state function of each unmanned aerial vehicle and its output is the action function of that unmanned aerial vehicle; the inputs of the evaluation network are the joint state s(t) and the joint action a(t), and its output is the Q value, denoted Q(s, a).

Preferably, the training process of the multi-unmanned aerial vehicle autonomous coverage model based on the neural network specifically comprises the following steps:

1) randomly initializing the evaluation network $Q(s, a \mid \theta^{Q})$ and the action network $\pi(s \mid \theta^{\pi})$ with the corresponding weight parameters $\theta^{Q}$ and $\theta^{\pi}$;

2) randomly drawing a batch of samples from the experience replay buffer, one of which is $(s^{j}, a^{j}, r^{j}, s'^{j})$, where j is the sample index;

3) updating the evaluation network of the unmanned aerial vehicle by minimizing the loss function:

where $y^{j}$ is computed by the target Q network of the j-th unmanned aerial vehicle, and the Q value represents the expected discounted reward of taking the joint action $(a_{1}, \ldots, a_{N})$ under the joint state function s using policy π;

4) updating the target network parameters corresponding to the action network and the evaluation network:

where the parameter τ denotes the learning rate and the updated quantities are the target network parameters.
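One common form of such a target update in actor-critic training is a soft update, in which each target parameter moves a fraction τ toward its online counterpart. The sketch below assumes the update takes this form and represents network parameters as flat lists of floats; the function name `soft_update` and the default value of `tau` are illustrative assumptions.

```python
def soft_update(target, online, tau=0.01):
    """Return new target parameters: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


# With tau = 0.5 the target moves halfway toward the online parameters
new_target = soft_update(target=[0.0, 2.0], online=[1.0, 2.0], tau=0.5)
```

A small τ makes the target networks change slowly, which is the usual motivation for keeping separate target copies of the action and evaluation networks.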

According to a second aspect of the present invention there is provided an electronic device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method of any one of the above when executing the program.

According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of any one of the above.

Compared with the prior art, the invention has the following advantages:

1) According to the invention, the multi-unmanned aerial vehicle path pheromone map is utilized, so that overlapping coverage of a task area in a short time is effectively avoided, and energy consumption of unmanned aerial vehicles is minimized;

2) The reward function, designed for the multi-unmanned aerial vehicle autonomous coverage task, effectively guides the unmanned aerial vehicles to cooperatively complete data collection in the natural disaster area and solves the problem of dynamic instability in multi-unmanned aerial vehicle reinforcement learning;

3) The multi-unmanned aerial vehicle autonomous coverage method provided by the invention has the advantages of high efficiency, energy conservation, high coverage rate, wide applicability and the like, and further improves the disaster rescue efficiency of the multi-unmanned aerial vehicle.

Drawings

FIG. 1 is a schematic flow chart of the method of the present invention;

FIG. 2 is a scene graph of the invention for autonomous collection of data by multiple unmanned aerial vehicles in a natural disaster area;

FIG. 3 is a schematic diagram of a pheromone map of the present invention;

FIG. 4 is a schematic diagram of a network architecture according to the present invention;

FIG. 5 is a schematic diagram of an Actor network in a natural disaster application scenario according to the present invention;

FIG. 6 is a schematic diagram of the Critic network in a natural disaster application scenario according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

Examples

As shown in FIG. 1, this embodiment takes the collection of natural disaster area data by multiple unmanned aerial vehicles as its research background and provides a multi-unmanned aerial vehicle autonomous coverage method based on a pheromone heuristic, the method including the following steps:

Step S1, establishing a schematic of multi-unmanned aerial vehicle autonomous coverage that includes the coverage targets and obstacle positions, and from it constructing the coverage scene graph in which multiple unmanned aerial vehicles autonomously collect natural disaster area data, as shown in FIG. 2, specifically:

1) Let n = 1, 2, …, N denote the index of each unmanned aerial vehicle, where N is the total number of unmanned aerial vehicles; the set of all unmanned aerial vehicles can be expressed as {1, 2, …, N}.

2) Let k = 1, 2, …, K denote the index of each disaster source in the task area, where K is the total number of disaster sources; the set of all disaster sources in the natural disaster area can be expressed as {1, 2, …, K}.

3) The coverage area is defined as a two-dimensional bounded area of length L and width W, the two-dimensional area being divided into smaller cells and the area containing a number of obstacles that are randomly generated.
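The scene defined in items 1) to 3) can be sketched as a small data structure. This is a minimal illustration, assuming integer cell coordinates and that UAV starting cells, disaster sources and obstacles occupy distinct, uniformly sampled cells; the names `CoverageScene` and `build_scene` are hypothetical and do not come from the source.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CoverageScene:
    length: int              # L, number of cells along x
    width: int               # W, number of cells along y
    uav_positions: list      # starting cell of each UAV n = 1..N
    disaster_sources: list   # cell of each disaster source k = 1..K
    obstacles: set = field(default_factory=set)


def build_scene(length, width, n_uavs, n_sources, n_obstacles, seed=0):
    """Place UAVs, disaster sources and obstacles on distinct random cells."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(length) for y in range(width)]
    picked = rng.sample(cells, n_uavs + n_sources + n_obstacles)
    return CoverageScene(
        length=length,
        width=width,
        uav_positions=picked[:n_uavs],
        disaster_sources=picked[n_uavs:n_uavs + n_sources],
        obstacles=set(picked[n_uavs + n_sources:]),
    )


scene = build_scene(length=10, width=8, n_uavs=3, n_sources=4, n_obstacles=6)
```

Sampling without replacement keeps an obstacle from landing on a starting cell or a disaster source, which is a convenience of this sketch rather than a requirement stated in the text.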

Step S2, designing a pheromone aggregation model based on the pheromone mechanism, constructing a multi-unmanned aerial vehicle path pheromone map and marking the paths of the unmanned aerial vehicles, specifically comprising the following sub-steps:

1) Initialization process of pheromone map:

Ants use pheromones to guide the foraging routes of other ants. By analogy, a trace pheromone is defined to record the flight path of each unmanned aerial vehicle. This mechanism effectively prevents an unmanned aerial vehicle from revisiting the same place within a short time and, compared with an exhaustive search algorithm, reduces the search difficulty and improves the global coverage rate.

The pheromone initialization function of the starting position of the unmanned aerial vehicle is as follows:

$P_{e,0} = P_{(x,y),0}$ (1)

2) Aggregation process of pheromone map:

First, each unmanned aerial vehicle can sense the trace pheromone concentration within a certain range.

Any cell containing trace pheromone acts as a repulsive force in the local environment and repels nearby unmanned aerial vehicles, which allows the local state space to be observed effectively. The unmanned aerial vehicle therefore selects the area with the lowest trace pheromone concentration for its next action, since a low concentration indicates that the area has not been visited or has rarely been visited. When the unmanned aerial vehicle moves to cell i, the total amount of pheromone at that location can be expressed as:

where each summed term is the amount of pheromone released at cell i by the k-th unmanned aerial vehicle, and n is the total number of unmanned aerial vehicles within the perception radius r of cell i during the interval (t-1, t).

Next, the natural aggregation process of the trace pheromone includes evaporation: deposited pheromone disappears over time, which also allows an unmanned aerial vehicle to collect data from a disaster source again after some time has passed. The amount of trace pheromone evaporated from the i-th cell during the interval (t, t+1) can be expressed as:

where η is the pheromone evaporation coefficient, η ∈ (0, 1).

Finally, the total amount of pheromones of the ith cell at time t+1 can be expressed as:

where η is the pheromone evaporation coefficient, η ∈ (0, 1), and the added term is the pheromone increment of cell i at time t+1.

To keep the algorithm from falling into a local optimum and to prevent excessively large negative values and fluctuations of the trace pheromone, its value is limited to the interval $[P_{\min}, P_{\max}]$. This enlarges the search space while keeping the convergence of the algorithm fast. The local pheromone update rule and the global pheromone update rule are integrated as follows:
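A minimal sketch of one update step consistent with the description above, assuming the familiar ant-colony form in which each cell keeps a fraction (1 - η) of its pheromone, gains the sum of the amounts released there, and is then clamped to the allowed interval. The exact expression, the function name `update_pheromone` and the default parameter values are assumptions and are not taken from the source.

```python
def update_pheromone(pmap, deposits, eta=0.1, p_min=0.0, p_max=5.0):
    """Apply evaporation, deposition and clamping to every cell.

    pmap     : 2-D list of current pheromone values
    deposits : dict mapping cell (x, y) to a list of amounts released there
               by the UAVs within perception range during this step
    """
    new_map = []
    for x, row in enumerate(pmap):
        new_row = []
        for y, value in enumerate(row):
            increment = sum(deposits.get((x, y), []))    # aggregation over UAVs
            updated = (1.0 - eta) * value + increment    # evaporation + increment
            new_row.append(min(p_max, max(p_min, updated)))  # clamp to [p_min, p_max]
        new_map.append(new_row)
    return new_map


current = [[1.0, 0.0], [4.8, 0.0]]
released = {(0, 1): [0.5, 0.5], (1, 0): [1.0]}
updated_map = update_pheromone(current, released, eta=0.5, p_min=0.0, p_max=3.0)
```

In this example the cell at (1, 0) would reach 3.4 after evaporation and deposition, so the clamp holds it at the upper bound of 3.0.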

Step S3, defining the state function, the action function and the reward function according to the multi-unmanned aerial vehicle autonomous coverage task. Their expressions are respectively:

(1) State function: comprising the trace pheromone concentration of the task-area cells, the remaining energy of the n-th UAV, and the coverage status of the disaster sources. At time t, the state function may be expressed as:

where one value of the coverage status indicates that disaster source k has been covered, and the other value indicates that the disaster source is uncovered.

(2) Action function: comprising the flight direction and the flight distance of the unmanned aerial vehicle.

The action function at time t is expressed as:

where the two components are the flight direction and the flight distance of unmanned aerial vehicle n. A flight distance at its lower bound indicates that the unmanned aerial vehicle is hovering; a flight distance at its upper bound indicates that the unmanned aerial vehicle has flown the maximum distance its energy allows.

(3) Reward function: comprising three parts, namely the reward obtained when the unmanned aerial vehicle covers a disaster source, the reward for hitting an obstacle, and the reward for overlapping an area.

When a disaster source lies within the coverage range r of the unmanned aerial vehicle, the n-th UAV obtains a coverage reward, where $V_{k}$ denotes the value of the k-th disaster source within the coverage range r.

When the unmanned aerial vehicle hits an obstacle or the boundary, the n-th UAV receives a collision reward, where R is the total number of times an obstacle or the boundary has been hit.

When the unmanned aerial vehicle selects its direction, choosing the area with the highest pheromone concentration means that the area has already been visited by several unmanned aerial vehicles, and the n-th UAV receives an overlap reward.

Thus, at time t, the reward function can be expressed as:
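The three reward parts can be combined as in the sketch below. It assumes the total reward is a plain sum, that the coverage part adds the values $V_{k}$ of the sources in range, and that collisions and overlap enter as negative terms scaled by hypothetical coefficients `hit_penalty` and `overlap_penalty`. The signs, magnitudes and the function name `reward` are assumptions for illustration and are not given by the source.

```python
def reward(covered_values, hit_count, chose_highest_pheromone,
           hit_penalty=1.0, overlap_penalty=0.5):
    """Sum the coverage, collision and overlap parts of the reward.

    covered_values          : values V_k of disaster sources within range r
    hit_count               : number of obstacle or boundary hits (R)
    chose_highest_pheromone : True if the UAV moved toward the most-visited area
    """
    r_cover = sum(covered_values)
    r_hit = -hit_penalty * hit_count
    r_overlap = -overlap_penalty if chose_highest_pheromone else 0.0
    return r_cover + r_hit + r_overlap


# Two sources covered, two collisions, and an overlapping move
r_t = reward([2.0, 3.0], hit_count=2, chose_highest_pheromone=True)
```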

A neural network is then established and its parameters are updated to obtain the neural-network-based multi-unmanned aerial vehicle autonomous coverage model, specifically as follows:

As shown in FIG. 4, the network structure comprises two parts, namely an Actor network and a Critic network. As shown in FIG. 5, the input of the Actor network is the state function of each unmanned aerial vehicle and its output is the action function of that unmanned aerial vehicle. As shown in FIG. 6, the inputs of the Critic network are the joint state s(t) and the joint action a(t), and its output is the Q value, denoted Q(s, a).
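The input and output shapes of the two networks can be illustrated with a tiny pure-Python multilayer perceptron. This is a sketch under stated assumptions: one hidden layer of 16 tanh units, no bias terms, a single Actor shared by all unmanned aerial vehicles, and arbitrary dimensions `STATE_DIM`, `ACTION_DIM` and `N_UAVS`. The layer sizes and the actual architectures in FIG. 5 and FIG. 6 are not recoverable from the text.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Weight matrix with n_out rows and n_in columns, small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(layers, x):
    """Feed x through the layers, applying tanh on all but the last layer."""
    for i, weights in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


rng = random.Random(0)
STATE_DIM, ACTION_DIM, N_UAVS = 6, 2, 3   # illustrative sizes only

actor = [make_layer(STATE_DIM, 16, rng), make_layer(16, ACTION_DIM, rng)]
critic = [make_layer(N_UAVS * (STATE_DIM + ACTION_DIM), 16, rng),
          make_layer(16, 1, rng)]

states = [[rng.random() for _ in range(STATE_DIM)] for _ in range(N_UAVS)]
actions = [forward(actor, s) for s in states]        # each UAV acts on its own state
joint = [v for s in states for v in s] + [v for a in actions for v in a]
q_value = forward(critic, joint)[0]                  # scalar Q(s, a) on the joint input
```

The Actor sees only one unmanned aerial vehicle's state, while the Critic sees the concatenated states and actions of all of them, matching the joint state s(t) and joint action a(t) described above.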

The training process of the neural network specifically comprises the following steps:

1) Randomly initialize the Critic network $Q(s, a \mid \theta^{Q})$ and the Actor network $\pi(s \mid \theta^{\pi})$ with weight parameters $\theta^{Q}$ and $\theta^{\pi}$, and initialize the experience replay buffer.

2) Compute the state s(t), the action a(t) and the reward r(t) of the N unmanned aerial vehicles at time t, compute the state $s'$ at the next time step, and store the transition in the experience replay buffer.

3) Randomly draw a batch of samples from the experience replay buffer, one of which is $(s^{j}, a^{j}, r^{j}, s'^{j})$, where j is the sample index.

4) Update the Critic network of the unmanned aerial vehicle by minimizing the loss function:

where $y^{j}$ is computed by the target Q network of the j-th unmanned aerial vehicle, and the Q value represents the expected discounted reward of taking the joint action $(a_{1}, \ldots, a_{N})$ under the joint state function s using policy π.

5) Update the target network parameters corresponding to the Actor network and the Critic network:

where the parameter τ denotes the learning rate, and the two updated quantities are the parameters of the target policy network and of the target Q network.
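Steps 2) to 4) can be sketched with a small replay buffer and a mean-squared temporal-difference loss. The sketch assumes the target takes the standard form y = r + γ Q'(s', π'(s')) with a discount factor γ, and it passes the networks in as plain callables. The class `ReplayBuffer`, the function `critic_loss` and the value of `gamma` are assumptions, since the source formula is not available in the text.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity, seed=0):
        self.data = deque(maxlen=capacity)   # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size):
        return self.rng.sample(list(self.data), batch_size)


def critic_loss(batch, q, q_target, pi_target, gamma=0.95):
    """Mean squared error between y^j and Q(s^j, a^j) over the batch."""
    total = 0.0
    for s, a, r, s_next in batch:
        y = r + gamma * q_target(s_next, pi_target(s_next))  # target from target nets
        total += (y - q(s, a)) ** 2
    return total / len(batch)


# Toy usage with scalar states and actions and hand-written "networks"
buffer = ReplayBuffer(capacity=100)
buffer.store(1.0, 1.0, 1.0, 2.0)
batch = buffer.sample(1)
loss = critic_loss(batch,
                   q=lambda s, a: s + a,
                   q_target=lambda s, a: 2.0 * s,
                   pi_target=lambda s: 0.0,
                   gamma=0.5)
```

In practice the callables would be the Critic, target Critic and target Actor networks, and the loss would be minimized with a gradient-based optimizer rather than evaluated once.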

The electronic device of the present invention includes a Central Processing Unit (CPU) that can perform various appropriate actions and processes according to computer program instructions stored in a Read Only Memory (ROM) or computer program instructions loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device can also be stored. The CPU, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.

A plurality of components in a device are connected to an I/O interface, comprising: an input unit such as a keyboard, a mouse, etc.; an output unit such as various types of displays, speakers, and the like; a storage unit such as a magnetic disk, an optical disk, or the like; and communication units such as network cards, modems, wireless communication transceivers, and the like. The communication unit allows the device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processing unit performs the respective methods and processes described above, for example, the methods S1 to S4. For example, in some embodiments, methods S1-S4 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or the communication unit. When the computer program is loaded into RAM and executed by the CPU, one or more steps of the methods S1 to S4 described above may be performed. Alternatively, in other embodiments, the CPU may be configured to perform methods S1-S4 by any other suitable means (e.g., by means of firmware).

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on a Chip (SOC), a Complex Programmable Logic Device (CPLD), etc.

Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic is characterized by comprising the following steps of:

2. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic according to claim 1, wherein the coverage scene graph for the multi-unmanned aerial vehicle to autonomously collect natural disaster area data in the step S1 comprises unmanned aerial vehicles, coverage targets and barrier positions, specifically:

3. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics according to claim 1, wherein the step S2 comprises the following sub-steps:

step S21, initializing a pheromone map;

step S22, an aggregation process of the pheromone map.

4. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics of claim 3, wherein the pheromone initialization function of the starting position of the unmanned aerial vehicle is:

$P_{e,0} = P_{(x,y),0}$ (1)

where (x, y) is the starting position of the unmanned aerial vehicle and $P_{e,0}$ is the initial pheromone concentration of the unmanned aerial vehicle in the task area.

5. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics of claim 4, wherein the aggregation process of the pheromone map is specifically as follows:

where each summed term is the amount of pheromone released at cell i by the k-th unmanned aerial vehicle, and n is the total number of unmanned aerial vehicles within the perception radius r of cell i during the interval (t-1, t);

the amount of trace pheromone evaporated from the i-th cell during the interval (t, t+1) is expressed as:

where η is the pheromone evaporation coefficient, η ∈ (0, 1);

the total amount of pheromones of the i-th cell at time t+1 is expressed as:

the value of the trace pheromone is limited to the interval $[P_{\min}, P_{\max}]$, and the local pheromone update rule and the global pheromone update rule are integrated as follows:

6. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic of claim 1, wherein the state function, the action function and the reward function in the step S3 are respectively:

reward function: comprising the reward obtained when the unmanned aerial vehicle covers a disaster source, the reward for hitting an obstacle, and the reward for overlapping an area; at time t, the reward function is expressed as:

7. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristic of claim 6, wherein the multi-unmanned aerial vehicle autonomous coverage model based on the neural network in the step S3 is specifically as follows: the neural network comprises an action network and an evaluation network; the input of the action network is the state function of each unmanned aerial vehicle and its output is the action function of that unmanned aerial vehicle; the inputs of the evaluation network are the joint state s(t) and the joint action a(t), and its output is the Q value, denoted Q(s, a).

8. The multi-unmanned aerial vehicle autonomous coverage method based on pheromone heuristics of claim 7, wherein the training process of the multi-unmanned aerial vehicle autonomous coverage model based on the neural network is specifically as follows:

9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, characterized in that the processor, when executing the program, implements the method according to any of claims 1-8.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-8.