CN116205390A - Multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning

Multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning

Info

Publication number
CN116205390A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
data collection
data
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310156117.8A
Other languages
Chinese (zh)
Inventor
胡钰林
高云飞
黄雨茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202310156117.8A
Publication of CN116205390A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides a multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning, comprising the following steps: step S1: the ground sensor devices collect nearby data information by sensing their surroundings; step S2: the ground center dispatches unmanned aerial vehicles, according to the number available for dispatch, to collect the data of the ground sensor devices; step S3: while collecting the data of all the ground sensing devices, the unmanned aerial vehicles perform trajectory optimization and resource allocation with the aim of minimizing the energy consumption of the whole system; step S4: judge whether the maximum number of training iterations has been reached; step S5: output the optimal trajectory and resource allocation of each unmanned aerial vehicle. By optimizing the trajectories of the unmanned aerial vehicles and the communication resources of the whole system, the invention effectively saves the energy the system needs to complete its task, and has the characteristics of low cost, high speed, low delay and energy saving.

Description

Multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning
Technical Field
The application relates to the field of wireless communication, and in particular to a multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning.
Background
In recent years, with the rapid development of Internet of Things technology, large numbers of sensor nodes have been deployed at key locations to monitor the environment. Maintaining such a very large-scale network is challenging. On the one hand, the sensor nodes consume considerable energy in operation, and frequently replacing or recharging them is costly. A currently preferred solution is backscatter communication, which lets a sensor node transmit data by reflecting radio-frequency signals; although its transmission power consumption is low, it cannot transmit over long distances, which makes data collection for the system very difficult. On the other hand, during data collection and processing it is hard to guarantee data security, and data may be stolen by malicious parties, leading to problems such as data leakage. How to achieve long-distance, rapid data transmission while guaranteeing data security is therefore a problem in urgent need of a solution.
Disclosure of Invention
The invention provides a multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning. First, the ground sensing devices collect the data to be transmitted; the ground center then dispatches unmanned aerial vehicles to collect the data gathered by the sensors in the sensor area. While an unmanned aerial vehicle collects the data sent by the ground sensors, its trajectory must be optimized and the communication resources reasonably allocated, so as to save the energy consumption of the whole system.
In order to solve the technical problems, the invention adopts the following technical scheme:
a multi-unmanned aerial vehicle data collection method based on federal reinforcement learning comprises the following steps:
step S101: the ground sensor devices collect nearby data information by sensing their surroundings;
step S102: the ground center dispatches unmanned aerial vehicles, according to the number available for dispatch, to collect the data of the ground sensor devices;
step S103: while collecting the data of all the ground sensing devices, the unmanned aerial vehicles perform trajectory optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
step S104: judge whether the maximum number of training iterations has been reached;
step S105: output the optimal trajectory and resource allocation of each unmanned aerial vehicle.
Further, in step S102, each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target, so as to ensure data integrity; an unmanned aerial vehicle may collect the data of the next sensor node only after completing the data collection of the current one.
Further, in step S103, the energy consumption of the whole system is optimized using the Federated Learning dueling DDQN (double deep Q-network) algorithm, a federated reinforcement learning method.
Further, in step S103, the total energy consumed by the unmanned aerial vehicles to complete the overall data collection task can be expressed as:

$$E = E^{*} + E_{\mathrm{com}}$$

where $E^{*}$ is the total onboard energy consumed by all unmanned aerial vehicles over the N time slots, and $E_{\mathrm{com}}$ is the communication energy consumed by the unmanned aerial vehicles during data collection.
Further, the total onboard energy $E^{*}$ consumed by all unmanned aerial vehicles over the N time slots can be expressed as:

$$E^{*} = \sum_{m=1}^{M} \sum_{n=1}^{N} P_{m,n}(V)$$

where $P_{m,n}(V)$ is the propulsion power consumption that keeps unmanned aerial vehicle m in flight during time slot n, M is the number of unmanned aerial vehicles, and N is the total number of data collection time slots.
Further, in time slot n, the propulsion power consumption that keeps unmanned aerial vehicle m in flight can be expressed (using the standard rotary-wing model consistent with the listed parameters) as:

$$P_{m,n}(V) = P_0\left(1 + \frac{3V^2}{U_{\mathrm{tip}}^2}\right) + P_i\left(\sqrt{1 + \frac{V^4}{4v_0^4}} - \frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A V^3$$

where $P_0$ and $P_i$ are two constants denoting, respectively, the blade profile power and the induced power of the unmanned aerial vehicle in the hovering state, V is the flight speed of the unmanned aerial vehicle, $U_{\mathrm{tip}}$ is the tip speed of the rotor blade, $v_0$ is the mean rotor induced velocity in hover, $d_0$ is the fuselage drag ratio, s is the rotor solidity, ρ is the air density, and A is the rotor disc area.
Further, the communication energy consumed by the unmanned aerial vehicles in carrying out the data collection process can be expressed as:

$$E_{\mathrm{com}} = \sum_{l=1}^{L} \sum_{n=1}^{N} p_{l,n}$$

where $p_{l,n}$ is the transmit power of the l-th ground sensor node in time slot n, L is the number of ground sensor nodes, and the maximum transmit power of a ground sensor device is P, i.e., $0 < p_{l,n} \le P$.
Further, the trajectories of the unmanned aerial vehicles and the communication resources of the whole system are jointly optimized to minimize the system energy consumption E. The specific flow is as follows:
Step 1. Each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target; an unmanned aerial vehicle may collect the data of the next sensor node only after finishing the data collection of the current one.
Step 2. The formulated problem is converted into a Markov decision problem, which is then solved with a federated reinforcement learning method. A complete Markov decision process consists of four parts, namely $\langle S, A, \gamma, r_n \rangle$, where S is the state space, A is the action space, γ is the state transition probability when the unmanned aerial vehicle executes a task, and $r_n$ is the reward function obtained when the unmanned aerial vehicle performs the patrol task.
Step 3. Judge whether the unmanned aerial vehicles have completed all data collection tasks; if not, execute step 1; if so, all data collection tasks end.
Step 4. Judge whether the maximum number of iterations has been reached; if not, repeat steps 1-3 until the algorithm reaches the maximum number of iterations; if so, output the optimal trajectories and the resource allocation result and end the program.
Further, the process of optimizing the trajectories of the unmanned aerial vehicles and the communication resources of the system with the federated reinforcement learning method is as follows:

a. Assume the state of unmanned aerial vehicle m in time slot n is $S_n = \{q_{m,n}, p_{l,n}\}$; it takes an action $A_n = \{o_{m,n}, p_{l,n}\}$ from the action space A, transitions to the next state $S_{n+1} = \{q_{m,n+1}, p_{l,n+1}\}$ and obtains a reward, and the state transition result $(S_n, A_n, r_n, S_{n+1})$ is then saved in an experience pool. Here $q_{m,n}$ denotes the position coordinates of unmanned aerial vehicle m in slot n, the transmit powers $p_{l,n}$ are drawn from a discrete power matrix (its exact discretization is given only as a figure in the original), $o_{m,n} = \{n, s, e, w\}$ denotes the flight direction of the unmanned aerial vehicle, with n, s, e, w standing for north, south, east and west respectively, and $S_{n+1}$ is the state of unmanned aerial vehicle m in time slot n+1.

b. Randomly select $N_1$ samples from the experience pool and use gradient descent to reduce the loss function of the neural network, so as to optimize the trajectory and communication resource allocation of the unmanned aerial vehicle and obtain larger rewards. The loss function is defined as

$$\mathrm{Loss} = \left( r_{n+1} + \lambda\, Q\big(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*}\big) - Q\big(S_n, A_n \mid \theta\big) \right)^{2}$$

where $r_{n+1}$ is the reward obtained by the unmanned aerial vehicle in time slot n+1, λ is the discount factor, $\theta$ and $\theta^{*}$ are the parameters of the current network and of the target network respectively, $Q(S_n, A_n \mid \theta)$ is the Q value of taking action $A_n$ in the current state $S_n$ in the current network, and $Q(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*})$ is the Q value in the target network of taking, in state $S_{n+1}$, the action $\hat{A}_{n+1}$ (per the double-DQN structure, $\hat{A}_{n+1} = \arg\max_{A} Q(S_{n+1}, A \mid \theta)$).

c. Each unmanned aerial vehicle sends its trained model parameters to the aggregation end; the aggregation end then aggregates the model parameters and sends them back to every unmanned aerial vehicle. The aggregation end can be served by one of the unmanned aerial vehicles executing the task, and the energy consumed in exchanging model parameters can be neglected. Assume the model parameters of unmanned aerial vehicle m in time slot n are $w_{m,n}$; through weighted aggregation the aggregation end obtains the neural network model parameters $w_{n+1}$ with which all unmanned aerial vehicles train in the next time slot, and transmits $w_{n+1}$ to each unmanned aerial vehicle over the downlink in time slot n+1, where $w_{n+1}$ is specifically:

$$w_{n+1} = \sum_{m=1}^{M} \frac{v}{\theta}\, w_{m,n}$$

where θ here denotes the total number of model parameters of all unmanned aerial vehicles and v the number of model parameters of unmanned aerial vehicle m.

d. Judge whether the unmanned aerial vehicle has completed the data collection task of the sensing devices; if not, the unmanned aerial vehicle executes step a again; if so, the data collection of the sensor nodes is complete.
The invention also provides a multi-unmanned aerial vehicle data collection system based on federated reinforcement learning, which comprises a sensor data collection module, an unmanned aerial vehicle data collection module, an unmanned aerial vehicle trajectory optimization and resource allocation module, and a result output module, wherein,
the sensor data collection module senses and collects data around the ground sensor equipment;
the unmanned aerial vehicle data collection module dispatches an unmanned aerial vehicle to collect data of the ground sensor equipment;
the unmanned aerial vehicle track optimization and resource allocation module performs unmanned aerial vehicle track optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
and the result output module outputs the optimized trajectories and the resource allocation result of the unmanned aerial vehicles.
Compared with the prior art, the invention has at least the following beneficial effects:
1. By exploiting the flexibility of unmanned aerial vehicles and the high transmission speed of air-to-ground communication, optimizing the unmanned aerial vehicle trajectories and the communication resources of the whole system effectively saves the energy the system needs to complete its task, with the characteristics of low cost, high speed, low delay and energy saving.
2. Using distributed federated learning lets the multiple unmanned aerial vehicles share model parameters during training while performing data collection tasks, which addresses the problem of data leakage during collection and, compared with a centralized multi-UAV data collection solution, accelerates the convergence of the algorithm.
3. When the unmanned aerial vehicles collect data cooperatively, the problems of safe inter-UAV flight and of flying out of the boundary are taken into account, so that the unmanned aerial vehicles come closer to a real data collection scenario when executing the data collection task.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system according to the present invention;
FIG. 2 is a general flow chart of the present invention;
FIG. 3 is a flow chart of unmanned aerial vehicle trajectory optimization and resource allocation based on the federal reinforcement learning algorithm;
FIG. 4 is the flow of greedy-strategy selection of data collection points in the present invention;
FIG. 5 is the training flow of unmanned aerial vehicle trajectory optimization and resource allocation according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the experimental methods described in the following embodiments, unless otherwise specified, are all conventional methods, and the reagents and materials, unless otherwise specified, are all commercially available; in the description of the present invention, the terms "transverse", "longitudinal", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus are not to be construed as limiting the present invention.
Furthermore, the terms "horizontal," "vertical," "overhang," and the like do not denote a requirement that the component be absolutely horizontal or overhang, but rather may be slightly inclined. As "horizontal" merely means that its direction is more horizontal than "vertical", and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.
Aiming at the problems that sensor nodes are difficult to recharge, long-distance communication transmission is difficult, and data leakage occurs easily during transmission, the invention provides an unmanned aerial vehicle data collection method and system based on federated reinforcement learning, which solves both the problem that ground sensor devices with limited battery capacity cannot transmit over long distances and the problem of data leakage during data transmission. The method has the advantages of energy saving, security and high-speed transmission.
As shown in fig. 2, this embodiment provides a multi-unmanned aerial vehicle data collection method based on federated reinforcement learning, which includes the following steps:
step S101: the ground sensor devices collect nearby data information by sensing their surroundings;
step S102: the ground center dispatches unmanned aerial vehicles, according to the number available for dispatch, to collect the data of the ground sensor devices;
step S103: while collecting the data of all the ground sensing devices, the unmanned aerial vehicles perform trajectory optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
step S104: judge whether the maximum number of training iterations has been reached;
step S105: output the optimal trajectory and resource allocation of each unmanned aerial vehicle.
The system is assumed to comprise M unmanned aerial vehicles and L ground sensor nodes distributed in a K × K km area, and the total number of data collection time slots is assumed to be N. The position coordinates of the m-th unmanned aerial vehicle in time slot n are $q_{m,n} = [x_{m,n}, y_{m,n}, z_{m,n}]$, where $x_{m,n}$ and $y_{m,n}$ are the abscissa and ordinate of unmanned aerial vehicle m in slot n, and $z_{m,n}$ is its height above the ground. The position coordinates of the l-th ground sensor node are $w_l = [x_l, y_l, 0]$. The maximum transmit power of a ground sensing device is P, i.e., $0 < p_{l,n} \le P$, where $p_{l,n}$ is the transmit power of the l-th ground sensor node in time slot n.
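For concreteness, this system model can be captured in a few lines of Python. This is a minimal sketch: every numeric value below is an illustrative assumption, not a parameter taken from the patent.

```python
# Sketch of the system model: M UAVs at altitude z over L ground sensors in a
# K x K km area, over N time slots. All default values are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class SystemModel:
    M: int = 3            # number of UAVs
    L: int = 20           # number of ground sensor nodes
    K: float = 1.0        # side length of the square area (km)
    N: int = 200          # total number of data-collection time slots
    P: float = 0.1        # maximum transmit power of a ground sensor (W)
    z: float = 0.1        # assumed fixed UAV flight altitude (km)

    def random_layout(self, seed: int = 0):
        """Place the L sensors on the ground and the M UAVs at altitude z."""
        rng = np.random.default_rng(seed)
        w = np.column_stack([rng.uniform(0, self.K, (self.L, 2)),
                             np.zeros(self.L)])          # w_l = [x_l, y_l, 0]
        q = np.column_stack([rng.uniform(0, self.K, (self.M, 2)),
                             np.full(self.M, self.z)])   # q_{m,0} = [x, y, z]
        return q, w
```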
In step S102, each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target, so as to ensure data integrity; an unmanned aerial vehicle may collect the data of the next sensor node only after completing the data collection of the current one.
In step S103, the energy consumption of the entire system is optimized using the Federated Learning dueling DDQN algorithm, a federated reinforcement learning method.
In step S103, the energy consumption of the whole system consists of two parts: the propulsion energy that keeps the unmanned aerial vehicles in flight, and the communication energy consumed by the unmanned aerial vehicles while collecting data. In time slot n, the propulsion power consumption that keeps unmanned aerial vehicle m in flight can be expressed (using the standard rotary-wing model consistent with the listed parameters) as

$$P_{m,n}(V) = P_0\left(1 + \frac{3V^2}{U_{\mathrm{tip}}^2}\right) + P_i\left(\sqrt{1 + \frac{V^4}{4v_0^4}} - \frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A V^3$$

where $P_0$ and $P_i$ are two constants denoting, respectively, the blade profile power and the induced power in the hovering state, V is the flight speed of the unmanned aerial vehicle, $U_{\mathrm{tip}}$ is the tip speed of the rotor blade, and $v_0$ is the mean rotor induced velocity in hover; in addition, $d_0$ is the fuselage drag ratio, s is the rotor solidity, ρ is the air density, and A is the rotor disc area. The total onboard energy $E^{*}$ consumed by all unmanned aerial vehicles over the N time slots can then be expressed as:

$$E^{*} = \sum_{m=1}^{M} \sum_{n=1}^{N} P_{m,n}(V)$$

The communication energy consumed by the unmanned aerial vehicles in carrying out the data collection process can be expressed as

$$E_{\mathrm{com}} = \sum_{l=1}^{L} \sum_{n=1}^{N} p_{l,n}$$
Thus, the total energy consumed by the unmanned aerial vehicles to complete the overall data collection task can be expressed as:

$$E = E^{*} + E_{\mathrm{com}}$$
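The energy model above can be sketched as follows. The rotor constants are placeholder values (not taken from the patent), and a unit slot length dt is assumed since the patent does not state one.

```python
# Sketch of the energy model: rotary-wing propulsion power P_{m,n}(V) plus
# the communication energy of the ground sensors. Constants are assumptions.
import numpy as np

P0, PI = 79.86, 88.63      # blade profile / induced power in hover (W)
U_TIP, V0 = 120.0, 4.03    # rotor tip speed, mean induced speed in hover (m/s)
D0, S_R, RHO, A_R = 0.6, 0.05, 1.225, 0.503  # drag ratio, solidity, air density, disc area

def propulsion_power(v):
    """P_{m,n}(V) for a rotary-wing UAV flying at speed v (m/s); accepts arrays."""
    blade = P0 * (1.0 + 3.0 * v**2 / U_TIP**2)
    induced = PI * np.sqrt(np.sqrt(1.0 + v**4 / (4.0 * V0**4)) - v**2 / (2.0 * V0**2))
    parasite = 0.5 * D0 * RHO * S_R * A_R * v**3
    return blade + induced + parasite

def total_energy(speeds: np.ndarray, powers: np.ndarray, dt: float = 1.0) -> float:
    """E = E* + E_com, with UAV speeds of shape (M, N), sensor transmit powers
    of shape (L, N), and an assumed slot length dt in seconds."""
    e_star = propulsion_power(speeds).sum() * dt   # onboard energy E*
    e_com = powers.sum() * dt                      # communication energy E_com
    return e_star + e_com
```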
The trajectories of the unmanned aerial vehicles and the communication resources of the whole system are jointly optimized to minimize the system energy consumption E. The specific flow is as follows:

Step 1. Each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target; an unmanned aerial vehicle may collect the data of the next sensor node only after finishing the data collection of the current one.

Step 2. The formulated problem is converted into a Markov decision problem, which is then solved with a federated reinforcement learning method. A complete Markov decision process consists of four parts, namely $\langle S, A, \gamma, r_n \rangle$, where S is the state space, A is the action space, γ is the state transition probability when the unmanned aerial vehicle executes a task, and $r_n$ is the reward function obtained when the unmanned aerial vehicle performs the patrol task.

In this design the state space is $S = \{q_{m,n}, p_{l,n}\}$, where $q_{m,n}$ denotes the position coordinates of unmanned aerial vehicle m in slot n and the transmit powers $p_{l,n}$ are drawn from a discrete power matrix (its exact discretization is given only as a figure in the original). The action space is $A = \{o_{m,n}, p_{l,n}\}$, where $o_{m,n} = \{n, s, e, w\}$ denotes the flight direction of the unmanned aerial vehicle, with n, s, e, w standing for north, south, east and west respectively. The state transition probability γ denotes the probability that, being in state $S_n$ in time slot n, the unmanned aerial vehicle executes the action $A_n$ chosen by the action selection strategy and transitions to the next state $S_{n+1}$. The reward function can be expressed as

$$r_n = \begin{cases} a, & \text{if the unmanned aerial vehicle goes out of bounds or collides} \\ \beta\, r_{m,n}, & \text{otherwise} \end{cases}$$

where a is a negative number representing the penalty for an unmanned aerial vehicle going out of bounds, or for collisions between unmanned aerial vehicles, while executing tasks; $r_{m,n}$ is the data collection rate of unmanned aerial vehicle m in time slot n; and β is a constant weight coefficient. The data collection rate $r_{m,n}$ of unmanned aerial vehicle m in time slot n can be expressed (as a Shannon-type rate consistent with the listed parameters) as:

$$r_{m,n} = B \log_2\!\left(1 + \frac{p_l\, h_{m,n,l}}{N_0 B}\right)$$

where B is the bandwidth of the system, $h_{m,n,l}$ is the channel gain between unmanned aerial vehicle m and ground sensing node l during transmission in time slot n, $p_l$ is the transmit power of user l's uplink communication transmission, and $N_0$ is the noise power spectral density.
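A minimal sketch of this reward shaping, assuming illustrative values for the bandwidth B, the noise density N0, the penalty a and the weight β (none of these numbers come from the patent):

```python
# Sketch of the reward: a fixed negative penalty on a boundary/collision
# violation, otherwise beta times the Shannon-type data collection rate.
import numpy as np

B = 1e6          # system bandwidth (Hz), an assumption
N0 = 1e-17       # noise power spectral density (W/Hz), an assumption
A_PEN = -10.0    # negative out-of-bounds / collision penalty a, an assumption
BETA = 1.0       # weight coefficient beta, an assumption

def data_rate(p_l: float, h_mnl: float) -> float:
    """r_{m,n}: achievable rate between UAV m and sensor l in slot n."""
    return B * np.log2(1.0 + p_l * h_mnl / (N0 * B))

def reward(out_of_bounds: bool, collided: bool, p_l: float, h_mnl: float) -> float:
    """r_n = a on a violation, beta * r_{m,n} otherwise."""
    if out_of_bounds or collided:
        return A_PEN
    return BETA * data_rate(p_l, h_mnl)
```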
After the Markov decision process has been established, the trajectories of the unmanned aerial vehicles and the communication resources of the system are optimized with the federated reinforcement learning method, so that the energy consumption of the whole system is minimized.
Step 3, judging whether the unmanned aerial vehicle completes all data collection tasks, if not, executing the step 1, and if so, ending all data collection tasks;
step 4, judging whether the maximum iteration times are reached;
if not, repeating the step 1-3 until the algorithm reaches the maximum iteration times, if so, outputting the optimal track and the resource allocation result and ending the program.
The process of optimizing the trajectories of the unmanned aerial vehicles and the communication resources of the system with the federated reinforcement learning method is as follows (code sketches of steps b and c are given after this list):

a. Assume the state of unmanned aerial vehicle m in time slot n is $S_n = \{q_{m,n}, p_{l,n}\}$; it takes an action $A_n = \{o_{m,n}, p_{l,n}\}$ from the action space A, transitions to the next state $S_{n+1} = \{q_{m,n+1}, p_{l,n+1}\}$ and obtains a reward, and the state transition result $(S_n, A_n, r_n, S_{n+1})$ is then saved in an experience pool. Here $q_{m,n}$ denotes the position coordinates of unmanned aerial vehicle m in slot n, the transmit powers $p_{l,n}$ are drawn from a discrete power matrix (its exact discretization is given only as a figure in the original), $o_{m,n} = \{n, s, e, w\}$ denotes the flight direction of the unmanned aerial vehicle, with n, s, e, w standing for north, south, east and west respectively, and $S_{n+1}$ is the state of unmanned aerial vehicle m in time slot n+1.

b. Randomly select $N_1$ samples from the experience pool and use gradient descent to reduce the loss function of the neural network, so as to optimize the trajectory and communication resource allocation of the unmanned aerial vehicle and obtain larger rewards. The loss function is defined as

$$\mathrm{Loss} = \left( r_{n+1} + \lambda\, Q\big(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*}\big) - Q\big(S_n, A_n \mid \theta\big) \right)^{2}$$

where $r_{n+1}$ is the reward obtained by the unmanned aerial vehicle in time slot n+1, λ is the discount factor, $\theta$ and $\theta^{*}$ are the parameters of the current network and of the target network respectively, $Q(S_n, A_n \mid \theta)$ is the Q value of taking action $A_n$ in the current state $S_n$ in the current network, and $Q(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*})$ is the Q value in the target network of taking, in state $S_{n+1}$, the action $\hat{A}_{n+1}$ (per the double-DQN structure, $\hat{A}_{n+1} = \arg\max_{A} Q(S_{n+1}, A \mid \theta)$).

c. Each unmanned aerial vehicle sends its trained model parameters to the aggregation end; the aggregation end then aggregates the model parameters and sends them back to every unmanned aerial vehicle. The aggregation end can be served by one of the unmanned aerial vehicles executing the task, and the energy consumed in exchanging model parameters can be neglected. Assume the model parameters of unmanned aerial vehicle m in time slot n are $w_{m,n}$; through weighted aggregation the aggregation end obtains the neural network model parameters $w_{n+1}$ with which all unmanned aerial vehicles train in the next time slot, and transmits $w_{n+1}$ to each unmanned aerial vehicle over the downlink in time slot n+1, where $w_{n+1}$ is specifically:

$$w_{n+1} = \sum_{m=1}^{M} \frac{v}{\theta}\, w_{m,n}$$

where θ here denotes the total number of model parameters of all unmanned aerial vehicles and v the number of model parameters of unmanned aerial vehicle m.

d. Judge whether the unmanned aerial vehicle has completed the data collection task of the sensing devices; if not, the unmanned aerial vehicle executes step a again; if so, the data collection of the sensor nodes is complete.
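To make steps a and b concrete, the following Python sketch implements a dueling Q-network and one double-DQN gradient-descent step over a batch sampled from the experience pool. The layer sizes, the batch size N1 = 64 and the discount λ = 0.99 are illustrative assumptions, not values fixed by the patent.

```python
# Sketch of steps a-b: dueling Q-network plus one double-DQN training step.
import random
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)         # state-value stream V(s)
        self.adv = nn.Linear(hidden, n_actions)   # advantage stream A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        a = self.adv(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)  # Q(s, a)

def ddqn_update(q_net, target_net, optimizer, pool, n1: int = 64, lam: float = 0.99):
    """One gradient step on (r + lam * Q(S', A'* | theta*) - Q(S, A | theta))^2."""
    batch = random.sample(pool, n1)                 # N1 transitions from the pool
    s, a, r, s2 = (torch.stack(x) for x in zip(*batch))
    q_sa = q_net(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        a_star = q_net(s2).argmax(dim=1, keepdim=True)   # action from current net
        target = r + lam * target_net(s2).gather(1, a_star).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()                                 # gradient descent on theta
    optimizer.step()
    return loss.item()
```

The dueling head separates the state value from the action advantages, which is what distinguishes the dueling DDQN named above from a plain DQN; in a full implementation the target-network parameters θ* would be copied from θ periodically.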
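Step c is, in effect, parameter-count-weighted federated averaging. A minimal sketch, assuming (as the text states) that each unmanned aerial vehicle's weight v is its model-parameter count and that the parameter exchange itself is cost-free:

```python
# Sketch of step c: w_{n+1} = sum_m (v_m / theta_total) * w_{m,n},
# applied tensor-by-tensor over the uploaded state dicts.
import torch

def federated_aggregate(models: list[dict], weights: list[float]) -> dict:
    """Weighted average of the UAVs' model parameters."""
    theta_total = sum(weights)   # total parameter count over all UAVs
    agg = {}
    for name in models[0]:
        agg[name] = sum((v / theta_total) * m[name]
                        for v, m in zip(weights, models))
    return agg

# Usage: each UAV uploads q_net.state_dict(); the aggregating UAV computes
# new_params = federated_aggregate([sd1, sd2, sd3], [v1, v2, v3]) and each
# UAV then calls q_net.load_state_dict(new_params) in time slot n+1.
```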
As shown in fig. 3, the steps by which the unmanned aerial vehicles collect the data of the ground sensor nodes are:
step 201: the flow starts.
Step 202: each drone uses a greedy algorithm to determine its data collection sensor nodes.
Step 203: the drone obtains distances from other drones and boundaries at the current location.
Step 204: the unmanned aerial vehicle optimizes own track and resource allocation.
Step 205: judge whether the unmanned aerial vehicle has completed the data collection task of the current sensor node.
Step 206: judge whether the unmanned aerial vehicles have completed the data collection tasks of all sensor nodes.
Step 207: the flow ends.
In step 202, the unmanned aerial vehicle determines its data collection sensor node using a greedy algorithm, as shown in fig. 4; the specific steps are as follows:
step 301: the flow starts.
Step 302: the sensor nodes are numbered.
Step 303: calculate the data rate between the unmanned aerial vehicle and each sensor node, delete the sequence numbers of the sensor nodes that have already been completed, and sort the sensor nodes by rate from large to small.
Step 304: judge whether the data collection of the maximum-rate sensor node has been completed; if yes, execute step 303; otherwise, execute step 305.
Step 305: the device number is output.
Step 306: the flow ends.
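The flow of steps 301-306 amounts to ranking the unfinished sensor nodes by achievable rate and returning the best one. A minimal sketch, assuming a hypothetical rate_fn that maps a UAV position and a sensor position to the current achievable rate (the patent's $r_{m,n}$):

```python
# Sketch of the greedy target selection (steps 301-306).
def select_target(uav_pos, sensors, finished: set, rate_fn) -> int | None:
    """Return the index of the unfinished sensor with the largest rate."""
    candidates = [(rate_fn(uav_pos, w_l), l)
                  for l, w_l in enumerate(sensors) if l not in finished]
    if not candidates:
        return None                   # all nodes collected; the flow ends
    candidates.sort(reverse=True)     # sort rates from large to small
    return candidates[0][1]           # output the device number
```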
As shown in fig. 5, step 204, in which the unmanned aerial vehicle optimizes its own trajectory and resource allocation, proceeds as follows:
step 401: the unmanned aerial vehicle selects the current action according to the strategy selection mechanism, and perceives the distance of each unmanned aerial vehicle in the next time slot and the distance from the boundary.
Step 402: the unmanned aerial vehicle obtains rewards according to the state in the flight process.
Step 403: the state transition information is stored in an experience pool.
Step 404: the loss function is trained.
Step 405: and each unmanned aerial vehicle sends the model parameters to the aggregation end.
Step 406: the aggregation end aggregates the parameters and transmits the processed model parameters to each unmanned aerial vehicle.
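Steps 401-406 can be strung together into one training slot, as in the sketch below. The env object, the per-UAV fields (q_net, target_net, optimizer, pool, state) and the ε-greedy exploration rate are hypothetical scaffolding, while ddqn_update and federated_aggregate are the sketches given earlier.

```python
# Sketch of one training time slot (steps 401-406); it assumes each UAV's
# experience pool already holds at least the N1 transitions ddqn_update samples.
import random
import torch

def train_slot(env, uavs, epsilon: float = 0.1):
    for u in uavs:                                   # steps 401-403
        if random.random() < epsilon:                # explore
            action = random.randrange(env.n_actions)
        else:                                        # exploit the current policy
            with torch.no_grad():
                action = int(u.q_net(u.state.unsqueeze(0)).argmax())
        next_state, reward = env.step(u, action)     # fly, collect, get reward
        u.pool.append((u.state, torch.tensor(action),
                       torch.tensor(float(reward)), next_state))
        u.state = next_state
    for u in uavs:                                   # step 404: train the loss
        ddqn_update(u.q_net, u.target_net, u.optimizer, u.pool)
    # steps 405-406: upload parameters, aggregate, broadcast back
    new_w = federated_aggregate(
        [u.q_net.state_dict() for u in uavs],
        [sum(p.numel() for p in u.q_net.parameters()) for u in uavs])
    for u in uavs:
        u.q_net.load_state_dict(new_w)
```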
This embodiment also provides a multi-unmanned aerial vehicle data collection system based on federated reinforcement learning, characterized by comprising a sensor data collection module, an unmanned aerial vehicle data collection module, an unmanned aerial vehicle trajectory optimization and resource allocation module, and a result output module, wherein,
the sensor data collection module senses and collects data around the ground sensor equipment;
the unmanned aerial vehicle data collection module dispatches an unmanned aerial vehicle to collect data of the ground sensor equipment;
the unmanned aerial vehicle track optimization and resource allocation module performs unmanned aerial vehicle track optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
and the result output module outputs the optimized trajectories and the resource allocation result of the unmanned aerial vehicles.
In summary, the present invention has the following technical effects:
1. Addressing the limited transmission distance of the ground sensing devices and the limited energy of the overall system, long-distance and high-rate data transmission is achieved by exploiting the maneuverability of unmanned aerial vehicles and their high-probability LoS channel model.
2. Addressing security issues such as data leakage when unmanned aerial vehicles cooperate with ground sensing devices to complete data collection, federated learning protects data privacy during cooperative collection: only model parameters need to be uploaded, no bulk data transfer is required, and communication overhead is reduced.
3. The tendency of multiple unmanned aerial vehicles to go out of bounds and collide while collecting data is taken into account, making the method better suited to real data collection scenarios.
4. Given the limited total resources and energy of the system, the trajectories and the communication resource allocation of the unmanned aerial vehicles are jointly optimized, so that the total energy consumption of the system is minimized while all data collection requirements are met.
The above embodiments are merely illustrative of the technical solutions of the present invention. The method and apparatus according to the present invention are not limited to the description of the embodiments above, but rather the scope of the invention is defined by the claims. Any modifications, additions or equivalent substitutions made by those skilled in the art based on this embodiment are within the scope of the invention as claimed in the claims.

Claims (10)

1. A multi-unmanned aerial vehicle data collection method based on federated reinforcement learning, characterized by comprising the following steps:
step S101: the ground sensor devices collect nearby data information by sensing their surroundings;
step S102: the ground center dispatches unmanned aerial vehicles, according to the number available for dispatch, to collect the data of the ground sensor devices;
step S103: while collecting the data of all the ground sensing devices, the unmanned aerial vehicles perform trajectory optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
step S104: judge whether the maximum number of training iterations has been reached;
step S105: output the optimal trajectory and resource allocation of each unmanned aerial vehicle.
2. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 1, wherein in step S102 each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target, so as to ensure data integrity, and wherein an unmanned aerial vehicle may collect the data of the next sensor node only after completing the data collection of the current one.
3. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 1, wherein in step S103 the energy consumption of the whole system is optimized using the Federated Learning dueling DDQN algorithm, a federated reinforcement learning method.
4. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 1, wherein in step S103 the total energy consumed by the unmanned aerial vehicles to complete the overall data collection task can be expressed as:

$$E = E^{*} + E_{\mathrm{com}}$$

where $E^{*}$ is the total onboard energy consumed by all unmanned aerial vehicles over the N time slots, and $E_{\mathrm{com}}$ is the communication energy consumed by the unmanned aerial vehicles during data collection.
5. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 4, wherein the total onboard energy $E^{*}$ consumed by all unmanned aerial vehicles over the N time slots can be expressed as:

$$E^{*} = \sum_{m=1}^{M} \sum_{n=1}^{N} P_{m,n}(V)$$

where $P_{m,n}(V)$ is the propulsion power consumption that keeps unmanned aerial vehicle m in flight during time slot n, M is the number of unmanned aerial vehicles, and N is the total number of data collection time slots.
6. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 5, wherein in time slot n the propulsion power consumption that keeps unmanned aerial vehicle m in flight can be expressed as:

$$P_{m,n}(V) = P_0\left(1 + \frac{3V^2}{U_{\mathrm{tip}}^2}\right) + P_i\left(\sqrt{1 + \frac{V^4}{4v_0^4}} - \frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A V^3$$

where $P_0$ and $P_i$ are two constants denoting, respectively, the blade profile power and the induced power of the unmanned aerial vehicle in the hovering state, V is the flight speed of the unmanned aerial vehicle, $U_{\mathrm{tip}}$ is the tip speed of the rotor blade, $v_0$ is the mean rotor induced velocity in hover, $d_0$ is the fuselage drag ratio, s is the rotor solidity, ρ is the air density, and A is the rotor disc area.
7. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 4, wherein the communication energy consumed by the unmanned aerial vehicles in carrying out the data collection process can be expressed as:

$$E_{\mathrm{com}} = \sum_{l=1}^{L} \sum_{n=1}^{N} p_{l,n}$$

where $p_{l,n}$ is the transmit power of the l-th ground sensor node in time slot n, L is the number of ground sensor nodes, and the maximum transmit power of a ground sensor device is P, i.e., $0 < p_{l,n} \le P$.
8. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 1, wherein the trajectories of the unmanned aerial vehicles and the communication resources of the whole system are jointly optimized to minimize the system energy consumption E, with the following specific flow:
Step 1. Each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target; an unmanned aerial vehicle may collect the data of the next sensor node only after finishing the data collection of the current one.
Step 2. The formulated problem is converted into a Markov decision problem, which is then solved with a federated reinforcement learning method. A complete Markov decision process consists of four parts, namely $\langle S, A, \gamma, r_n \rangle$, where S is the state space, A is the action space, γ is the state transition probability when the unmanned aerial vehicle executes a task, and $r_n$ is the reward function obtained when the unmanned aerial vehicle performs the patrol task.
Step 3. Judge whether the unmanned aerial vehicles have completed all data collection tasks; if not, execute step 1; if so, all data collection tasks end.
Step 4. Judge whether the maximum number of iterations has been reached; if not, repeat steps 1-3 until the algorithm reaches the maximum number of iterations; if so, output the optimal trajectories and the resource allocation result and end the program.
9. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 8, wherein the process of optimizing the trajectories of the unmanned aerial vehicles and the communication resources of the system with the federated reinforcement learning method is as follows:

a. Assume the state of unmanned aerial vehicle m in time slot n is $S_n = \{q_{m,n}, p_{l,n}\}$; it takes an action $A_n = \{o_{m,n}, p_{l,n}\}$ from the action space A, transitions to the next state $S_{n+1} = \{q_{m,n+1}, p_{l,n+1}\}$ and obtains a reward, and the state transition result $(S_n, A_n, r_n, S_{n+1})$ is then saved in an experience pool. Here $q_{m,n}$ denotes the position coordinates of unmanned aerial vehicle m in slot n, the transmit powers $p_{l,n}$ are drawn from a discrete power matrix, $o_{m,n} = \{n, s, e, w\}$ denotes the flight direction of the unmanned aerial vehicle, with n, s, e, w standing for north, south, east and west respectively, and $S_{n+1}$ is the state of unmanned aerial vehicle m in time slot n+1.

b. Randomly select $N_1$ samples from the experience pool and use gradient descent to reduce the loss function of the neural network, so as to optimize the trajectory and communication resource allocation of the unmanned aerial vehicle and obtain larger rewards. The loss function is defined as

$$\mathrm{Loss} = \left( r_{n+1} + \lambda\, Q\big(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*}\big) - Q\big(S_n, A_n \mid \theta\big) \right)^{2}$$

where $r_{n+1}$ is the reward obtained by the unmanned aerial vehicle in time slot n+1, λ is the discount factor, $\theta$ and $\theta^{*}$ are the parameters of the current network and of the target network respectively, $Q(S_n, A_n \mid \theta)$ is the Q value of taking action $A_n$ in the current state $S_n$ in the current network, and $Q(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*})$ is the Q value in the target network of taking, in state $S_{n+1}$, the action $\hat{A}_{n+1}$ (per the double-DQN structure, $\hat{A}_{n+1} = \arg\max_{A} Q(S_{n+1}, A \mid \theta)$).

c. Each unmanned aerial vehicle sends its trained model parameters to the aggregation end; the aggregation end then aggregates the model parameters and sends them back to every unmanned aerial vehicle. The aggregation end can be served by one of the unmanned aerial vehicles executing the task, and the energy consumed in exchanging model parameters can be neglected. Assume the model parameters of unmanned aerial vehicle m in time slot n are $w_{m,n}$; through weighted aggregation the aggregation end obtains the neural network model parameters $w_{n+1}$ with which all unmanned aerial vehicles train in the next time slot, and transmits $w_{n+1}$ to each unmanned aerial vehicle over the downlink in time slot n+1, where $w_{n+1}$ is specifically:

$$w_{n+1} = \sum_{m=1}^{M} \frac{v}{\theta}\, w_{m,n}$$

where θ here denotes the total number of model parameters of all unmanned aerial vehicles and v the number of model parameters of unmanned aerial vehicle m.

d. Judge whether the unmanned aerial vehicle has completed the data collection task of the sensing devices; if not, the unmanned aerial vehicle executes step a again; if so, the data collection of the sensor nodes is complete.
10. A multi-unmanned aerial vehicle data collection system based on federated reinforcement learning, characterized by comprising a sensor data collection module, an unmanned aerial vehicle data collection module, an unmanned aerial vehicle trajectory optimization and resource allocation module, and a result output module, wherein,
the sensor data collection module senses and collects data around the ground sensor equipment;
the unmanned aerial vehicle data collection module dispatches an unmanned aerial vehicle to collect data of the ground sensor equipment;
the unmanned aerial vehicle track optimization and resource allocation module performs unmanned aerial vehicle track optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
and the result output module outputs the optimized trajectories and the resource allocation result of the unmanned aerial vehicles.
CN202310156117.8A 2023-02-21 2023-02-21 Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning Pending CN116205390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310156117.8A CN116205390A (en) 2023-02-21 2023-02-21 Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310156117.8A CN116205390A (en) 2023-02-21 2023-02-21 Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning

Publications (1)

Publication Number Publication Date
CN116205390A 2023-06-02

Family

ID=86515671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310156117.8A Pending CN116205390A (en) 2023-02-21 2023-02-21 Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning

Country Status (1)

Country Link
CN (1) CN116205390A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704823A (en) * 2023-06-12 2023-09-05 大连理工大学 Unmanned aerial vehicle intelligent track planning and general sense resource allocation method based on reinforcement learning
CN116704823B (en) * 2023-06-12 2023-12-19 大连理工大学 Unmanned aerial vehicle intelligent track planning and general sense resource allocation method based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN111786713B (en) Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN109099918B (en) Unmanned aerial vehicle-assisted wireless energy transmission system and node scheduling and path planning method
CN110364031B (en) Path planning and wireless communication method for unmanned aerial vehicle cluster in ground sensor network
CN109831797B (en) Unmanned aerial vehicle base station bandwidth and track joint optimization method with limited push power
CN108768497A (en) Unmanned plane assists wireless sense network and its node scheduling and flight Parameter design method
CN111432433B (en) Unmanned aerial vehicle relay intelligent flow unloading method based on reinforcement learning
CN109839955B (en) Trajectory optimization method for wireless communication between unmanned aerial vehicle and multiple ground terminals
CN113543066B (en) Integrated interaction and multi-target emergency networking method and system for sensing communication guide finger
CN108834049A (en) Wireless energy supply communication network and the method, apparatus for determining its working condition
CN116205390A (en) Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning
CN113784314B (en) Unmanned aerial vehicle data and energy transmission method assisted by intelligent reflection surface
CN113625761A (en) Communication task driven multi-unmanned aerial vehicle path planning method
CN108668257A (en) A kind of distribution unmanned plane postman relaying track optimizing method
CN114690799A (en) Air-space-ground integrated unmanned aerial vehicle Internet of things data acquisition method based on information age
WANG et al. Trajectory optimization and power allocation scheme based on DRL in energy efficient UAV‐aided communication networks
Babu et al. Fairness-based energy-efficient 3-D path planning of a portable access point: A deep reinforcement learning approach
Cui et al. Joint trajectory and power optimization for energy efficient UAV communication using deep reinforcement learning
CN114142908B (en) Multi-unmanned aerial vehicle communication resource allocation method for coverage reconnaissance task
CN114372612B (en) Path planning and task unloading method for unmanned aerial vehicle mobile edge computing scene
CN114548663A (en) Scheduling method for charging unmanned aerial vehicle to charge task unmanned aerial vehicle in air
CN117062182A (en) DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method
CN116009590B (en) Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning
CN115334540A (en) Multi-unmanned aerial vehicle communication system based on heterogeneous unmanned aerial vehicles and energy consumption optimization method
Yu et al. Dynamic coverage path planning of energy optimization in UAV-enabled edge computing networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination