CN116205390A - Multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning

Multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning

Info

Publication number
CN116205390A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
data collection
data
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310156117.8A
Other languages
Chinese (zh)
Inventor
胡钰林
高云飞
黄雨茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202310156117.8A
Publication of CN116205390A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides a multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning, comprising the following steps: step S1: the ground sensor devices collect nearby data information by sensing their surroundings; step S2: the ground center dispatches unmanned aerial vehicles, according to the number available for dispatch, to collect the data of the ground sensor devices; step S3: while collecting the data of all the ground sensing devices, the unmanned aerial vehicles perform trajectory optimization and resource allocation with the aim of minimizing the energy consumption of the whole system; step S4: judge whether the maximum number of training iterations has been reached; step S5: output the optimal trajectory and resource allocation of each unmanned aerial vehicle. By optimizing the trajectories of the unmanned aerial vehicles and the communication resources of the whole system, the invention effectively saves the energy the system needs to complete its task, and has the characteristics of low cost, high speed, low delay and energy saving.

Description

Multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning
Technical Field
The application relates to the field of wireless communication, and in particular to a multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning.
Background
In recent years, with the rapid development of Internet of Things technology, large numbers of sensor nodes have been deployed at key locations to monitor the environment. Maintaining such a very large-scale network is challenging. On the one hand, the sensor nodes consume considerable energy in operation, and frequently replacing or recharging them is costly. A currently preferred solution is backscatter communication, which lets a sensor node transmit data by reflecting radio-frequency signals; although its transmission power consumption is low, it cannot transmit over long distances, which makes data collection for the system very difficult. On the other hand, during data collection and processing it is hard to guarantee data security, and data may be stolen by malicious parties, leading to problems such as data leakage. How to achieve long-distance, rapid data transmission while guaranteeing data security is therefore a problem in urgent need of a solution.
Disclosure of Invention
The invention provides a multi-unmanned aerial vehicle data collection method and system based on federated reinforcement learning. First, the ground sensing devices collect the data to be transmitted; the ground center then dispatches unmanned aerial vehicles to collect the data gathered by the sensors in the sensor area. While an unmanned aerial vehicle collects the data sent by the ground sensors, its trajectory must be optimized and the communication resources reasonably allocated, so as to save the energy consumption of the whole system.
In order to solve the technical problems, the invention adopts the following technical scheme:
a multi-unmanned aerial vehicle data collection method based on federal reinforcement learning comprises the following steps:
step S101: the ground sensor devices collect nearby data information by sensing their surroundings;
step S102: the ground center dispatches unmanned aerial vehicles, according to the number available for dispatch, to collect the data of the ground sensor devices;
step S103: while collecting the data of all the ground sensing devices, the unmanned aerial vehicles perform trajectory optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
step S104: judge whether the maximum number of training iterations has been reached;
step S105: output the optimal trajectory and resource allocation of each unmanned aerial vehicle.
Further, in step S102, each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target, so as to ensure data integrity; an unmanned aerial vehicle may collect the data of the next sensor node only after completing the data collection of the current one.
Further, in step S103, the energy consumption of the whole system is optimized using the Federated Learning dueling DDQN (double deep Q-network) algorithm, a federated reinforcement learning method.
Further, in step S103, the total energy consumed by the unmanned aerial vehicles to complete the overall data collection task can be expressed as:

$$E = E^{*} + E_{\mathrm{com}}$$

where $E^{*}$ is the total onboard energy consumed by all unmanned aerial vehicles over the N time slots, and $E_{\mathrm{com}}$ is the communication energy consumed by the unmanned aerial vehicles during data collection.
Further, the total onboard energy $E^{*}$ consumed by all unmanned aerial vehicles over the N time slots can be expressed as:

$$E^{*} = \sum_{m=1}^{M} \sum_{n=1}^{N} P_{m,n}(V)$$

where $P_{m,n}(V)$ is the propulsion power consumption that keeps unmanned aerial vehicle m in flight during time slot n, M is the number of unmanned aerial vehicles, and N is the total number of data collection time slots.
Further, in time slot n, the propulsion power consumption that keeps unmanned aerial vehicle m in flight can be expressed (using the standard rotary-wing model consistent with the listed parameters) as:

$$P_{m,n}(V) = P_0\left(1 + \frac{3V^2}{U_{\mathrm{tip}}^2}\right) + P_i\left(\sqrt{1 + \frac{V^4}{4v_0^4}} - \frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A V^3$$

where $P_0$ and $P_i$ are two constants denoting, respectively, the blade profile power and the induced power of the unmanned aerial vehicle in the hovering state, V is the flight speed of the unmanned aerial vehicle, $U_{\mathrm{tip}}$ is the tip speed of the rotor blade, $v_0$ is the mean rotor induced velocity in hover, $d_0$ is the fuselage drag ratio, s is the rotor solidity, ρ is the air density, and A is the rotor disc area.
Further, the communication energy consumed by the unmanned aerial vehicles in carrying out the data collection process can be expressed as:

$$E_{\mathrm{com}} = \sum_{l=1}^{L} \sum_{n=1}^{N} p_{l,n}$$

where $p_{l,n}$ is the transmit power of the l-th ground sensor node in time slot n, L is the number of ground sensor nodes, and the maximum transmit power of a ground sensor device is P, i.e., $0 < p_{l,n} \le P$.
Further, the trajectories of the unmanned aerial vehicles and the communication resources of the whole system are jointly optimized to minimize the system energy consumption E. The specific flow is as follows:
Step 1. Each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target; an unmanned aerial vehicle may collect the data of the next sensor node only after finishing the data collection of the current one.
Step 2. The formulated problem is converted into a Markov decision problem, which is then solved with a federated reinforcement learning method. A complete Markov decision process consists of four parts, namely $\langle S, A, \gamma, r_n \rangle$, where S is the state space, A is the action space, γ is the state transition probability when the unmanned aerial vehicle executes a task, and $r_n$ is the reward function obtained when the unmanned aerial vehicle performs the patrol task.
Step 3. Judge whether the unmanned aerial vehicles have completed all data collection tasks; if not, execute step 1; if so, all data collection tasks end.
Step 4. Judge whether the maximum number of iterations has been reached; if not, repeat steps 1-3 until the algorithm reaches the maximum number of iterations; if so, output the optimal trajectories and the resource allocation result and end the program.
Further, the process of optimizing the trajectories of the unmanned aerial vehicles and the communication resources of the system with the federated reinforcement learning method is as follows:

a. Assume the state of unmanned aerial vehicle m in time slot n is $S_n = \{q_{m,n}, p_{l,n}\}$; it takes an action $A_n = \{o_{m,n}, p_{l,n}\}$ from the action space A, transitions to the next state $S_{n+1} = \{q_{m,n+1}, p_{l,n+1}\}$ and obtains a reward, and the state transition result $(S_n, A_n, r_n, S_{n+1})$ is then saved in an experience pool. Here $q_{m,n}$ denotes the position coordinates of unmanned aerial vehicle m in slot n, the transmit powers $p_{l,n}$ are drawn from a discrete power matrix (its exact discretization is given only as a figure in the original), $o_{m,n} = \{n, s, e, w\}$ denotes the flight direction of the unmanned aerial vehicle, with n, s, e, w standing for north, south, east and west respectively, and $S_{n+1}$ is the state of unmanned aerial vehicle m in time slot n+1.

b. Randomly select $N_1$ samples from the experience pool and use gradient descent to reduce the loss function of the neural network, so as to optimize the trajectory and communication resource allocation of the unmanned aerial vehicle and obtain larger rewards. The loss function is defined as

$$\mathrm{Loss} = \left( r_{n+1} + \lambda\, Q\big(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*}\big) - Q\big(S_n, A_n \mid \theta\big) \right)^{2}$$

where $r_{n+1}$ is the reward obtained by the unmanned aerial vehicle in time slot n+1, λ is the discount factor, $\theta$ and $\theta^{*}$ are the parameters of the current network and of the target network respectively, $Q(S_n, A_n \mid \theta)$ is the Q value of taking action $A_n$ in the current state $S_n$ in the current network, and $Q(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*})$ is the Q value in the target network of taking, in state $S_{n+1}$, the action $\hat{A}_{n+1}$ (per the double-DQN structure, $\hat{A}_{n+1} = \arg\max_{A} Q(S_{n+1}, A \mid \theta)$).

c. Each unmanned aerial vehicle sends its trained model parameters to the aggregation end; the aggregation end then aggregates the model parameters and sends them back to every unmanned aerial vehicle. The aggregation end can be served by one of the unmanned aerial vehicles executing the task, and the energy consumed in exchanging model parameters can be neglected. Assume the model parameters of unmanned aerial vehicle m in time slot n are $w_{m,n}$; through weighted aggregation the aggregation end obtains the neural network model parameters $w_{n+1}$ with which all unmanned aerial vehicles train in the next time slot, and transmits $w_{n+1}$ to each unmanned aerial vehicle over the downlink in time slot n+1, where $w_{n+1}$ is specifically:

$$w_{n+1} = \sum_{m=1}^{M} \frac{v}{\theta}\, w_{m,n}$$

where θ here denotes the total number of model parameters of all unmanned aerial vehicles and v the number of model parameters of unmanned aerial vehicle m.

d. Judge whether the unmanned aerial vehicle has completed the data collection task of the sensing devices; if not, the unmanned aerial vehicle executes step a again; if so, the data collection of the sensor nodes is complete.
The invention also provides a multi-unmanned aerial vehicle data collection system based on federated reinforcement learning, which comprises a sensor data collection module, an unmanned aerial vehicle data collection module, an unmanned aerial vehicle trajectory optimization and resource allocation module, and a result output module, wherein,
the sensor data collection module senses and collects data around the ground sensor equipment;
the unmanned aerial vehicle data collection module dispatches an unmanned aerial vehicle to collect data of the ground sensor equipment;
the unmanned aerial vehicle track optimization and resource allocation module performs unmanned aerial vehicle track optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
and the result output module outputs the optimized trajectories and the resource allocation result of the unmanned aerial vehicles.
Compared with the prior art, the invention has at least the following beneficial effects:
1. By exploiting the flexibility of unmanned aerial vehicles and the high transmission speed of air-to-ground communication, optimizing the unmanned aerial vehicle trajectories and the communication resources of the whole system effectively saves the energy the system needs to complete its task, with the characteristics of low cost, high speed, low delay and energy saving.
2. Using distributed federated learning lets the multiple unmanned aerial vehicles share model parameters during training while performing data collection tasks, which addresses the problem of data leakage during collection and, compared with a centralized multi-UAV data collection solution, accelerates the convergence of the algorithm.
3. When the unmanned aerial vehicles collect data cooperatively, the problems of safe inter-UAV flight and of flying out of the boundary are taken into account, so that the unmanned aerial vehicles come closer to a real data collection scenario when executing the data collection task.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system according to the present invention;
FIG. 2 is a general flow chart of the present invention;
FIG. 3 is a flow chart of unmanned aerial vehicle trajectory optimization and resource allocation based on the federal reinforcement learning algorithm;
FIG. 4 is the flow of greedy-strategy selection of data collection points in the present invention;
FIG. 5 is the training flow of unmanned aerial vehicle trajectory optimization and resource allocation according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the experimental methods described in the following embodiments, unless otherwise specified, are all conventional methods, and the reagents and materials, unless otherwise specified, are all commercially available; in the description of the present invention, the terms "transverse", "longitudinal", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus are not to be construed as limiting the present invention.
Furthermore, the terms "horizontal," "vertical," "overhang," and the like do not denote a requirement that the component be absolutely horizontal or overhang, but rather may be slightly inclined. As "horizontal" merely means that its direction is more horizontal than "vertical", and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.
Aiming at the problems that sensor nodes are difficult to recharge, long-distance communication transmission is difficult, and data leakage occurs easily during transmission, the invention provides an unmanned aerial vehicle data collection method and system based on federated reinforcement learning, which solves both the problem that ground sensor devices with limited battery capacity cannot transmit over long distances and the problem of data leakage during data transmission. The method has the advantages of energy saving, security and high-speed transmission.
As shown in fig. 2, this embodiment provides a multi-unmanned aerial vehicle data collection method based on federated reinforcement learning, which includes the following steps:
step S101: the ground sensor devices collect nearby data information by sensing their surroundings;
step S102: the ground center dispatches unmanned aerial vehicles, according to the number available for dispatch, to collect the data of the ground sensor devices;
step S103: while collecting the data of all the ground sensing devices, the unmanned aerial vehicles perform trajectory optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
step S104: judge whether the maximum number of training iterations has been reached;
step S105: output the optimal trajectory and resource allocation of each unmanned aerial vehicle.
The system is assumed to comprise M unmanned aerial vehicles and L ground sensor nodes distributed in a K × K km area, and the total number of data collection time slots is assumed to be N. The position coordinates of the m-th unmanned aerial vehicle in time slot n are $q_{m,n} = [x_{m,n}, y_{m,n}, z_{m,n}]$, where $x_{m,n}$ and $y_{m,n}$ are the abscissa and ordinate of unmanned aerial vehicle m in slot n, and $z_{m,n}$ is its height above the ground. The position coordinates of the l-th ground sensor node are $w_l = [x_l, y_l, 0]$. The maximum transmit power of a ground sensing device is P, i.e., $0 < p_{l,n} \le P$, where $p_{l,n}$ is the transmit power of the l-th ground sensor node in time slot n.
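For concreteness, this system model can be captured in a few lines of Python. This is a minimal sketch: every numeric value below is an illustrative assumption, not a parameter taken from the patent.

```python
# Sketch of the system model: M UAVs at altitude z over L ground sensors in a
# K x K km area, over N time slots. All default values are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class SystemModel:
    M: int = 3            # number of UAVs
    L: int = 20           # number of ground sensor nodes
    K: float = 1.0        # side length of the square area (km)
    N: int = 200          # total number of data-collection time slots
    P: float = 0.1        # maximum transmit power of a ground sensor (W)
    z: float = 0.1        # assumed fixed UAV flight altitude (km)

    def random_layout(self, seed: int = 0):
        """Place the L sensors on the ground and the M UAVs at altitude z."""
        rng = np.random.default_rng(seed)
        w = np.column_stack([rng.uniform(0, self.K, (self.L, 2)),
                             np.zeros(self.L)])          # w_l = [x_l, y_l, 0]
        q = np.column_stack([rng.uniform(0, self.K, (self.M, 2)),
                             np.full(self.M, self.z)])   # q_{m,0} = [x, y, z]
        return q, w
```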
In step S102, each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target, so as to ensure data integrity; an unmanned aerial vehicle may collect the data of the next sensor node only after completing the data collection of the current one.
In step S103, the energy consumption of the entire system is optimized using the Federated Learning dueling DDQN algorithm, a federated reinforcement learning method.
In step S103, the energy consumption of the whole system consists of two parts: the propulsion energy that keeps the unmanned aerial vehicles in flight, and the communication energy consumed by the unmanned aerial vehicles while collecting data. In time slot n, the propulsion power consumption that keeps unmanned aerial vehicle m in flight can be expressed (using the standard rotary-wing model consistent with the listed parameters) as

$$P_{m,n}(V) = P_0\left(1 + \frac{3V^2}{U_{\mathrm{tip}}^2}\right) + P_i\left(\sqrt{1 + \frac{V^4}{4v_0^4}} - \frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A V^3$$

where $P_0$ and $P_i$ are two constants denoting, respectively, the blade profile power and the induced power in the hovering state, V is the flight speed of the unmanned aerial vehicle, $U_{\mathrm{tip}}$ is the tip speed of the rotor blade, and $v_0$ is the mean rotor induced velocity in hover; in addition, $d_0$ is the fuselage drag ratio, s is the rotor solidity, ρ is the air density, and A is the rotor disc area. The total onboard energy $E^{*}$ consumed by all unmanned aerial vehicles over the N time slots can then be expressed as:

$$E^{*} = \sum_{m=1}^{M} \sum_{n=1}^{N} P_{m,n}(V)$$

The communication energy consumed by the unmanned aerial vehicles in carrying out the data collection process can be expressed as

$$E_{\mathrm{com}} = \sum_{l=1}^{L} \sum_{n=1}^{N} p_{l,n}$$
Thus, the total energy consumed by the unmanned aerial vehicles to complete the overall data collection task can be expressed as:

$$E = E^{*} + E_{\mathrm{com}}$$
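The energy model above can be sketched as follows. The rotor constants are placeholder values (not taken from the patent), and a unit slot length dt is assumed since the patent does not state one.

```python
# Sketch of the energy model: rotary-wing propulsion power P_{m,n}(V) plus
# the communication energy of the ground sensors. Constants are assumptions.
import numpy as np

P0, PI = 79.86, 88.63      # blade profile / induced power in hover (W)
U_TIP, V0 = 120.0, 4.03    # rotor tip speed, mean induced speed in hover (m/s)
D0, S_R, RHO, A_R = 0.6, 0.05, 1.225, 0.503  # drag ratio, solidity, air density, disc area

def propulsion_power(v):
    """P_{m,n}(V) for a rotary-wing UAV flying at speed v (m/s); accepts arrays."""
    blade = P0 * (1.0 + 3.0 * v**2 / U_TIP**2)
    induced = PI * np.sqrt(np.sqrt(1.0 + v**4 / (4.0 * V0**4)) - v**2 / (2.0 * V0**2))
    parasite = 0.5 * D0 * RHO * S_R * A_R * v**3
    return blade + induced + parasite

def total_energy(speeds: np.ndarray, powers: np.ndarray, dt: float = 1.0) -> float:
    """E = E* + E_com, with UAV speeds of shape (M, N), sensor transmit powers
    of shape (L, N), and an assumed slot length dt in seconds."""
    e_star = propulsion_power(speeds).sum() * dt   # onboard energy E*
    e_com = powers.sum() * dt                      # communication energy E_com
    return e_star + e_com
```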
The trajectories of the unmanned aerial vehicles and the communication resources of the whole system are jointly optimized to minimize the system energy consumption E. The specific flow is as follows:

Step 1. Each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target; an unmanned aerial vehicle may collect the data of the next sensor node only after finishing the data collection of the current one.

Step 2. The formulated problem is converted into a Markov decision problem, which is then solved with a federated reinforcement learning method. A complete Markov decision process consists of four parts, namely $\langle S, A, \gamma, r_n \rangle$, where S is the state space, A is the action space, γ is the state transition probability when the unmanned aerial vehicle executes a task, and $r_n$ is the reward function obtained when the unmanned aerial vehicle performs the patrol task.

In this design the state space is $S = \{q_{m,n}, p_{l,n}\}$, where $q_{m,n}$ denotes the position coordinates of unmanned aerial vehicle m in slot n and the transmit powers $p_{l,n}$ are drawn from a discrete power matrix (its exact discretization is given only as a figure in the original). The action space is $A = \{o_{m,n}, p_{l,n}\}$, where $o_{m,n} = \{n, s, e, w\}$ denotes the flight direction of the unmanned aerial vehicle, with n, s, e, w standing for north, south, east and west respectively. The state transition probability γ denotes the probability that, being in state $S_n$ in time slot n, the unmanned aerial vehicle executes the action $A_n$ chosen by the action selection strategy and transitions to the next state $S_{n+1}$. The reward function can be expressed as

$$r_n = \begin{cases} a, & \text{if the unmanned aerial vehicle goes out of bounds or collides} \\ \beta\, r_{m,n}, & \text{otherwise} \end{cases}$$

where a is a negative number representing the penalty for an unmanned aerial vehicle going out of bounds, or for collisions between unmanned aerial vehicles, while executing tasks; $r_{m,n}$ is the data collection rate of unmanned aerial vehicle m in time slot n; and β is a constant weight coefficient. The data collection rate $r_{m,n}$ of unmanned aerial vehicle m in time slot n can be expressed (as a Shannon-type rate consistent with the listed parameters) as:

$$r_{m,n} = B \log_2\!\left(1 + \frac{p_l\, h_{m,n,l}}{N_0 B}\right)$$

where B is the bandwidth of the system, $h_{m,n,l}$ is the channel gain between unmanned aerial vehicle m and ground sensing node l during transmission in time slot n, $p_l$ is the transmit power of user l's uplink communication transmission, and $N_0$ is the noise power spectral density.
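A minimal sketch of this reward shaping, assuming illustrative values for the bandwidth B, the noise density N0, the penalty a and the weight β (none of these numbers come from the patent):

```python
# Sketch of the reward: a fixed negative penalty on a boundary/collision
# violation, otherwise beta times the Shannon-type data collection rate.
import numpy as np

B = 1e6          # system bandwidth (Hz), an assumption
N0 = 1e-17       # noise power spectral density (W/Hz), an assumption
A_PEN = -10.0    # negative out-of-bounds / collision penalty a, an assumption
BETA = 1.0       # weight coefficient beta, an assumption

def data_rate(p_l: float, h_mnl: float) -> float:
    """r_{m,n}: achievable rate between UAV m and sensor l in slot n."""
    return B * np.log2(1.0 + p_l * h_mnl / (N0 * B))

def reward(out_of_bounds: bool, collided: bool, p_l: float, h_mnl: float) -> float:
    """r_n = a on a violation, beta * r_{m,n} otherwise."""
    if out_of_bounds or collided:
        return A_PEN
    return BETA * data_rate(p_l, h_mnl)
```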
After the Markov decision process has been established, the trajectories of the unmanned aerial vehicles and the communication resources of the system are optimized with the federated reinforcement learning method, so that the energy consumption of the whole system is minimized.
Step 3, judging whether the unmanned aerial vehicle completes all data collection tasks, if not, executing the step 1, and if so, ending all data collection tasks;
step 4, judging whether the maximum iteration times are reached;
if not, repeating the step 1-3 until the algorithm reaches the maximum iteration times, if so, outputting the optimal track and the resource allocation result and ending the program.
The process of optimizing the trajectories of the unmanned aerial vehicles and the communication resources of the system with the federated reinforcement learning method is as follows (code sketches of steps b and c are given after this list):

a. Assume the state of unmanned aerial vehicle m in time slot n is $S_n = \{q_{m,n}, p_{l,n}\}$; it takes an action $A_n = \{o_{m,n}, p_{l,n}\}$ from the action space A, transitions to the next state $S_{n+1} = \{q_{m,n+1}, p_{l,n+1}\}$ and obtains a reward, and the state transition result $(S_n, A_n, r_n, S_{n+1})$ is then saved in an experience pool. Here $q_{m,n}$ denotes the position coordinates of unmanned aerial vehicle m in slot n, the transmit powers $p_{l,n}$ are drawn from a discrete power matrix (its exact discretization is given only as a figure in the original), $o_{m,n} = \{n, s, e, w\}$ denotes the flight direction of the unmanned aerial vehicle, with n, s, e, w standing for north, south, east and west respectively, and $S_{n+1}$ is the state of unmanned aerial vehicle m in time slot n+1.

b. Randomly select $N_1$ samples from the experience pool and use gradient descent to reduce the loss function of the neural network, so as to optimize the trajectory and communication resource allocation of the unmanned aerial vehicle and obtain larger rewards. The loss function is defined as

$$\mathrm{Loss} = \left( r_{n+1} + \lambda\, Q\big(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*}\big) - Q\big(S_n, A_n \mid \theta\big) \right)^{2}$$

where $r_{n+1}$ is the reward obtained by the unmanned aerial vehicle in time slot n+1, λ is the discount factor, $\theta$ and $\theta^{*}$ are the parameters of the current network and of the target network respectively, $Q(S_n, A_n \mid \theta)$ is the Q value of taking action $A_n$ in the current state $S_n$ in the current network, and $Q(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*})$ is the Q value in the target network of taking, in state $S_{n+1}$, the action $\hat{A}_{n+1}$ (per the double-DQN structure, $\hat{A}_{n+1} = \arg\max_{A} Q(S_{n+1}, A \mid \theta)$).

c. Each unmanned aerial vehicle sends its trained model parameters to the aggregation end; the aggregation end then aggregates the model parameters and sends them back to every unmanned aerial vehicle. The aggregation end can be served by one of the unmanned aerial vehicles executing the task, and the energy consumed in exchanging model parameters can be neglected. Assume the model parameters of unmanned aerial vehicle m in time slot n are $w_{m,n}$; through weighted aggregation the aggregation end obtains the neural network model parameters $w_{n+1}$ with which all unmanned aerial vehicles train in the next time slot, and transmits $w_{n+1}$ to each unmanned aerial vehicle over the downlink in time slot n+1, where $w_{n+1}$ is specifically:

$$w_{n+1} = \sum_{m=1}^{M} \frac{v}{\theta}\, w_{m,n}$$

where θ here denotes the total number of model parameters of all unmanned aerial vehicles and v the number of model parameters of unmanned aerial vehicle m.

d. Judge whether the unmanned aerial vehicle has completed the data collection task of the sensing devices; if not, the unmanned aerial vehicle executes step a again; if so, the data collection of the sensor nodes is complete.
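To make steps a and b concrete, the following Python sketch implements a dueling Q-network and one double-DQN gradient-descent step over a batch sampled from the experience pool. The layer sizes, the batch size N1 = 64 and the discount λ = 0.99 are illustrative assumptions, not values fixed by the patent.

```python
# Sketch of steps a-b: dueling Q-network plus one double-DQN training step.
import random
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)         # state-value stream V(s)
        self.adv = nn.Linear(hidden, n_actions)   # advantage stream A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        a = self.adv(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)  # Q(s, a)

def ddqn_update(q_net, target_net, optimizer, pool, n1: int = 64, lam: float = 0.99):
    """One gradient step on (r + lam * Q(S', A'* | theta*) - Q(S, A | theta))^2."""
    batch = random.sample(pool, n1)                 # N1 transitions from the pool
    s, a, r, s2 = (torch.stack(x) for x in zip(*batch))
    q_sa = q_net(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        a_star = q_net(s2).argmax(dim=1, keepdim=True)   # action from current net
        target = r + lam * target_net(s2).gather(1, a_star).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()                                 # gradient descent on theta
    optimizer.step()
    return loss.item()
```

The dueling head separates the state value from the action advantages, which is what distinguishes the dueling DDQN named above from a plain DQN; in a full implementation the target-network parameters θ* would be copied from θ periodically.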
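Step c is, in effect, parameter-count-weighted federated averaging. A minimal sketch, assuming (as the text states) that each unmanned aerial vehicle's weight v is its model-parameter count and that the parameter exchange itself is cost-free:

```python
# Sketch of step c: w_{n+1} = sum_m (v_m / theta_total) * w_{m,n},
# applied tensor-by-tensor over the uploaded state dicts.
import torch

def federated_aggregate(models: list[dict], weights: list[float]) -> dict:
    """Weighted average of the UAVs' model parameters."""
    theta_total = sum(weights)   # total parameter count over all UAVs
    agg = {}
    for name in models[0]:
        agg[name] = sum((v / theta_total) * m[name]
                        for v, m in zip(weights, models))
    return agg

# Usage: each UAV uploads q_net.state_dict(); the aggregating UAV computes
# new_params = federated_aggregate([sd1, sd2, sd3], [v1, v2, v3]) and each
# UAV then calls q_net.load_state_dict(new_params) in time slot n+1.
```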
As shown in fig. 3, the steps by which the unmanned aerial vehicles collect the data of the ground sensor nodes are:
step 201: the flow starts.
Step 202: each drone uses a greedy algorithm to determine its data collection sensor nodes.
Step 203: the drone obtains distances from other drones and boundaries at the current location.
Step 204: the unmanned aerial vehicle optimizes own track and resource allocation.
Step 205: judge whether the unmanned aerial vehicle has completed the data collection task of the current sensor node.
Step 206: judge whether the unmanned aerial vehicles have completed the data collection tasks of all sensor nodes.
Step 207: the flow ends.
In step 202, the unmanned aerial vehicle determines its data collection sensor node using a greedy algorithm, as shown in fig. 4; the specific steps are as follows:
step 301: the flow starts.
Step 302: the sensor nodes are numbered.
Step 303: calculate the data rate between the unmanned aerial vehicle and each sensor node, delete the sequence numbers of the sensor nodes that have already been completed, and sort the sensor nodes by rate from large to small.
Step 304: judge whether the data collection of the maximum-rate sensor node has been completed; if yes, execute step 303; otherwise, execute step 305.
Step 305: the device number is output.
Step 306: the flow ends.
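The flow of steps 301-306 amounts to ranking the unfinished sensor nodes by achievable rate and returning the best one. A minimal sketch, assuming a hypothetical rate_fn that maps a UAV position and a sensor position to the current achievable rate (the patent's $r_{m,n}$):

```python
# Sketch of the greedy target selection (steps 301-306).
def select_target(uav_pos, sensors, finished: set, rate_fn) -> int | None:
    """Return the index of the unfinished sensor with the largest rate."""
    candidates = [(rate_fn(uav_pos, w_l), l)
                  for l, w_l in enumerate(sensors) if l not in finished]
    if not candidates:
        return None                   # all nodes collected; the flow ends
    candidates.sort(reverse=True)     # sort rates from large to small
    return candidates[0][1]           # output the device number
```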
As shown in fig. 5, step 204, in which the unmanned aerial vehicle optimizes its own trajectory and resource allocation, proceeds as follows:
step 401: the unmanned aerial vehicle selects the current action according to the strategy selection mechanism, and perceives the distance of each unmanned aerial vehicle in the next time slot and the distance from the boundary.
Step 402: the unmanned aerial vehicle obtains rewards according to the state in the flight process.
Step 403: the state transition information is stored in an experience pool.
Step 404: the loss function is trained.
Step 405: and each unmanned aerial vehicle sends the model parameters to the aggregation end.
Step 406: the aggregation end aggregates the parameters and transmits the processed model parameters to each unmanned aerial vehicle.
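Steps 401-406 can be strung together into one training slot, as in the sketch below. The env object, the per-UAV fields (q_net, target_net, optimizer, pool, state) and the ε-greedy exploration rate are hypothetical scaffolding, while ddqn_update and federated_aggregate are the sketches given earlier.

```python
# Sketch of one training time slot (steps 401-406); it assumes each UAV's
# experience pool already holds at least the N1 transitions ddqn_update samples.
import random
import torch

def train_slot(env, uavs, epsilon: float = 0.1):
    for u in uavs:                                   # steps 401-403
        if random.random() < epsilon:                # explore
            action = random.randrange(env.n_actions)
        else:                                        # exploit the current policy
            with torch.no_grad():
                action = int(u.q_net(u.state.unsqueeze(0)).argmax())
        next_state, reward = env.step(u, action)     # fly, collect, get reward
        u.pool.append((u.state, torch.tensor(action),
                       torch.tensor(float(reward)), next_state))
        u.state = next_state
    for u in uavs:                                   # step 404: train the loss
        ddqn_update(u.q_net, u.target_net, u.optimizer, u.pool)
    # steps 405-406: upload parameters, aggregate, broadcast back
    new_w = federated_aggregate(
        [u.q_net.state_dict() for u in uavs],
        [sum(p.numel() for p in u.q_net.parameters()) for u in uavs])
    for u in uavs:
        u.q_net.load_state_dict(new_w)
```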
This embodiment also provides a multi-unmanned aerial vehicle data collection system based on federated reinforcement learning, characterized by comprising a sensor data collection module, an unmanned aerial vehicle data collection module, an unmanned aerial vehicle trajectory optimization and resource allocation module, and a result output module, wherein,
the sensor data collection module senses and collects data around the ground sensor equipment;
the unmanned aerial vehicle data collection module dispatches an unmanned aerial vehicle to collect data of the ground sensor equipment;
the unmanned aerial vehicle track optimization and resource allocation module performs unmanned aerial vehicle track optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
and the result output module outputs the optimized trajectories and the resource allocation result of the unmanned aerial vehicles.
In summary, the present invention has the following technical effects:
1. Addressing the limited transmission distance of the ground sensing devices and the limited energy of the overall system, long-distance and high-rate data transmission is achieved by exploiting the maneuverability of unmanned aerial vehicles and their high-probability LoS channel model.
2. Addressing security issues such as data leakage when unmanned aerial vehicles cooperate with ground sensing devices to complete data collection, federated learning protects data privacy during cooperative collection: only model parameters need to be uploaded, no bulk data transfer is required, and communication overhead is reduced.
3. The tendency of multiple unmanned aerial vehicles to go out of bounds and collide while collecting data is taken into account, making the method better suited to real data collection scenarios.
4. Given the limited total resources and energy of the system, the trajectories and the communication resource allocation of the unmanned aerial vehicles are jointly optimized, so that the total energy consumption of the system is minimized while all data collection requirements are met.
The above embodiments are merely illustrative of the technical solutions of the present invention. The method and apparatus according to the present invention are not limited to the description of the embodiments above, but rather the scope of the invention is defined by the claims. Any modifications, additions or equivalent substitutions made by those skilled in the art based on this embodiment are within the scope of the invention as claimed in the claims.

Claims (10)

1. A multi-unmanned aerial vehicle data collection method based on federated reinforcement learning, characterized by comprising the following steps:
step S101: the ground sensor devices collect nearby data information by sensing their surroundings;
step S102: the ground center dispatches unmanned aerial vehicles, according to the number available for dispatch, to collect the data of the ground sensor devices;
step S103: while collecting the data of all the ground sensing devices, the unmanned aerial vehicles perform trajectory optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
step S104: judge whether the maximum number of training iterations has been reached;
step S105: output the optimal trajectory and resource allocation of each unmanned aerial vehicle.
2. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 1, wherein in step S102 each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target, so as to ensure data integrity, and wherein an unmanned aerial vehicle may collect the data of the next sensor node only after completing the data collection of the current one.
3. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 1, wherein in step S103 the energy consumption of the whole system is optimized using the Federated Learning dueling DDQN algorithm, a federated reinforcement learning method.
4. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 1, wherein in step S103 the total energy consumed by the unmanned aerial vehicles to complete the overall data collection task can be expressed as:

$$E = E^{*} + E_{\mathrm{com}}$$

where $E^{*}$ is the total onboard energy consumed by all unmanned aerial vehicles over the N time slots, and $E_{\mathrm{com}}$ is the communication energy consumed by the unmanned aerial vehicles during data collection.
5. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 4, wherein the total onboard energy $E^{*}$ consumed by all unmanned aerial vehicles over the N time slots can be expressed as:

$$E^{*} = \sum_{m=1}^{M} \sum_{n=1}^{N} P_{m,n}(V)$$

where $P_{m,n}(V)$ is the propulsion power consumption that keeps unmanned aerial vehicle m in flight during time slot n, M is the number of unmanned aerial vehicles, and N is the total number of data collection time slots.
6. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 5, wherein in time slot n the propulsion power consumption that keeps unmanned aerial vehicle m in flight can be expressed as:

$$P_{m,n}(V) = P_0\left(1 + \frac{3V^2}{U_{\mathrm{tip}}^2}\right) + P_i\left(\sqrt{1 + \frac{V^4}{4v_0^4}} - \frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A V^3$$

where $P_0$ and $P_i$ are two constants denoting, respectively, the blade profile power and the induced power of the unmanned aerial vehicle in the hovering state, V is the flight speed of the unmanned aerial vehicle, $U_{\mathrm{tip}}$ is the tip speed of the rotor blade, $v_0$ is the mean rotor induced velocity in hover, $d_0$ is the fuselage drag ratio, s is the rotor solidity, ρ is the air density, and A is the rotor disc area.
7. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 4, wherein the communication energy consumed by the unmanned aerial vehicles in carrying out the data collection process can be expressed as:

$$E_{\mathrm{com}} = \sum_{l=1}^{L} \sum_{n=1}^{N} p_{l,n}$$

where $p_{l,n}$ is the transmit power of the l-th ground sensor node in time slot n, L is the number of ground sensor nodes, and the maximum transmit power of a ground sensor device is P, i.e., $0 < p_{l,n} \le P$.
8. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 1, wherein the trajectories of the unmanned aerial vehicles and the communication resources of the whole system are jointly optimized to minimize the system energy consumption E, with the following specific flow:
Step 1. Each unmanned aerial vehicle senses the nearby sensor nodes and uses a greedy algorithm to select the sensor node with the largest data rate as its collection target; an unmanned aerial vehicle may collect the data of the next sensor node only after finishing the data collection of the current one.
Step 2. The formulated problem is converted into a Markov decision problem, which is then solved with a federated reinforcement learning method. A complete Markov decision process consists of four parts, namely $\langle S, A, \gamma, r_n \rangle$, where S is the state space, A is the action space, γ is the state transition probability when the unmanned aerial vehicle executes a task, and $r_n$ is the reward function obtained when the unmanned aerial vehicle performs the patrol task.
Step 3. Judge whether the unmanned aerial vehicles have completed all data collection tasks; if not, execute step 1; if so, all data collection tasks end.
Step 4. Judge whether the maximum number of iterations has been reached; if not, repeat steps 1-3 until the algorithm reaches the maximum number of iterations; if so, output the optimal trajectories and the resource allocation result and end the program.
9. The multi-unmanned aerial vehicle data collection method based on federated reinforcement learning according to claim 8, wherein the process of optimizing the trajectories of the unmanned aerial vehicles and the communication resources of the system with the federated reinforcement learning method is as follows:

a. Assume the state of unmanned aerial vehicle m in time slot n is $S_n = \{q_{m,n}, p_{l,n}\}$; it takes an action $A_n = \{o_{m,n}, p_{l,n}\}$ from the action space A, transitions to the next state $S_{n+1} = \{q_{m,n+1}, p_{l,n+1}\}$ and obtains a reward, and the state transition result $(S_n, A_n, r_n, S_{n+1})$ is then saved in an experience pool. Here $q_{m,n}$ denotes the position coordinates of unmanned aerial vehicle m in slot n, the transmit powers $p_{l,n}$ are drawn from a discrete power matrix, $o_{m,n} = \{n, s, e, w\}$ denotes the flight direction of the unmanned aerial vehicle, with n, s, e, w standing for north, south, east and west respectively, and $S_{n+1}$ is the state of unmanned aerial vehicle m in time slot n+1.

b. Randomly select $N_1$ samples from the experience pool and use gradient descent to reduce the loss function of the neural network, so as to optimize the trajectory and communication resource allocation of the unmanned aerial vehicle and obtain larger rewards. The loss function is defined as

$$\mathrm{Loss} = \left( r_{n+1} + \lambda\, Q\big(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*}\big) - Q\big(S_n, A_n \mid \theta\big) \right)^{2}$$

where $r_{n+1}$ is the reward obtained by the unmanned aerial vehicle in time slot n+1, λ is the discount factor, $\theta$ and $\theta^{*}$ are the parameters of the current network and of the target network respectively, $Q(S_n, A_n \mid \theta)$ is the Q value of taking action $A_n$ in the current state $S_n$ in the current network, and $Q(S_{n+1}, \hat{A}_{n+1} \mid \theta^{*})$ is the Q value in the target network of taking, in state $S_{n+1}$, the action $\hat{A}_{n+1}$ (per the double-DQN structure, $\hat{A}_{n+1} = \arg\max_{A} Q(S_{n+1}, A \mid \theta)$).

c. Each unmanned aerial vehicle sends its trained model parameters to the aggregation end; the aggregation end then aggregates the model parameters and sends them back to every unmanned aerial vehicle. The aggregation end can be served by one of the unmanned aerial vehicles executing the task, and the energy consumed in exchanging model parameters can be neglected. Assume the model parameters of unmanned aerial vehicle m in time slot n are $w_{m,n}$; through weighted aggregation the aggregation end obtains the neural network model parameters $w_{n+1}$ with which all unmanned aerial vehicles train in the next time slot, and transmits $w_{n+1}$ to each unmanned aerial vehicle over the downlink in time slot n+1, where $w_{n+1}$ is specifically:

$$w_{n+1} = \sum_{m=1}^{M} \frac{v}{\theta}\, w_{m,n}$$

where θ here denotes the total number of model parameters of all unmanned aerial vehicles and v the number of model parameters of unmanned aerial vehicle m.

d. Judge whether the unmanned aerial vehicle has completed the data collection task of the sensing devices; if not, the unmanned aerial vehicle executes step a again; if so, the data collection of the sensor nodes is complete.
10. A multi-unmanned aerial vehicle data collection system based on federated reinforcement learning, characterized by comprising a sensor data collection module, an unmanned aerial vehicle data collection module, an unmanned aerial vehicle trajectory optimization and resource allocation module, and a result output module, wherein,
the sensor data collection module senses and collects data around the ground sensor equipment;
the unmanned aerial vehicle data collection module dispatches an unmanned aerial vehicle to collect data of the ground sensor equipment;
the unmanned aerial vehicle track optimization and resource allocation module performs unmanned aerial vehicle track optimization and resource allocation with the aim of minimizing the energy consumption of the whole system;
and the result output module outputs the optimized trajectories and the resource allocation result of the unmanned aerial vehicles.
CN202310156117.8A 2023-02-21 2023-02-21 Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning Pending CN116205390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310156117.8A CN116205390A (en) 2023-02-21 2023-02-21 Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310156117.8A CN116205390A (en) 2023-02-21 2023-02-21 Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning

Publications (1)

Publication Number Publication Date
CN116205390A 2023-06-02

Family

ID=86515671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310156117.8A Pending CN116205390A (en) 2023-02-21 2023-02-21 Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning

Country Status (1)

Country Link
CN (1) CN116205390A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704823A (en) * 2023-06-12 2023-09-05 大连理工大学 Unmanned aerial vehicle intelligent track planning and general sense resource allocation method based on reinforcement learning
CN116704823B (en) * 2023-06-12 2023-12-19 大连理工大学 Unmanned aerial vehicle intelligent track planning and general sense resource allocation method based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN111786713B (en) Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN109099918B (en) Unmanned aerial vehicle-assisted wireless energy transmission system and node scheduling and path planning method
CN110364031B (en) Path planning and wireless communication method for unmanned aerial vehicle cluster in ground sensor network
CN109831797B (en) Unmanned aerial vehicle base station bandwidth and track joint optimization method with limited push power
CN108768497A (en) Unmanned plane assists wireless sense network and its node scheduling and flight Parameter design method
CN111432433B (en) Unmanned aerial vehicle relay intelligent flow unloading method based on reinforcement learning
CN109839955B (en) Trajectory optimization method for wireless communication between unmanned aerial vehicle and multiple ground terminals
CN113543066B (en) Integrated interaction and multi-target emergency networking method and system for sensing communication guide finger
CN108834049A (en) Wireless energy supply communication network and the method, apparatus for determining its working condition
CN116205390A (en) Multi-unmanned aerial vehicle data collection method and system based on federal reinforcement learning
CN113784314B (en) Unmanned aerial vehicle data and energy transmission method assisted by intelligent reflection surface
CN113625761A (en) Communication task driven multi-unmanned aerial vehicle path planning method
CN108668257A (en) A kind of distribution unmanned plane postman relaying track optimizing method
CN114690799A (en) Air-space-ground integrated unmanned aerial vehicle Internet of things data acquisition method based on information age
WANG et al. Trajectory optimization and power allocation scheme based on DRL in energy efficient UAV‐aided communication networks
Babu et al. Fairness-based energy-efficient 3-D path planning of a portable access point: A deep reinforcement learning approach
Cui et al. Joint trajectory and power optimization for energy efficient UAV communication using deep reinforcement learning
CN114142908B (en) Multi-unmanned aerial vehicle communication resource allocation method for coverage reconnaissance task
CN114372612B (en) Path planning and task unloading method for unmanned aerial vehicle mobile edge computing scene
CN114548663A (en) Scheduling method for charging unmanned aerial vehicle to charge task unmanned aerial vehicle in air
CN117062182A (en) DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method
CN116009590B (en) Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning
CN115334540A (en) Multi-unmanned aerial vehicle communication system based on heterogeneous unmanned aerial vehicles and energy consumption optimization method
Yu et al. Dynamic coverage path planning of energy optimization in UAV-enabled edge computing networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination