CN116208968A - Trajectory planning method and device based on federated learning - Google Patents

Trajectory planning method and device based on federated learning

Info

Publication number
CN116208968A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
ground
training
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211726582.2A
Other languages
Chinese (zh)
Other versions
CN116208968B (en)
Inventor
陈硕
王鉴威
李学华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University filed Critical Beijing Information Science and Technology University
Priority to CN202211726582.2A priority Critical patent/CN116208968B/en
Publication of CN116208968A publication Critical patent/CN116208968A/en
Application granted granted Critical
Publication of CN116208968B publication Critical patent/CN116208968B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 - Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/18 - Network planning tools
    • H04W 16/02 - Resource partitioning among network components, e.g. reuse partitioning
    • H04W 16/06 - Hybrid resource partitioning, e.g. channel borrowing
    • H04W 16/08 - Load shedding arrangements
    • H04W 16/22 - Traffic simulation tools or models
    • H04B - TRANSMISSION
    • H04B 7/00 - Radio transmission systems, i.e. using radiation field
    • H04B 7/14 - Relay systems
    • H04B 7/15 - Active relay systems
    • H04B 7/185 - Space-based or airborne stations; Stations for satellite systems
    • H04B 7/18502 - Airborne stations
    • H04B 7/18506 - Communications with or from aircraft, i.e. aeronautical mobile service
    • H04W 28/00 - Network traffic management; Network resource management
    • H04W 28/02 - Traffic management, e.g. flow control or congestion control
    • H04W 28/08 - Load balancing or load distribution
    • H04W 28/09 - Management thereof
    • H04W 28/0958 - Management thereof based on metrics or performance parameters
    • H04W 28/0967 - Quality of Service [QoS] parameters
    • H04W 28/0975 - Quality of Service [QoS] parameters for reducing delays

Abstract

The application discloses a trajectory planning method and device based on federated learning. The method comprises the following steps: calculating prior information of an unmanned aerial vehicle (UAV) assisted ground communication network; based on the prior information, establishing a computation offloading optimization model for the covering UAVs, and solving the model through a training process using deep reinforcement learning to obtain a locally trained network model; and, based on the local network model, obtaining a globally optimal UAV trajectory planning scheme through iterative federated learning updates. The method and device address the technical problems that a single UAV cannot provide timely, large-scale computation offloading for the ground and therefore cannot meet ground users' requirements for low offloading latency and low energy consumption, and that the multi-UAV training process consumes excessive energy and time.

Description

Trajectory planning method and device based on federated learning
Technical Field
The present application relates to the field of communications, and in particular to a trajectory planning method and device based on federated learning.
Background
In recent years, with the continuous development of the Internet of Things and related intelligent devices, more and more emerging mobile technologies have entered practical use, such as face recognition, virtual reality, smart cities, license plate detection and smart agriculture. At the same time, as existing devices become more intelligent, the number of smartphones, smart vehicles, smart furniture, smart sensors and the like has grown explosively. Under these conditions, computation-intensive applications place stringent requirements on latency and energy consumption, and therefore on the way tasks are computed. However, because existing hardware devices must be lightweight, portable and economical, their computing power is limited, which greatly restricts the low-latency, low-energy performance users expect and cannot satisfy user requirements well.
As a new technology, mobile edge computing (MEC) migrates servers from the cloud to the network edge, so that data analysis and processing can be performed closer to the ground user equipment than at distant cloud servers.
Conventional mobile communications typically rely on a terrestrial communication infrastructure, which covers the intended terrestrial users well. However, in real scenarios such as urban areas shadowed by buildings, remote forests and mountains, lakes and oceans, and emergencies (such as earthquakes and fires), conventional ground base stations cannot effectively cover users in these special environments, which greatly impairs communication transmission.
Unmanned aerial vehicles (UAVs) are increasingly used in many industries because of their small size, high flexibility, convenient deployment, low price and simple operation. Owing to these characteristics, a UAV can reach positions that ordinary ground equipment cannot, and can carry a base station and a server to establish wireless communication in the air and provide computation offloading services. In addition, to establish a reliable communication link, a UAV can adjust its deployment position in time according to the communication environment and move flexibly through three-dimensional space to connect with ground users, thereby improving communication performance. For these reasons, multi-rotor and fixed-wing UAVs can also serve as relay nodes to provide seamless connectivity.
With the progress of machine learning, combining deep neural networks with reinforcement learning, i.e., deep reinforcement learning, has become an intensively studied topic. In deep reinforcement learning, an agent interacts with the environment and learns an optimal policy during exploration. Compared with traditional reinforcement learning, deep reinforcement learning can use deep neural networks to approximate the relevant value functions, enabling more accurate convergence and approximation.
Considering the characteristics of UAV control architectures, intelligent UAV networks can be implemented in two ways: centralized control and distributed control. Centralized approaches typically require a central controller that obtains complete information about the network and makes decisions collectively; such an architecture can in theory achieve optimal system performance. However, as the network scale grows, the complexity of modeling the global dynamic network environment in real time increases. In addition, the data of ground users must be uploaded to the central controller and processed as a whole, which may introduce communication delay and reduce communication efficiency. In contrast, distributed control processes the relevant data in a decentralized manner, which reduces the communication load between systems and lowers computational complexity.
For UAV swarm control, a distributed architecture reduces traffic and effectively lowers the computational complexity of each agent. However, in classical distributed control methods, most decisions are made locally at the agent; without information sharing between agents, it is difficult to achieve a globally optimal result. Furthermore, most existing studies do not consider user privacy during transmission, which may threaten the security of user data.
The concept of federated learning has therefore attracted great attention. Federated learning is a distributed machine learning framework that can overcome the shortcomings of traditional distributed learning. It keeps the training data on the local device and learns a global model by aggregating locally updated parameters (i.e., the gradients and weights of the model) at a cloud server. It has been widely used to achieve privacy protection and information sharing.
In the prior art, a ground-user computation offloading method based on a single UAV and the twin delayed deep deterministic policy gradient (TD3) method has been proposed, which plans the UAV flight trajectory while selecting the association with ground devices. The prior art also provides a ground-user computation offloading method based on multiple UAVs and the multi-agent deep deterministic policy gradient method, in which multiple UAVs offload tasks from ground users, cooperatively plan their flight routes while selecting the association between each UAV and the ground user equipment, improve the load fairness of the ground users served by each UAV, maximize geographic fairness among the covered ground users, and reduce the energy consumption of the ground users. In addition, the prior art provides a UAV deployment and resource allocation method combining federated learning with multi-agent reinforcement learning, which uses multi-agent cooperative environment learning, combines a deep Q-network with a federated learning framework, and obtains a globally fused model over the UAV network, thereby realizing optimal decisions for multi-UAV cooperative communication.
However, the prior art uses a single UAV to collect information and offload tasks for the ground user equipment. In practical scenarios, the task area is often large, the number of ground users is large, and the working time of a single UAV flight is short, so a single UAV can hardly provide effective computation offloading for large-scale ground use and can hardly meet the ground users' requirements for low offloading latency and low energy consumption.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the present application provide a trajectory planning method and device based on federated learning, which at least solve the technical problems that a single UAV cannot provide timely, large-scale computation offloading for the ground and therefore cannot meet ground users' requirements for low offloading latency and low energy consumption, and that the multi-UAV training process consumes excessive energy and time.
According to one aspect of the embodiments of the present application, a trajectory planning method based on federated learning is provided, including: initializing the communication environment, ground user positions and initial UAV positions in a UAV-assisted ground communication network, and calculating prior information of the UAV-assisted ground communication network based on the initialized values; based on the prior information, establishing a computation offloading optimization model for the covering UAVs, and solving the model through a training process using deep reinforcement learning to obtain a locally trained network model; and, based on the local network model, obtaining a globally optimal UAV trajectory planning scheme through iterative federated learning updates.
According to another aspect of the embodiments of the present application, a trajectory planning device based on federated learning is also provided, including: an initialization module configured to initialize the communication environment, ground user positions and initial UAV positions in a UAV-assisted ground communication network, and to calculate prior information of the network based on the initialized values; a model training module configured to establish a computation offloading optimization model for the covering UAVs based on the prior information, and to solve the model through a training process using deep reinforcement learning to obtain a locally trained network model; and a trajectory planning module configured to obtain a globally optimal UAV trajectory planning scheme through iterative federated learning updates based on the local network model.
According to another aspect of the embodiments of the present application, a UAV-assisted ground communication network is further provided, including UAVs, ground user equipment and a base station, wherein each UAV includes the above trajectory planning device based on federated learning.
In the embodiments of the present application, a computation offloading optimization model for the covering UAVs is established based on the prior information and solved through a training process using deep reinforcement learning to obtain a locally trained network model; based on the local network model, a globally optimal UAV trajectory planning scheme is obtained through iterative federated learning updates. This solves the technical problems that a single UAV cannot provide timely, large-scale computation offloading for the ground and therefore cannot meet ground users' requirements for low offloading latency and low energy consumption, and that the multi-UAV training process consumes excessive energy and time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a trajectory planning method based on federated learning according to an embodiment of the present application;
FIG. 2 is a flow chart of another trajectory planning method based on federated learning according to an embodiment of the present application;
FIG. 3 is a training curve according to an embodiment of the present application;
FIG. 4 is a graph of data transmission delay during model training according to an embodiment of the present application;
FIG. 5 is a graph of data transmission energy consumption during model training according to an embodiment of the present application;
FIG. 6 is a schematic illustration of the flight trajectories of three UAVs according to an embodiment of the present application;
FIG. 7 is a schematic diagram of geographic fairness among user devices according to an embodiment of the present application;
FIG. 8 is a schematic diagram of load fairness among UAVs according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a trajectory planning device based on federated learning according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a UAV-assisted ground communication network according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to an embodiment of the present application, a trajectory planning method based on federated learning is provided. As shown in FIG. 1, the method includes:
Step S102: initialize the communication environment, ground user positions and initial UAV positions in the UAV-assisted ground communication network, and calculate prior information of the UAV-assisted ground communication network based on the initialized values.
Step S104: based on the prior information, establish a computation offloading optimization model for the covering UAVs, and solve the model through a training process using deep reinforcement learning to obtain a locally trained network model.
A state set, an action set and a reward function are first defined.
For example, the decision requirements and constraints for optimizing the devices in the UAV-assisted ground communication network are determined, and the state set, action set and reward function of the UAVs are defined based on the determined decision requirements and constraints.
The constraints include at least one of the following: the training time of one federated learning iteration during training of the UAV computation offloading optimization model; the total energy consumed by a UAV while executing its tasks, where the tasks include at least one of moving, hovering, and computation offloading; the total time consumed when a ground user device in the UAV-assisted ground communication network offloads computation to a UAV; and the computation delay when a ground user device executes the task locally.
The decision requirements include at least one of the following: UAV load fairness, geographic fairness of the served ground users, and minimization of ground user energy consumption.
Next, the reward function is determined. For example, a geographic fairness index is determined based on the geographic fairness of the served ground users; a load fairness index is determined based on the UAV load fairness; a minimum value of the energy consumption of the ground user equipment is determined based on the ground user energy minimization; and the reward function is determined based on the geographic fairness index, the load fairness index, the minimum energy consumption, a penalty applied when a UAV flies out of the prescribed area or the distance between UAVs falls below a preset threshold, and a penalty applied when a UAV is farther than a preset distance threshold from an associated ground user device.
Thereafter, the following steps are performed cyclically until the maximum reward value is found (a sketch of this interaction loop is given after this list):
1) Based on the preset state set, action set and reward function, the UAV selects an action according to its current state and calculates the reward value of the current state according to the feedback of the current network environment;
2) The value estimate of the next state of the UAV is obtained, and the value of the current state is updated according to this estimate and the reward value of the current state;
3) The UAV transitions from the current state to the next state, which then becomes the current state.
The computation offloading optimization model for the covering UAVs is thus solved through a deep reinforcement learning training process, yielding the locally trained network model.
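As an illustration of the training loop in steps 1) to 3), the following Python sketch shows one possible per-slot interaction between a UAV agent and its environment. The environment and agent interfaces (env.reset, env.step, agent.select_action, agent.store, agent.update) are hypothetical names introduced here for illustration and are not part of the claimed method.

```python
# Minimal sketch of the agent/environment interaction in steps 1)-3), assuming a
# hypothetical environment object and a hypothetical on-board DRL agent.

def train_local_model(env, agent, num_episodes=100, slots_per_episode=200):
    best_return = float("-inf")
    for _ in range(num_episodes):
        state = env.reset()                              # initial environment state S_0
        episode_return = 0.0
        for _ in range(slots_per_episode):
            action = agent.select_action(state)          # 1) select an action from the current state
            next_state, reward, done = env.step(action)  # reward fed back by the network environment
            agent.store(state, action, reward, next_state, done)
            agent.update()                               # 2) update the value estimate of the current state
            episode_return += reward
            state = next_state                           # 3) the next state becomes the current state
            if done:
                break
        best_return = max(best_return, episode_return)   # track the maximum reward value found so far
    return agent, best_return
```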
In the prior art, when multiple UAVs work cooperatively, training is performed by exchanging raw data between machines, which raises serious concerns about the security and privacy of user data. Moreover, communication problems such as delays and interruptions in the data exchange between machines greatly affect the training effect of machine learning. In this embodiment, the model is trained on board the UAV, which protects the security and privacy of user data and avoids the problem of poor communication degrading the training effect.
Step S106: based on the local network model, obtain a globally optimal UAV trajectory planning scheme through iterative federated learning updates.
First, the parameters of the local network model are sent to a cloud server, which collects the weights and gradients of the local network models, aggregates the locally trained models of all UAVs, and generates a global network model; the UAV trajectory is then planned based on the global network model; thereafter, the local network model on each UAV is updated based on the weights and gradients distributed by the cloud server.
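The following sketch illustrates, at the system level, one federated round as described above: each UAV trains its local model, uploads only the model parameters, and then loads the aggregated global parameters returned by the cloud server. The uav.* and server.aggregate interfaces are assumptions made for illustration only.

```python
# Sketch of one federated learning round: local training on each UAV, upload of model
# parameters only (no raw user data), central aggregation, and redistribution.

def federated_round(uavs, server):
    local_params = []
    for uav in uavs:
        uav.local_train()                                 # deep reinforcement learning on board
        local_params.append(uav.get_parameters())         # weights and gradients of the local model
    global_params = server.aggregate(local_params)        # aggregation at the cloud server
    for uav in uavs:
        uav.set_parameters(global_params)                 # update each local network model
    return global_params
```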
In the prior art, a single UAV is usually used to collect information and offload tasks for ground user equipment. In practical scenarios, however, the task area is often large, the number of ground users is large, and the working time of a single UAV flight is short, so a single UAV can hardly provide timely and effective computation offloading for large-scale ground users and can hardly meet their requirements for low offloading latency and low energy consumption.
In this embodiment of the invention, UAV trajectory planning and user association selection are trained within a federated learning framework. A system of state, action and reward functions is designed, and an efficient distributed cooperation mechanism is obtained as each UAV interacts with the environment, so that UAV load fairness, geographic fairness of the served ground user equipment, and low user equipment energy consumption are all satisfied.
In addition, some prior-art UAV control methods output a discrete action space and therefore cannot perform continuous control. This embodiment controls the UAV in a continuous action space, which improves training performance.
According to the federated-learning-based trajectory planning method for UAV-assisted ground communication networks, real-time and effective computation offloading services are provided to ground users, so that delay-sensitive tasks of the ground user equipment can be handled; the energy and time consumed by the multi-UAV training process are reduced, and the user equipment completes its tasks with lower energy consumption.
In the multi-UAV-assisted trajectory planning method provided by the invention, the computation offloading work of ground users is completed jointly in an Internet-of-Things scenario that integrates UAVs and ground user equipment. Covering ground users with multiple cooperating UAVs effectively improves work efficiency.
Example 2
According to an embodiment of the application, a trajectory planning method based on federated learning in a UAV-assisted ground communication network is provided. The method is applied to a multi-UAV computation offloading system based on federated-learning multi-agent deep reinforcement learning, which comprises: a base station (BS) equipped with a server, UAVs each carrying a small onboard server, and user equipment (UE), where N = {1, 2, ..., N} denotes the set of ground user devices and M = {1, 2, ..., M} denotes the set of UAVs.
Federated learning (FL) is used to train the overall algorithm framework: the model is trained on the edge UAVs, and the global model is learned by aggregating local updates (i.e., the gradients and weights of the model) at a cloud server.
The UAVs provide computation offloading coverage for the ground user devices; the onboard server and communication equipment are configured to select the associated ground user devices and perform computation offloading, while the onboard server also carries out the machine learning training and algorithm steps described above.
The base station collects the UAV training models and performs global aggregation: a server attached to the ground base station collects each UAV's training model, aggregates them centrally to obtain the global model, and finally distributes the global model to each UAV; iterating in this way eventually yields the best performance.
The ground users serve ground applications; a corresponding task is generated in each time slot, and the corresponding computation result is obtained through computation and serves the user.
FIG. 2 shows a trajectory planning method based on federated learning in a UAV-assisted ground communication network according to an embodiment of the present application. As shown in FIG. 2, the method includes the following steps:
Step S202: initialize the system.
The communication environment, ground user positions and initial UAV positions are initialized, and prior information such as the channel gain and SINR is calculated from these initial values. In this embodiment, each UAV is regarded as an agent, an interactive environment is created, and the initialized environment state is denoted S_0.
This embodiment assumes that N ground user devices are randomly distributed in a square area with side length l_max, and that M UAVs fly at a fixed altitude H over the target area to serve the ground user devices. Each ground user device has one computation task to execute in each time slot (TS) over T consecutive slots, T = {1, 2, ..., T}. Each task can either be executed by the ground user device itself or offloaded to one of the UAVs. The offloading decision variable z_{n,m,t} is defined as a binary indicator,
z_{n,m,t} ∈ {0, 1},
where z_{n,m,t} = 1 means that ground user device n is associated with UAV m at time t and the task is offloaded to the server carried by that UAV, and z_{n,m,t} = 0 means that ground user n executes the task on its own device at time t without any UAV association. Moreover, this embodiment requires that the task of each ground user device in each slot be executed in only one place, which can be expressed as
∑_{m∈M} z_{n,m,t} ≤ 1.
In each time slot t, this embodiment assumes that each user device n has one computation-intensive task S_{n,t} to execute, defined as
S_{n,t} = (D_{n,t}, F_{n,t}),
where D_{n,t} denotes the amount of data to be processed and F_{n,t} denotes the total number of CPU cycles required to execute the task.
In each time slot t, each UAV can fly in any direction, and the maximum flying distance is d_max. To ensure flight safety, a flight boundary is set to prevent the UAVs from flying out of the task area. The horizontal coordinates of UAV m are denoted [X_{m,t}, Y_{m,t}].
Because multiple UAVs work cooperatively in this embodiment, the distance between UAVs is needed and is expressed as
R_{m,m',t} = sqrt( (X_{m,t} − X_{m',t})² + (Y_{m,t} − Y_{m',t})² ),
where X_{m',t} and Y_{m',t} are the horizontal coordinates of a neighbouring UAV. To avoid collisions, this embodiment sets the minimum separation between UAV positions in each time slot to R_u, i.e.,
R_{m,m',t} > R_u,
where R_{m,m',t} is the distance between UAVs m and m'.
If ground user n decides to offload its task to UAV m, it must be within the coverage of UAV m, which gives
z_{n,m,t} R_{n,m,t} ≤ R_max,
where R_max is the maximum horizontal coverage radius of a UAV, z_{n,m,t} is the computation offloading decision, and R_{n,m,t} is the horizontal distance between UAV m and ground user device n at time t.
The offloading data rate can be expressed as
r_{n,m,t} = B log₂( 1 + ρ P_n / (H² + R_{n,m,t}²) ),
where B is the channel bandwidth, P_n is the transmission power of user device n, ρ = g₀ G₀ / σ² with G₀ ≈ 2.2846 and g₀ the channel power gain at a reference distance of 1 m, σ² is the noise power, and H is the UAV flight altitude. To focus on the other steps of the UAV computation offloading process, this embodiment does not consider any specific modulation and coding scheme.
Thus, if ground user n decides to offload its task to UAV m in a time slot, the time required to offload the data is
t^{tr}_{n,m,t} = D_{n,t} / r_{n,m,t},
where D_{n,t} is the amount of data to be processed by ground user device n at time t and r_{n,m,t} is the offloading data rate from ground user device n to UAV m at time t.
The execution time of the task can be expressed as
t^{exe}_{n,m,t} = F_{n,t} / f_{n,m,t},
where f_{n,m,t} is the computing capacity of UAV m allocated to ground user n (m = 0 denotes local execution) and F_{n,t} is the total number of CPU cycles required by ground user device n to execute the task at time t. The total time required to execute a task can therefore be described as
T_{n,m,t} = (1 − ∑_{m∈M} z_{n,m,t}) t^{loc}_{n,t} + ∑_{m∈M} z_{n,m,t} ( t^{tr}_{n,m,t} + t^{exe}_{n,m,t} ),
where t^{loc}_{n,t} is the local computation delay, t^{tr}_{n,m,t} is the transmission delay when computation offloading is performed, and t^{exe}_{n,m,t} is the computation delay at the UAV when computation offloading is performed.
To ensure that the computation tasks of all ground user devices and of the system as a whole can be completed within the prescribed time slot, this embodiment also specifies a maximum task time T_max in each time slot:
T_{n,m,t} ≤ T_max,
where T_{n,m,t} is the time required to complete the task within each time slot and T_max is the maximum task time within each slot.
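The per-slot delay model above can be summarised in a short sketch. The functions below follow the quantities defined in the text (r_{n,m,t}, D_{n,t}, F_{n,t}, f_{n,m,t}, T_max); the numerical values in the example call are illustrative assumptions only.

```python
import math

def offload_rate(bandwidth_hz, tx_power_w, rho, horiz_dist_m, altitude_m):
    """Achievable offloading data rate r_{n,m,t} in bit/s."""
    sinr = rho * tx_power_w / (altitude_m ** 2 + horiz_dist_m ** 2)
    return bandwidth_hz * math.log2(1.0 + sinr)

def task_completion_time(offload, data_bits, cpu_cycles, rate_bps, f_uav_hz, f_local_hz):
    """Total task time T_{n,m,t}: local execution, or transmission plus UAV-side execution."""
    if offload:                                # z_{n,m,t} = 1
        t_tr = data_bits / rate_bps            # offloading (transmission) delay
        t_exe = cpu_cycles / f_uav_hz          # computation delay at the UAV server
        return t_tr + t_exe
    return cpu_cycles / f_local_hz             # local computation delay

# Illustrative numbers: a 12000-byte task with 2200 CPU cycles per bit over a 10 MHz channel.
rate = offload_rate(10e6, 0.1, 1e5, 40.0, 100.0)
t_off = task_completion_time(True, 12000 * 8, 12000 * 8 * 2200, rate, 1.2e9, 0.1e9)
t_loc = task_completion_time(False, 12000 * 8, 12000 * 8 * 2200, rate, 1.2e9, 0.1e9)
# A scheduler would then compare t_off (or t_loc) against the per-slot bound T_max.
```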
If user device n decides to execute the task locally, the energy consumption is
E^{loc}_{n,t} = k_n (f^{loc}_n)^{ν_n} t^{loc}_{n,t},
where k_n ≥ 0 and ν_n ≥ 1 are positive coefficients, f^{loc}_n is the computing capacity of user device n, and t^{loc}_{n,t} is the time required to complete the task within the time slot.
If user device n decides to offload the task, the offloading energy consumption is
E^{off}_{n,t} = P_n t^{tr}_{n,m,t},
where P_n is the transmission power and t^{tr}_{n,m,t} is the transmission delay when computation offloading is performed.
Thus, the energy consumption at UE n can be expressed as
E_{n,t} = (1 − ∑_{m∈M} z_{n,m,t}) E^{loc}_{n,t} + ∑_{m∈M} z_{n,m,t} E^{off}_{n,t}.
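The user-equipment energy model can likewise be sketched as follows. The exponent form of the local-computation energy term is an assumption reconstructed from the coefficients k_n and ν_n described above, and the numbers in the example are placeholders.

```python
def ue_energy(offload, k_n, v_n, f_local_hz, t_local_s, tx_power_w, t_tr_s):
    """Energy consumed by ground user device n in one slot."""
    if offload:                                       # z_{n,m,t} = 1: only the transmission energy is paid
        return tx_power_w * t_tr_s
    return k_n * (f_local_hz ** v_n) * t_local_s      # local execution energy (assumed k_n * f^v_n * t form)

# Placeholder values for illustration only.
e_local = ue_energy(False, k_n=1e-27, v_n=3, f_local_hz=0.1e9, t_local_s=0.02,
                    tx_power_w=0.1, t_tr_s=0.0)
e_offload = ue_energy(True, k_n=1e-27, v_n=3, f_local_hz=0.1e9, t_local_s=0.0,
                      tx_power_w=0.1, t_tr_s=0.005)
```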
then, the present embodiment defines c m,t ∈[0,1]As unmanned plane m in t time slot relative to ground user equipment load:
Figure BDA00040302753200001213
the aim of the present embodiment is to minimize the total energy consumption of the user equipment by optimizing both the offloading decision and the drone trajectory. However, this may lead to an unfair process, as some drones may serve more user equipment than others. To solve this problem, the present embodiment looks like the fairness equation of Jain's function, which is to have a fairness index f t u The application is as follows:
Figure BDA0004030275320000131
wherein f t u Physically reflecting the fairness level between the drones, f if all the drones have similar user equipment loads from initial slot to slot t t u Closer to 1. Where Cm, t' represents the loading of the drone m in t slots with respect to the ground user equipment.
Then, in order to avoid a case where some user equipments are served during many slots and other user equipments are not served at all, the present embodiment applies the geographic fairness equation as follows, similarly to the fairness equation of Jain's:
Figure BDA0004030275320000132
wherein f t e Explicitly reflecting the fairness level between the user equipments, the value of f is closer if a similar number of slots is provided for all user equipments from the initial slot to slot tAt 1.
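Both fairness measures are instances of Jain's fairness index, which can be computed as in the following sketch; x holds either the cumulative load of each UAV or the cumulative number of slots in which each ground user has been served, accumulated from the initial slot up to slot t.

```python
def jain_fairness(x):
    """Jain's fairness index: 1 when all entries are equal, close to 1/len(x) when one dominates."""
    total = sum(x)
    if total == 0:
        return 0.0
    return total ** 2 / (len(x) * sum(v * v for v in x))

print(jain_fairness([0.3, 0.3, 0.3]))   # equal UAV loads -> 1.0
print(jain_fairness([0.9, 0.0, 0.0]))   # one overloaded UAV -> about 0.33
```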
Step S204: establish the UAV computation offloading optimization model.
Based on the determined decision requirements and constraints, the UAV computation offloading optimization model is summarized as follows: maximize the geographic fairness of the ground users and the load fairness of the UAVs while minimizing the energy consumption of the user devices, by optimizing the offloading decisions Z = {z_{n,m,t}, n ∈ N, m ∈ M, t ∈ T} and the UAV trajectories L = {x_{m,t}, y_{m,t}, m ∈ M, t ∈ T}, subject to
z_{n,m,t} ∈ {0, 1}  #(1a)
∑_{m∈M} z_{n,m,t} ≤ 1  #(1b)
∑_{t∈T} ( E^{fly}_{m,t} + E^{hov}_{m,t} + E^{comp}_{m,t} ) ≤ E_b  #(1c)
T^{FL}_{train} + T^{FL}_{up} ≤ T_max  #(1d)
z_{n,m,t} T_{n,m,t} < T_max  #(1e)
R_{m,m',t} > R_u  #(1f)
z_{n,m,t} R_{n,m,t} ≤ R_max  #(1g)
where f_t^u is the fairness among the UAVs, f_t^e is the fairness among the user devices, E_{n,t} is the energy consumption of the ground user devices, E^{fly}_{m,t} is the UAV movement energy consumption, E^{hov}_{m,t} is the UAV hovering energy consumption, E^{comp}_{m,t} is the energy the UAV consumes to process the offloaded data, E_b is the maximum energy available to each UAV in a single flight cycle, T^{FL}_{train} is the time spent by the UAV in one round of federated learning training, T^{FL}_{up} is the time spent uploading the UAV's local model parameters during federated learning, and z_{n,m,t} is the computation offloading decision.
The aim of this embodiment is to maximize the geographic fairness of the ground users and the load fairness of the UAVs while minimizing the energy consumption of the user devices. Condition (1a) states that a ground user device's computation task is either fully offloaded to a UAV or not offloaded at all. Condition (1b) states that the computation task of a single user device is offloaded to at most one UAV. Condition (1c) is the constraint on the total energy a UAV consumes while executing its tasks, including movement, hovering and computation, during multi-UAV computation offloading. To ensure that training finishes within one time slot, condition (1d) bounds the duration of one round of federated learning (FL) training. Condition (1e) specifies the maximum delay for processing a user device's computation task, covering both the total time consumed when offloading to a UAV and the computation delay of local execution by the ground user device. Because this scheme uses multiple UAVs for cooperative computation offloading, condition (1f) limits the minimum separation between UAVs to prevent collisions. To guarantee the communication quality with ground users, condition (1g) bounds the maximum coverage range for computation offloading between a UAV and a ground user device.
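As a simple illustration of constraints (1e) to (1g), the following sketch checks a candidate per-slot configuration against the per-task delay bound, the minimum UAV separation and the maximum coverage radius. All function and variable names are assumptions introduced for illustration and are not part of the claimed method.

```python
import math

def slot_is_feasible(uav_positions, offload_links, t_max, r_u, r_max):
    """uav_positions: list of (x, y); offload_links: list of (distance_to_uav, task_time) for offloaded tasks."""
    # (1f) pairwise separation between UAVs
    for i in range(len(uav_positions)):
        for j in range(i + 1, len(uav_positions)):
            xi, yi = uav_positions[i]
            xj, yj = uav_positions[j]
            if math.hypot(xi - xj, yi - yj) <= r_u:
                return False
    for distance, task_time in offload_links:
        if distance > r_max:        # (1g) maximum coverage radius for offloading
            return False
        if task_time >= t_max:      # (1e) per-slot task delay bound
            return False
    return True
```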
Step S206: define the state set, action set and reward function, and train the UAV computation offloading optimization model.
In the distributed network, each UAV acts as an agent and can decide its deployment at the next moment according to the current environment state, so a Markov game is constructed. M agents are established that interact with the environment through a set of states S and a set of actions A. The state s_t gathers the current state of each UAV and the related ground user data, s_t = {s_{1,t}, ..., s_{M,t}}; the action a_t gathers the action associated with each UAV's current state, a_t = {a_{1,t}, ..., a_{M,t}}. Furthermore, each UAV is controlled by its own dedicated agent. In each time slot, each agent obtains its privately observed state s_{m,t}, takes its own action a_{m,t}, and receives a reward r_{m,t}. The environment then updates its state and transitions to the new state.
The design of the reward function depends primarily on the utility the UAV can obtain from the new environment after executing its policy. At the same time, the capacity of the UAV and the number of users it covers must be considered. If the communication capability provided by a UAV does not meet the users' requirements, the UAV is penalized; if a UAV crosses the boundary or violates the safe-distance limit between UAVs, it is also penalized.
The reward function can therefore be defined as
r_{m,t} = ψ_f f_t^u f_t^e − ψ_E ∑_{n∈N} E_{n,t} − p_m − q_m,
where ψ_f is the fairness coefficient, f_t^u is the fairness among the UAVs, f_t^e is the fairness among the user devices, ψ_E is the coefficient of the ground user equipment energy, E_{n,t} is the energy consumption of the ground user devices, p_m is the penalty applied when UAV m flies out of the prescribed area or the distance between UAVs is too small, and q_m is the penalty applied when the UAV is more than d_max away from an associated ground user device.
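The reward defined above can be written directly in code. The weights ψ_f and ψ_E and the penalty values p_m and q_m are not specified numerically in the text, so the defaults below are placeholders.

```python
def uav_reward(f_u, f_e, ue_energies, boundary_or_spacing_violation, too_far_from_user,
               psi_f=1.0, psi_e=0.01, p_m=1.0, q_m=1.0):
    """Per-UAV reward r_{m,t}: fairness term minus energy term minus the two penalties."""
    reward = psi_f * f_u * f_e - psi_e * sum(ue_energies)
    if boundary_or_spacing_violation:   # flew out of the prescribed area or violated UAV spacing
        reward -= p_m
    if too_far_from_user:               # farther than d_max from an associated ground user
        reward -= q_m
    return reward
```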
Deep reinforcement learning (DRL) can be regarded as a "deep" version of RL that uses deep neural networks as an approximation Q_θ(s, a) of the Q function, where Q(s, a) is the expected return obtained by executing action a in state s. In the deep deterministic policy gradient (DDPG) method, the Q-value approximator Q_θ(s, a) with parameter θ is updated by minimizing the loss function
L(θ) = E[ (Q_θ(s, a) − y)² ],
where E denotes the expectation and y is the target value, given by
y = r + γ Q_{θ'}(s', a'),  a' ~ π_{φ'}(s'),
where s' is the next state, a' is an action selected from the target actor network π_{φ'}, Q_{θ'} is a target network that keeps the target y fixed over multiple updates, r is the actual reward value, and γ is the discount factor.
Although the deep deterministic policy gradient method achieves good performance in many applications, it has a critical drawback: overestimation, which leads to accumulated errors. In particular, such errors can cause a poor state to be assigned an excessively high Q value, resulting in a highly sub-optimal policy.
To address this problem, this embodiment adopts the twin delayed deep deterministic policy gradient (TD3) method, which takes into account the interaction between policy and value updates under function approximation errors. Building on the deep deterministic policy gradient method, TD3 applies the following three techniques to overcome the overestimation bias and achieves better performance than comparable techniques.
1) Clipped double Q-learning. TD3 learns two Q functions instead of one (hence "twin"), namely Q_{θ1} and Q_{θ2}; both the critic network and the critic target network therefore each contain two independent Q networks. In each update, the smaller of the two target Q values is used as the target. Specifically, the target update of clipped double Q-learning can be expressed as
y = r + γ min_{i=1,2} Q_{θ'_i}(s', a'),
where r is the reward specified in this embodiment, a' is the output of the target policy, and γ is the discount factor.
2) Delayed policy updates. TD3 updates the actor network and the target networks less frequently than the critic network; that is, the policy is not updated until the value function has changed sufficiently. These less frequent policy updates produce value estimates with lower variance and should therefore yield a better policy, allowing the critic network to become more stable and reduce errors before it is used to update the actor network.
3) Target policy smoothing regularization. A problem with deterministic policies is that they may overfit to peaks in the value estimate. When the critic network is updated, a learning target based on a deterministic policy is susceptible to inaccuracies caused by function approximation errors, which increases the variance of the target. This induced variance can be reduced by regularization: TD3 adds clipped random noise to the target action and averages over mini-batches of size N to smooth the value estimate, i.e.
a' = π_{φ'}(s') + ε,  ε ~ clip( N(0, σ̃²), −c, c ),
where the added noise is clipped so that the target stays close to the original action, s' is the next-moment state, ε is the action noise, σ̃² is the variance of the normal distribution, c is the clipping boundary, and N(0, σ̃²) denotes the normal distribution. The update rules of the twin delayed deep deterministic policy gradient method are similar to those of DDPG and can be expressed as
θ_i ← arg min_{θ_i} (1/N) ∑ ( y − Q_{θ_i}(s, a) )²,
∇_φ J(φ) = (1/N) ∑ ∇_a Q_{θ1}(s, a) |_{a=π_φ(s)} ∇_φ π_φ(s),
where s is the current state and a is the action.
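The three TD3 techniques above can be combined into a single update step, sketched below in PyTorch. Network sizes, noise parameters, the tanh action squashing and the replay-batch format are illustrative assumptions and do not reproduce the exact networks used in the simulation.

```python
import copy
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

state_dim, action_dim = 8, 2
actor = mlp(state_dim, action_dim)
critic1, critic2 = mlp(state_dim + action_dim, 1), mlp(state_dim + action_dim, 1)   # "twin" critics
actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))        # target networks
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(list(critic1.parameters()) + list(critic2.parameters()), lr=5e-5)

def td3_update(batch, step, gamma=0.99, sigma=0.2, c=0.5, policy_delay=2, tau=0.005):
    s, a, r, s2, done = batch
    with torch.no_grad():
        noise = (torch.randn_like(a) * sigma).clamp(-c, c)            # target policy smoothing
        a2 = (torch.tanh(actor_t(s2)) + noise).clamp(-1.0, 1.0)
        q_min = torch.min(critic1_t(torch.cat([s2, a2], 1)),          # clipped double Q-learning
                          critic2_t(torch.cat([s2, a2], 1)))
        y = r + gamma * (1.0 - done) * q_min
    q1 = critic1(torch.cat([s, a], 1))
    q2 = critic2(torch.cat([s, a], 1))
    critic_loss = ((q1 - y) ** 2).mean() + ((q2 - y) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    if step % policy_delay == 0:                                      # delayed policy update
        actor_loss = -critic1(torch.cat([s, torch.tanh(actor(s))], 1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for net, net_t in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)             # soft update of target networks
```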
Step S208: perform UAV trajectory planning using federated learning.
Federated learning consists of two parts: the clients, which own the data, and the server of the learning system. In the scenario of this embodiment, the clients are the M UAVs. Each UAV has its own data set DS_m, m ∈ M, which can be expressed as input-output pairs (x_i, y_i), where x_i is the input sample vector, i.e., the current environment state s_{m,t} observed by UAV m, and y_i is the output value, i.e., the Q value of each action the UAV may take.
In addition to training the corresponding local model θ_m on this data set, each UAV also sends its local model parameters to a cloud server, which aggregates the local model of each UAV and generates the global model G. In other words, beyond local model training, the UAV transmits only local model parameters. The training goal of this model is to minimize the global loss function
L(G) = (1/M) ∑_{m∈M} L(θ_m),
where L(G) is the global loss function, θ_m is the local model of UAV m, and M is the total number of UAVs.
It is worth noting that although a server is also involved here, it only processes model parameters; this is very different from traditional centralized learning, in which the server must train a global model over all of the data D = ∪_{m∈M} DS_m.
In fact, the aggregation of information at the server is an important component of the overall federated learning (FL) system. Information processing and aggregation are based primarily on the federated averaging method. In each round of training, the server selects a set of clients at random and collects the weights and gradients of their local models, which are aggregated to generate that round's global model. The cloud server then distributes the weights and gradients to the clients participating in training; these clients load the new parameters into their local models for the next round of training and then transmit the updated parameters back to the server.
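The federated averaging step performed at the cloud server can be sketched as follows; the state-dict format (parameter name mapped to a numeric array) and the equal 1/M weighting are assumptions for illustration.

```python
def federated_average(local_state_dicts, weights=None):
    """Average the local model parameters uploaded by the selected UAV clients."""
    num_clients = len(local_state_dicts)
    if weights is None:
        weights = [1.0 / num_clients] * num_clients       # equal weighting over the participating UAVs
    global_state = {}
    for name in local_state_dicts[0]:
        global_state[name] = sum(w * sd[name] for w, sd in zip(weights, local_state_dicts))
    return global_state

# Each UAV then loads the returned global parameters into its local model for the next round.
```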
In this overall flow, the UAVs are the subjects of federated learning (FL) client learning and training. Each UAV is modeled as an agent that can explore and learn independently, and based on deep reinforcement learning the multiple agents form a collaborative communication network.
In other embodiments, the mean-field game (MFG) method can be combined with the federated learning (FL) scheme to complete the multi-UAV computation offloading task for ground user devices when planning paths for clustered UAVs. Using the federated learning framework, with the base station as the federated learning central node and the UAVs as federated learning edge nodes, the mean-field game method is trained at the edge, and the neural network parameters are shared through federated learning to finally obtain a path planning control method for large-scale UAV swarms.
Compared with the prior art, this method uses a continuous action space in the UAV decision process, which achieves better performance than discrete actions. Second, the method trains by sharing model parameters through federated learning; compared with existing multi-agent deep reinforcement learning training, in which every UAV's state, action, reward value and next-time state are exchanged, this greatly reduces communication delay and energy consumption while protecting the privacy and security of the edge side. Then, during computation offloading, the UAV actively associates users and selects the users with the minimum energy consumption according to the optimization function rather than being selected by users, which achieves a better balance between fairness and energy consumption than the prior art. Finally, the method adopts the twin delayed deep deterministic policy gradient method, which addresses the Q-function overestimation problem in reinforcement learning more effectively than existing methods.
Simulation test
The invention sets the following parameters: the target area is a square with side length l_max = 100 m, in which 50 ground users are randomly distributed. Three UAVs are deployed to provide computation offloading services in this area, with initial coordinates [10, 10], [90, 90] and [10, 90]. In addition, each ground user generates a single task in every time slot. The maximum computing capacity of the onboard server deployed on each UAV is 1.2 GHz. The channel bandwidth B and the noise power σ² are set to 10 MHz and -100 dBm, respectively. The task sizes D_{n,t} and F_{n,t} lie in [10000, 14000] bytes and [1800, 2600] cycles per bit, respectively. The CPU frequency of each ground user device is 0.1 GHz. Furthermore, the critic network and the actor network each consist of four hidden layers of 350 x 250 x 200 x 200 neurons. The learning rates of the actor and critic networks are 1 x 10^-4 and 5 x 10^-5, respectively. The mini-batch size is 512, the discount factor for future rewards is 0.857, and the action noise follows a normal distribution.
The following comparison method is evaluated:
1) A multi-UAV trajectory planning method in which several UAVs are deployed in the planning area simultaneously to cooperatively perform user association and computation offloading services for the ground user equipment.
Through the performance analysis against this comparison method, the present application examines the fairness obtained when the UAVs serve all UEs, the fairness of the UE load carried by each UAV, and the delay and energy consumed to transmit the related data during training.
First, FIG. 3 shows the training curve of multi-agent deep reinforcement learning fused with federated learning, in which three UAVs are deployed to fly from their initial positions to cover ground users while performing user association for computation offloading. As can be seen in FIG. 3, the cumulative reward obtained initially is below 0; the reward value starts to increase around iteration 300 and rises again from around iteration 1300, and convergence is reached after about 2900 training episodes.
Second, the delay and energy consumed by data transmission during model training are also important indicators affecting training. In the multi-agent deep reinforcement learning method, the state, action, reward value and next-time state obtained in each exploration are shared simultaneously with all UAVs and stored in each UAV's replay buffer, so that training is globally shared. In contrast, multi-agent deep reinforcement learning fused with federated learning periodically transmits the parameters of the edge-trained models to the central server for aggregation and then returns them to all edge UAVs for globally shared training. The two approaches have different training flows, and the delay and energy consumed for transmitting the related training data also differ. The comparison between multi-agent deep reinforcement learning and multi-agent deep reinforcement learning fused with federated learning (shown in FIG. 4 and FIG. 5) shows that, because the centrally aggregated federated training mode only periodically transmits model parameters to the center for aggregation before sharing them with all edge users, it saves training delay and energy consumption compared with the multi-agent deep reinforcement learning approach, which shares all training data with all UAVs in every iteration.
After the training phase is finished, the model and the network parameters are saved for testing after the training phase is finished. Next, the present application analyzes the flight trajectory of multiple unmanned aerial vehicles while comparing the load fairness of the unmanned aerial vehicles with the geographic fairness of ground users.
The present application first describes a multi-agent deep reinforcement learning unmanned aerial vehicle trajectory planning method using fusion federal learning in fig. 6. In the figure, three unmanned aerial vehicles are deployed to calculate and unload a target area, lines represent flight tracks of the unmanned aerial vehicles, and red dots represent positions of ground users. As can be seen from fig. 6, all the drones move in a specific area. Because the coverage area of the unmanned aerial vehicle is limited, the fairness index of the service ground users must be improved by moving the position of the unmanned aerial vehicle. Furthermore, it can be seen that the unmanned aerial vehicles cooperatively perform calculation offloading tasks on the ground user equipment in the whole 100 x 100m target area, so as to maximize the defined reward value. For example, "UAV1" moves from an initial upper right corner position to a lower right corner, with more ground users unloading than the upper right corner lower right corner of the initial position, in order to serve more users. After completing the lower left hand computational offloading tasks, the "UAV2" moves to the center to offload users near the center to assist users in that area.
As can be seen from FIG. 7, both the conventional multi-agent deep reinforcement learning unmanned aerial vehicle trajectory planning method and the unmanned aerial vehicle trajectory planning method based on multi-agent deep reinforcement learning fused with federal learning improve in terms of the geographic fairness f_t^e of the user equipment as the flight time increases. Finally, the application can conclude that, when all flight tasks are completed, the geographic fairness of the user equipment under the two methods reaches a similar level.
Then, the present application compares, in fig. 8, the load fairness of the unmanned aerial vehicles under the multi-agent deep reinforcement learning control method and under the multi-unmanned aerial vehicle cooperative control scheme based on multi-agent deep reinforcement learning fused with federal learning. Both schemes approach a fairness of 1 and perform quite similarly, since both solutions can control the unmanned aerial vehicles to serve a similar number of user devices.
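The geographic fairness and load fairness discussed above are typically measured with Jain's fairness index, which equals 1 when all users (or all unmanned aerial vehicles) are treated equally. The sketch below shows one plausible way to compute both indexes; the example service counts and load counts are assumed values rather than data from the experiments.

```python
import numpy as np

def jain_fairness(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); equals 1 when all values are equal."""
    values = np.asarray(values, dtype=float)
    denom = len(values) * np.sum(values ** 2)
    return float(np.sum(values) ** 2 / denom) if denom > 0 else 0.0

# assumed example data: times each ground user was served, and tasks handled per UAV
served_per_user = [4, 5, 3, 4, 6, 5]   # geographic fairness of ground users
tasks_per_uav = [38, 41, 40]            # load fairness of the UAVs

geographic_fairness = jain_fairness(served_per_user)
load_fairness = jain_fairness(tasks_per_uav)
print(geographic_fairness, load_fairness)  # values close to 1 indicate fair service and load
```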
Example 3
According to an embodiment of the present application, there is further provided a track planning apparatus based on federal learning, as shown in fig. 9, including: an initialization module 92, a model training module 94, and a trajectory planning module 96.
The initialization module 92 is configured to initialize a communication environment, a ground user position, and an unmanned aerial vehicle initial position in the unmanned aerial vehicle-assisted ground communication network, and to calculate prior information of the unmanned aerial vehicle-assisted ground communication network based on the initial values obtained by initialization.
The model training module 94 is configured to establish a coverage unmanned aerial vehicle computation offloading optimization model based on the prior information, and to solve the coverage unmanned aerial vehicle computation offloading optimization model through a training process using deep reinforcement learning, so as to obtain a local network model trained by deep reinforcement learning.
The trajectory planning module 96 is configured to derive a globally optimal unmanned aerial vehicle trajectory planning scheme using federal learning iterative updates based on the local network model.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiment 1 and embodiment 2, and details are not repeated here.
In this embodiment, the trajectory planning is trained by combining multi-agent deep reinforcement learning and federal learning, which improves training efficiency and performance while reducing communication overhead. In addition, in the decision process in which the unmanned aerial vehicle performs computation offloading, tasks are executed in a continuous action space, which can effectively improve performance compared with discrete actions.
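To illustrate what executing tasks in a continuous action space can look like, the following is a minimal actor-network sketch; the layer sizes, the tanh squashing, and the mapping of the action vector to heading, speed, and offloading ratio are assumptions made for illustration and are not the concrete network design of the application.

```python
import numpy as np

class ContinuousActor:
    """Tiny two-layer actor mapping a UAV state to continuous actions (assumed design)."""
    def __init__(self, state_dim, action_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.1, (state_dim, hidden))
        self.w2 = rng.normal(0, 0.1, (hidden, action_dim))

    def act(self, state):
        h = np.tanh(state @ self.w1)
        return np.tanh(h @ self.w2)  # every action component lies in (-1, 1)

# assumed interpretation of the action vector: [flight heading, speed fraction, offloading ratio]
actor = ContinuousActor(state_dim=6, action_dim=3)
action = actor.act(np.zeros(6))
heading = action[0] * np.pi           # scaled to a heading angle in radians
speed = (action[1] + 1) / 2 * 20.0    # scaled to 0-20 m/s
offload_ratio = (action[2] + 1) / 2   # fraction of the task offloaded to the UAV
```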
Example 4
According to an embodiment of the present application, there is further provided an unmanned aerial vehicle-assisted ground communication network, as shown in fig. 10, including: a base station (BS) equipped with a server, unmanned aerial vehicles (UAVs) each carrying a small on-board server, and user equipments (UEs), wherein N = {1, 2, ..., N} denotes the set of ground user equipments and M = {1, 2, ..., M} denotes the set of unmanned aerial vehicles.
The base station is used for training the trajectory planning scheme of the multiple unmanned aerial vehicles: using the federal learning framework, the model parameters of the edge unmanned aerial vehicles are collected and centrally aggregated on the cloud server attached to the base station end, so that a global model is obtained, and the global model is then issued to the edge unmanned aerial vehicles for iterative training.
The unmanned aerial vehicle is used for performing selective user association with the ground user equipment and providing computation offloading services; the server carried by the unmanned aerial vehicle can rapidly and effectively process the computation tasks transmitted by the ground users, thereby relieving the computing pressure on the ground users. Meanwhile, acting as the edge end of the federal learning training framework, the unmanned aerial vehicle trains the model and transmits it to the center for aggregation. Federal learning is used to train the overall algorithm framework: the model is trained on the edge unmanned aerial vehicles, and the global model is learned by aggregating the local updates (i.e., the gradients and weights of the model) at the cloud server.
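A common way to realize the central aggregation described above is federated averaging, in which the cloud server forms the global model as a weighted average of the uploaded edge models. The sketch below assumes each unmanned aerial vehicle uploads a dictionary of weight arrays together with a local sample count; these names and the weighting choice are illustrative rather than the application's exact aggregation procedure.

```python
import numpy as np

def federated_average(local_models, local_counts):
    """Aggregate edge models into a global model by a weighted average of their parameters.

    local_models: list of dicts mapping layer name -> np.ndarray of weights
    local_counts: list of sample/experience counts used to weight each UAV's model
    """
    total = float(sum(local_counts))
    global_model = {}
    for name in local_models[0]:
        global_model[name] = sum(
            (count / total) * model[name]
            for model, count in zip(local_models, local_counts)
        )
    return global_model

# assumed example: three edge UAVs upload tiny one-layer models
uav_models = [{"w": np.full((2, 2), value)} for value in (1.0, 2.0, 3.0)]
global_model = federated_average(uav_models, local_counts=[100, 100, 200])
# the global model would then be sent back to every edge UAV for the next training round
```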
The unmanned aerial vehicle is provided with the track planning device based on federal learning according to embodiment 3, so that a description thereof is omitted here.
The user equipment is configured to receive the sensed ground-related information and generate delay-sensitive tasks, and, according to the unmanned aerial vehicle decision scheme, either offload the sensed ground-related information to the unmanned aerial vehicle end for computation processing or execute it locally. The unmanned aerial vehicle in this embodiment may implement the methods described in the foregoing embodiments 1 and 2, and details are not repeated here.
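For illustration of the offload-or-local-execution choice faced by a user equipment, the sketch below compares the local computation delay with the delay of offloading to an unmanned aerial vehicle (uplink transmission plus on-board computation); the task size, CPU frequencies, and the Shannon-capacity rate expression are assumptions used only to make the trade-off concrete, not the application's channel model.

```python
import math

def local_delay(task_bits, cycles_per_bit, ue_cpu_hz):
    """Delay of executing the task entirely on the user equipment."""
    return task_bits * cycles_per_bit / ue_cpu_hz

def offload_delay(task_bits, cycles_per_bit, uav_cpu_hz,
                  bandwidth_hz, tx_power_w, channel_gain, noise_w):
    """Uplink transmission delay plus computation delay on the UAV's on-board server."""
    rate_bps = bandwidth_hz * math.log2(1 + tx_power_w * channel_gain / noise_w)
    return task_bits / rate_bps + task_bits * cycles_per_bit / uav_cpu_hz

# assumed example values
t_local = local_delay(task_bits=2e6, cycles_per_bit=1000, ue_cpu_hz=1e9)
t_offload = offload_delay(task_bits=2e6, cycles_per_bit=1000, uav_cpu_hz=10e9,
                          bandwidth_hz=1e6, tx_power_w=0.1,
                          channel_gain=1e-6, noise_w=1e-10)
decision = "offload to UAV" if t_offload < t_local else "execute locally"
print(t_local, t_offload, decision)
```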
The foregoing is merely a preferred embodiment of the present application, and it should be noted that those skilled in the art may make various modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are intended to fall within the scope of protection of the present application.

Claims (10)

1. A track planning method based on federal learning, comprising:
initializing a communication environment, a ground user position and an unmanned aerial vehicle initial position in an unmanned aerial vehicle-assisted ground communication network, and calculating prior information of the unmanned aerial vehicle-assisted ground communication network based on initial values obtained by the initialization;
based on the prior information, establishing a coverage unmanned aerial vehicle computation offloading optimization model, and solving the coverage unmanned aerial vehicle computation offloading optimization model through a training process by utilizing deep reinforcement learning to obtain a local network model obtained through deep reinforcement learning training;
and based on the local network model, obtaining a globally optimal unmanned aerial vehicle track planning scheme by utilizing federal learning iterative updating.
2. The method of claim 1, wherein solving the coverage unmanned aerial vehicle computation offloading optimization model through a training process using deep reinforcement learning comprises performing the steps of:
based on a preset state set, an action set and a reward function, the unmanned aerial vehicle selects an action according to the current state, and calculates a reward value of the current state of the unmanned aerial vehicle according to feedback of the current network environment;
obtaining a value estimate of the next state of the unmanned aerial vehicle, and updating the value of the current state according to the value estimate and the reward value of the current state;
and entering a next state from the current state, and taking the next state as the current state.
3. The method of claim 2, wherein the state set, action set, and reward function are defined by:
determining decision requirements and constraints for optimizing equipment in the unmanned aerial vehicle-assisted ground communication network;
defining a state set, an action set, and a reward function for the unmanned aerial vehicle in the equipment based on the determined decision requirements and constraints.
4. The method according to claim 3, wherein the constraints comprise at least one of:
the training time of one iteration of the federal learning in the process of training the unmanned aerial vehicle computation offloading optimization model;
the total energy consumption of the unmanned aerial vehicle for executing a task, wherein the task comprises at least one of the following: moving, hovering, and computation offloading;
the total time consumed by a ground user equipment in the unmanned aerial vehicle-assisted ground communication network for computation offloading to the unmanned aerial vehicle;
and the computation time delay of local execution by the ground user equipment.
5. The method according to claim 3, wherein the decision requirement comprises at least one of: unmanned aerial vehicle load fairness, geographic fairness of served ground users, and minimization of ground user energy consumption.
6. The method of claim 5, wherein the bonus function is determined based on:
determining a geographic fairness index based on the serving ground user geographic fairness;
determining a load fairness index based on the unmanned aerial vehicle load fairness;
determining a minimum value of energy consumption of the surface user equipment based on the surface user energy consumption minimization;
the reward function is determined based on the geographic fairness index, the load fairness index, the minimum value of energy consumption, a penalty applied when the unmanned aerial vehicle flies out of a prescribed area or the distance between unmanned aerial vehicles is less than a preset threshold, and a penalty applied when the distance between the unmanned aerial vehicle and an associated ground user equipment exceeds a preset distance threshold.
7. The method of claim 1, wherein obtaining a globally optimal unmanned aerial vehicle trajectory planning scheme using federal learning iterative updating based on the local network model comprises:
the parameters of the local network model are sent to a cloud server, wherein the cloud server collects the weights and gradients of the local network model, aggregates the local network models obtained by deep reinforcement learning training of each unmanned aerial vehicle, and generates a global network model;
planning a trajectory of the unmanned aerial vehicle based on the global network model.
8. The method of claim 7, wherein after sending the parameters of the local network model to a cloud server, the method further comprises: updating the local network model on the unmanned aerial vehicle based on the weights and gradients issued by the cloud server.
9. A track planning device based on federal learning, characterized by comprising:
the initialization module is configured to initialize a communication environment, a ground user position and an unmanned aerial vehicle initial position in the unmanned aerial vehicle-assisted ground communication network, and calculate prior information of the unmanned aerial vehicle-assisted ground communication network based on initial values obtained by the initialization;
the model training module is configured to establish a coverage unmanned aerial vehicle computation offloading optimization model based on the prior information, and solve the coverage unmanned aerial vehicle computation offloading optimization model through a training process by utilizing deep reinforcement learning to obtain a local network model obtained through deep reinforcement learning training;
and the track planning module is configured to obtain a globally optimal unmanned aerial vehicle track planning scheme by utilizing federal learning iterative updating based on the local network model.
10. An unmanned aerial vehicle-assisted ground communication network, comprising an unmanned aerial vehicle, a ground user equipment, and a base station, wherein the unmanned aerial vehicle comprises the device of claim 9.
CN202211726582.2A 2022-12-30 2022-12-30 Track planning method and device based on federal learning Active CN116208968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211726582.2A CN116208968B (en) 2022-12-30 2022-12-30 Track planning method and device based on federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211726582.2A CN116208968B (en) 2022-12-30 2022-12-30 Track planning method and device based on federal learning

Publications (2)

Publication Number Publication Date
CN116208968A true CN116208968A (en) 2023-06-02
CN116208968B CN116208968B (en) 2024-04-05

Family

ID=86506881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211726582.2A Active CN116208968B (en) 2022-12-30 2022-12-30 Track planning method and device based on federal learning

Country Status (1)

Country Link
CN (1) CN116208968B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
WO2022079278A2 (en) * 2020-10-16 2022-04-21 Quadsat Aps Antenna evaluation test system
CN113848974A (en) * 2021-09-28 2021-12-28 西北工业大学 Aircraft trajectory planning method and system based on deep reinforcement learning
CN114650228A (en) * 2022-03-18 2022-06-21 南京邮电大学 Federal learning scheduling method based on computation unloading in heterogeneous network
CN114896072A (en) * 2022-06-02 2022-08-12 深圳市芯中芯科技有限公司 Unmanned aerial vehicle-assisted mobile edge calculation optimization method based on deep reinforcement learning
CN115134778A (en) * 2022-06-28 2022-09-30 东南大学深圳研究院 Internet of vehicles calculation unloading method based on multi-user game and federal learning
CN114997737A (en) * 2022-07-14 2022-09-02 南京工业大学 Unmanned aerial vehicle small base station cluster RAN slicing method based on layered federal learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WANG JIANWEI ET AL.: "Multi-UAV trajectory planning algorithm based on federated deep reinforcement learning", Journal of Beijing Information Science and Technology University (Natural Science Edition), vol. 38, no. 6, 31 December 2023 (2023-12-31) *
DONG CHAO; SHEN YUN; QU YUBEN: "A survey of UAV-based edge intelligence computing", Chinese Journal of Intelligent Science and Technology, no. 03, 15 September 2020 (2020-09-15) *
LU TIANHE; LIU LI; HE YUNTAO; YANG DUN: "Multi-UAV path planning algorithms and key technologies", Tactical Missile Technology, no. 01, 15 January 2020 (2020-01-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117250871A (en) * 2023-11-20 2023-12-19 暨南大学 Man-machine cooperation safety assessment method and device based on decentralised federal learning
CN117250871B (en) * 2023-11-20 2024-03-08 暨南大学 Man-machine cooperation safety assessment method and device based on decentralised federal learning

Also Published As

Publication number Publication date
CN116208968B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
Seid et al. Collaborative computation offloading and resource allocation in multi-UAV-assisted IoT networks: A deep reinforcement learning approach
Wang et al. Task offloading for post-disaster rescue in unmanned aerial vehicles networks
Cheng et al. AI for UAV-assisted IoT applications: A comprehensive review
Faraci et al. Fog in the clouds: UAVs to provide edge computing to IoT devices
CN110602633B (en) Explosive flow-oriented mobile edge computing unmanned aerial vehicle cluster auxiliary communication method
CN110673649B (en) Unmanned aerial vehicle formation consistency control method, system and device under time-varying channel based on topology optimization and storage medium
Alam et al. Topology control algorithms in multi-unmanned aerial vehicle networks: An extensive survey
CN111787509A (en) Unmanned aerial vehicle task unloading method and system based on reinforcement learning in edge calculation
CN113254188B (en) Scheduling optimization method and device, electronic equipment and storage medium
CN113395654A (en) Method for task unloading and resource allocation of multiple unmanned aerial vehicles of edge computing system
CN110650039A (en) Multimodal optimization-based network collaborative communication model for unmanned aerial vehicle cluster-assisted vehicle
CN115640131A (en) Unmanned aerial vehicle auxiliary computing migration method based on depth certainty strategy gradient
CN113660681B (en) Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
Ho et al. UAV control for wireless service provisioning in critical demand areas: A deep reinforcement learning approach
CN112929849B (en) Reliable vehicle-mounted edge calculation unloading method based on reinforcement learning
CN116208968B (en) Track planning method and device based on federal learning
Hajiakhondi-Meybodi et al. Deep reinforcement learning for trustworthy and time-varying connection scheduling in a coupled UAV-based femtocaching architecture
Liao et al. Energy minimization for UAV swarm-enabled wireless inland ship MEC network with time windows
Wei et al. Joint UAV trajectory planning, DAG task scheduling, and service function deployment based on DRL in UAV-empowered edge computing
Parvaresh et al. A continuous actor–critic deep Q-learning-enabled deployment of UAV base stations: Toward 6G small cells in the skies of smart cities
Huda et al. Deep reinforcement learning-based computation offloading in uav swarm-enabled edge computing for surveillance applications
Wang et al. Digital twin-enabled computation offloading in UAV-assisted MEC emergency networks
Duo et al. Joint dual-UAV trajectory and RIS design for ARIS-assisted aerial computing in IoT
Sobouti et al. Managing sets of flying base stations using energy efficient 3D trajectory planning in cellular networks
CN114520991B (en) Unmanned aerial vehicle cluster-based edge network self-adaptive deployment method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant