CN116980852A

CN116980852A - Multi-unmanned aerial vehicle assisted MEC system deployment and unloading strategy joint optimization method

Info

Publication number: CN116980852A
Application number: CN202310608196.1A
Authority: CN
Inventors: 余雪勇; 李元昊
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2023-05-26
Filing date: 2023-05-26
Publication date: 2023-10-31

Abstract

The application belongs to the technical field of computer wireless communication and discloses a joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle (UAV) assisted MEC system. The method comprises: establishing a dynamic multi-UAV, multi-ground-terminal edge computing system model for real-time communication and data transmission; constructing an optimal offloading strategy for a given UAV flight trajectory; constructing the ground terminals' offloading matching decision according to the given UAV flight trajectory and the optimal offloading strategy; and optimizing the trajectories of the multiple UAVs according to the optimal offloading strategy and the ground terminal offloading matching decision. The optimization method provided by the application converges well, reduces the number of optimization variables, optimizes three-dimensional UAV trajectories, and is better suited to real scenarios.

Description

Multi-unmanned aerial vehicle assisted MEC system deployment and unloading strategy joint optimization method

Technical Field

The application relates to the technical field of computer wireless communication, and in particular to a joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system.

Background

Mobile Edge Computing (MEC), which deploys servers at the network edge, is considered an effective approach, although conventional MEC servers are deployed at fixed locations. In MEC, mobile devices can offload their tasks to nearby servers. Compared with mobile cloud computing, MEC has shorter transmission distances and therefore lower transmission time and energy consumption. By placing computing resources close to user equipment, MEC can serve compute-intensive and delay-sensitive services, support a large number of devices, and process large amounts of data in a timely manner.

In recent years, unmanned aerial vehicles have received a great deal of attention in wireless communication. They have been widely studied and are considered a viable means of assisting wireless communication networks, and advances in unmanned aerial vehicle technology have, to some extent, overcome the limitations of flight time and battery capacity. For example, unmanned aerial vehicles have been applied in areas with limited communication infrastructure, such as developing countries or mountainous regions, as well as in earthquake response, emergency rescue, and battlefield communications. Recently, the literature has studied unmanned aerial vehicle MEC wireless systems in which an MEC server is mounted on an unmanned aerial vehicle (i.e., a flying edge cloud). Such a system offers three advantages: 1) because the flying edge server is at a higher altitude, it is more likely to provide line-of-sight (LoS) links to mobile users; 2) unmanned aerial vehicles can be deployed flexibly, which further shortens the transmission distance; 3) their hovering stability and LoS transmission characteristics provide reliable, low-latency communication links for ground terminals.

In a multi-drone assisted wireless communication system, a drone typically acts either as an aerial base station (BS) or as an aerial mobile terminal. When a drone acts as an aerial base station, ground terminals communicate with it over LoS links; however, heavy data transmission between the ground terminals and the drone may congest the channel, and the coverage area of a single drone is also limited. When drones act as aerial mobile terminals, an increasing number of drones can overload the cellular frequency band, and the drones compete with ground terminals for limited spectrum resources.

For multi-unmanned aerial vehicle assisted edge computing systems, existing work mainly focuses on computation offloading, resource allocation, and unmanned aerial vehicle trajectory optimization. Computation offloading moves tasks to nearby MEC servers to improve quality of service and is usually optimized jointly with task scheduling and load balancing. Resource allocation optimization distributes computing resources reasonably among ground terminals and reduces resource waste. Trajectory optimization of the unmanned aerial vehicles reduces delay, saves energy, and improves communication throughput, bringing better quality of service to the ground terminals.

Disclosure of Invention

This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the description, and in the title of the application; such simplifications or omissions shall not be used to limit the scope of the application.

The present application has been made in view of the above-described problems occurring in the prior art.

Therefore, the technical problems solved by the application are as follows: the prior art ignores the overloading of a single unmanned aerial vehicle when ground terminal tasks are excessively concentrated on it, and ignores, for realistic scenarios, the optimization of the three-dimensional flight trajectories of multiple unmanned aerial vehicles under the premise that the unmanned aerial vehicles process as many tasks as possible with as little energy consumption and delay as possible.

In order to solve the above technical problems, the application provides the following technical solution: a joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system, comprising the following steps:

establishing a dynamic multi-unmanned aerial vehicle, multi-ground-terminal edge computing system model for real-time communication and data transmission;

constructing an optimal offloading strategy for a given unmanned aerial vehicle flight trajectory;

constructing a ground terminal offloading matching decision according to the given unmanned aerial vehicle flight trajectory and the optimal offloading strategy;

and optimizing the trajectories of the multiple unmanned aerial vehicles according to the optimal offloading strategy and the ground terminal offloading matching decision.

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: the edge computing system comprises K ground terminals and M unmanned aerial vehicles;

all unmanned aerial vehicles are equipped with small MEC servers for communication and computation;

it is defined that all ground terminals are randomly distributed on the plane $\{X_{size}, Y_{size}, 0\}$, that all unmanned aerial vehicles fly in the three-dimensional space $\{X_{size}, Y_{size}, H\}$, and that all unmanned aerial vehicles complete the tasks generated by their connected ground terminals within T time slots.

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: in each time slot, each unmanned aerial vehicle completes the tasks generated by its connected ground terminals;

in the next time slot, the position and task of each ground terminal are randomly updated within a certain range, and the ground terminal reselects the optimal unmanned aerial vehicle according to its new position and task; each unmanned aerial vehicle flies from its starting point to its end point over a series of time slots while processing tasks, which completes the trajectory design;
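A minimal Python sketch of the per-slot random update of a ground terminal, assuming (this is not stated in the source) that each coordinate moves by a uniformly drawn offset of at most `max_step` and is clipped back into the service area, and that new task parameters are drawn uniformly from hypothetical ranges. The names `update_terminal_position` and `draw_task` are illustrative only.

```python
import random


def update_terminal_position(x, y, max_step, x_size, y_size, rng=random):
    """Randomly move a ground terminal by at most max_step on each axis.

    The result is clipped to the ground plane [0, x_size] x [0, y_size];
    the terminal always stays at height 0.
    """
    nx = min(max(x + rng.uniform(-max_step, max_step), 0.0), x_size)
    ny = min(max(y + rng.uniform(-max_step, max_step), 0.0), y_size)
    return nx, ny


def draw_task(d_range, f_range, rng=random):
    """Draw a new task: data size D_k(t) in bits and CPU cycles per bit F_k(t)."""
    return rng.uniform(*d_range), rng.uniform(*f_range)


# Example: one terminal over 10 time slots in a hypothetical 100 m x 100 m area.
rng = random.Random(0)
pos = (50.0, 50.0)
for t in range(10):
    pos = update_terminal_position(*pos, max_step=5.0, x_size=100.0, y_size=100.0, rng=rng)
    task = draw_task((1e6, 2e6), (500.0, 1000.0), rng=rng)
```

Clipping rather than reflecting at the boundary is one simple choice; any rule that keeps terminals inside the plane would serve the same purpose.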

in the t-th time slot, the position of unmanned aerial vehicle m is $\{X_m(t), Y_m(t), Z_m(t)\}$, where $X_m(t)$, $Y_m(t)$ and $Z_m(t)$ are its X-axis, Y-axis and Z-axis positions, respectively;

the position of the ground terminal k is

the data size of the task of ground terminal k is $D_k(t)$;

the number of CPU cycles required by ground terminal k to process 1 bit of data is $F_k(t)$, and the task of ground terminal k is

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: in time slot t, the fairness-based total energy consumption of the unmanned aerial vehicle assisted edge computing system is expressed as:

where the four energy terms are, respectively, the transmission energy consumption between ground terminal k and unmanned aerial vehicle m, the computation energy consumption of ground terminal k, the computation energy consumption of unmanned aerial vehicle m, and the flight energy consumption of unmanned aerial vehicle m; I(t) is the fairness index among the unmanned aerial vehicles, and ω is the weight of the unmanned aerial vehicle flight energy consumption.

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: the fairness-based total system energy consumption optimization problem is expressed as:

s.t.C1:

C2:

C3:

C4:

C5:

where the objective function minimizes the total system energy consumption for the unmanned aerial vehicles to complete their tasks; C1 constrains the movement range of the unmanned aerial vehicles and ground terminals; C2 constrains the data offloading ratio of the offloading strategy; C3 is the safety separation constraint between unmanned aerial vehicles; C4 is the offloading matching decision between ground terminals and unmanned aerial vehicles, under which each ground terminal must be offloaded to and matched with exactly one unmanned aerial vehicle; and C5 is the fairness index constraint among unmanned aerial vehicles, where an index closer to 1 indicates greater fairness.

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: the optimal offloading strategy is the task offloading ratio and is obtained as follows:

given the unmanned aerial vehicle trajectory Θ in time slot t and with the ground terminals' offloading matching decision fixed, the convexity of the objective with respect to the offloading strategy ψ is analyzed, and the total system energy consumption simplifies to:

where the coefficients are, respectively: a fairness coefficient among the unmanned aerial vehicles; a power coefficient; the energy consumption coefficients for transmission and computation at the ground terminal and for computation at the unmanned aerial vehicle; the flight power coefficient of the unmanned aerial vehicle; and the delay coefficients for transmission and computation at the ground terminal and for computation at the unmanned aerial vehicle;

when the total energy consumption of the system is minimum,

the optimal offloading strategy for offloading the ground terminal k to the drone m is expressed as:

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: the ground terminal offloading matching decision comprises:

given the unmanned aerial vehicle trajectory Θ in time slot t and the optimal offloading strategy ψ of ground terminal k, the offloading matching decision of the ground terminals is analyzed, and the total system energy consumption simplifies to:

where r is the transmission data rate as a function of the Euclidean distance $d_{k,m}(t)$, and the remaining two coefficients are constants once the offloading strategy ψ and the unmanned aerial vehicle trajectory Θ are fixed;

the offloading matching decisions of the K ground terminals are defined as:

the offloading matching decisions of all ground terminals other than ground terminal k are expressed as:

the offloading matching decision of each ground terminal is initialized to the nearest unmanned aerial vehicle, and the following steps are repeated:

for each ground terminal k ($k \in [1, K]$), keeping the offloading matching decisions of the other ground terminals unchanged, calculate the current optimal choice $\Delta_k$ of ground terminal k and the corresponding system energy consumption;

if the optimal choice yields a lower system energy consumption than the current choice, set $\Delta_k$ to the optimal choice of ground terminal k and update the decision profile, repeating until no ground terminal can propose a better choice for itself;

at Nash equilibrium, the offloading matching decisions of the ground terminals are optimal, and E is minimized in time slot t.
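The matching loop above is a best-response iteration. The Python sketch below follows its steps (nearest-UAV initialization, one terminal at a time choosing its best UAV while the others stay fixed, stop when nobody can improve). The energy function passed in is a hypothetical stand-in: squared link distance plus a load-imbalance penalty, not the simplified E of the source.

```python
import math


def nearest_assignment(terminals, uavs):
    """Initial decision: each ground terminal picks its nearest UAV."""
    return [min(range(len(uavs)), key=lambda m: math.dist(p, uavs[m])) for p in terminals]


def best_response_matching(terminals, uavs, energy, max_rounds=100):
    """Terminals update one at a time until none can lower energy(assign) unilaterally."""
    assign = nearest_assignment(terminals, uavs)
    current = energy(assign)
    for _ in range(max_rounds):
        changed = False
        for k in range(len(terminals)):
            best_m, best_e = assign[k], current
            for m in range(len(uavs)):  # other terminals' decisions stay fixed
                if m == assign[k]:
                    continue
                trial = assign.copy()
                trial[k] = m
                e = energy(trial)
                if e < best_e:
                    best_m, best_e = m, e
            if best_m != assign[k]:
                assign[k], current, changed = best_m, best_e, True
        if not changed:  # Nash equilibrium: no unilateral improvement exists
            break
    return assign, current


# Hypothetical scenario: four terminals on a line, two UAVs.
terminals = [(3.0, 0.0), (2.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
uavs = [(0.0, 0.0), (10.0, 0.0)]


def toy_energy(assign, load_weight=100.0):
    """Hypothetical stand-in for E: squared distances plus a load-imbalance penalty."""
    dist2 = sum((terminals[k][0] - uavs[m][0]) ** 2 + (terminals[k][1] - uavs[m][1]) ** 2
                for k, m in enumerate(assign))
    loads = [assign.count(m) for m in range(len(uavs))]
    return dist2 + load_weight * sum(n * n for n in loads)


assign, e = best_response_matching(terminals, uavs, toy_energy)
```

Because every accepted move strictly lowers the shared energy and the number of assignments is finite, the loop always terminates; `max_rounds` is only a safety guard.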

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: each unmanned aerial vehicle is regarded as an agent, and the environment is modeled as a Markov decision process comprising states, actions, transition probabilities, rewards and an initial state;

the state consists of the states of the agents and the ground terminals, including: the positions of the unmanned aerial vehicles, the positions of the ground terminals, the data sizes of the ground terminals' tasks, and the number of CPU cycles required to compute 1 bit of data for the ground terminals matched with unmanned aerial vehicle m;

the state of agent m can be expressed as:

the actions are defined as the horizontal deflection angle and the vertical deflection angle of each agent, expressed as:

wherein ,

the action of agent m is normalized, expressed as:

where the horizontal deflection angle ranges of the three unmanned aerial vehicles are respectively …, [0, π], …, and the vertical deflection angle ranges of the three unmanned aerial vehicles are …;
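One way a normalized action could drive a UAV is sketched below in Python. It assumes, beyond what the source states, that each action component lies in [-1, 1] and is mapped linearly onto its angle range, and that the UAV moves a fixed distance `step` per time slot along the direction given by the horizontal angle θ and vertical angle φ. Apart from [0, π], the angle ranges used here are hypothetical.

```python
import math


def denormalize(a, low, high):
    """Map a normalized action component a in [-1, 1] linearly onto [low, high]."""
    return low + (a + 1.0) * 0.5 * (high - low)


def move_uav(pos, action, step, h_range, v_range):
    """Move a UAV by a fixed distance `step` in the direction of its deflection angles."""
    theta = denormalize(action[0], *h_range)  # horizontal deflection angle
    phi = denormalize(action[1], *v_range)    # vertical deflection angle
    x, y, z = pos
    return (x + step * math.cos(phi) * math.cos(theta),
            y + step * math.cos(phi) * math.sin(theta),
            z + step * math.sin(phi))


# Example: action (-1, 0) gives theta = 0 and phi = 0, i.e. level flight along +X.
new_pos = move_uav((0.0, 0.0, 50.0), (-1.0, 0.0), 10.0,
                   h_range=(0.0, math.pi), v_range=(-math.pi / 2, math.pi / 2))
# new_pos == (10.0, 0.0, 50.0)
```

Restricting each UAV's horizontal range (as the per-UAV ranges in the text suggest) keeps its heading roughly aligned with its start-to-end direction and shrinks the action space the agent must explore.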

The transition probability is expressed as:

which is the probability of transitioning from state $s=[s_1,\dots,s_M]$ to the next state $s'=[s'_1,\dots,s'_M]$ under the joint action $a=[a_1,\dots,a_M]$;

the reward is defined, according to the objective function and on the premise of guaranteed fairness and the optimal offloading strategy, as the sum of the energy consumption of all agents over the T time slots;

the rewards are expressed as:

the penalty rewards are expressed as:

the initial state: each unmanned aerial vehicle is assumed to complete its flight path from the starting point to the end point and then return to the starting point, and training continues until the rewards converge.
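The reward and penalty above can be sketched per UAV and per slot as follows. This Python sketch assumes the reward is the negative energy consumption and that a fixed, hypothetical `penalty` is subtracted when the UAV leaves the flight space or comes closer than `d_safe` to another UAV; the source's exact penalty expression is not reproduced here.

```python
import math


def uav_reward(energy, pos, other_positions, area, d_safe, penalty):
    """Per-slot reward of one UAV: negative energy, minus a penalty on constraint violation.

    area = (x_size, y_size, h) bounds the flight space; d_safe is the minimum
    allowed distance to any other UAV.
    """
    x_size, y_size, h = area
    x, y, z = pos
    out_of_bounds = not (0.0 <= x <= x_size and 0.0 <= y <= y_size and 0.0 <= z <= h)
    too_close = any(math.dist(pos, q) < d_safe for q in other_positions)
    reward = -energy
    if out_of_bounds or too_close:
        reward -= penalty
    return reward


# Example: a UAV well inside the area and far from the other UAV.
r = uav_reward(5.0, (10.0, 10.0, 50.0), [(100.0, 100.0, 50.0)],
               area=(200.0, 200.0, 100.0), d_safe=10.0, penalty=100.0)
```

Using the negative energy keeps "higher reward" aligned with "lower energy", which matches the minimization objective.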

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: a framework of centralized training and distributed execution is adopted, comprising:

the transition probability is expressed as:

$P(s' \mid s, a, \mu) = P(s' \mid s, a) = P(s' \mid s, a, \mu')$

where $\mu=[\mu_1,\dots,\mu_M]$ denotes the deterministic policies of the M agents in the actor policy network, $\mu'=[\mu'_1,\dots,\mu'_M]$ denotes the deterministic policies of the M agents in the target policy network, and $\theta=[\theta_1,\dots,\theta_M]$ denotes the parameters of the deterministic policies μ in the actor policy network;

the cumulative expected reward of agent m is expressed as:

where D is the experience replay buffer containing tuples $(s, a, r, s', done)$, $r=[r_1,\dots,r_M]$ is the set of rewards of all agents, and γ is the reward discount factor;

the policy gradient of the deterministic policy μ is expressed as:

where the centralized action-value function is the output of the critic network, whose inputs are the states and actions of all agents; it evaluates the quality of the policy output by the actor network, and the critic network is updated by minimizing a loss function;

The loss function is expressed as:

where, in the target value, $a'=[\mu'_1(s'_1),\dots,\mu'_M(s'_M)]$ is the action set of the M agents, and the target critic network is based on the deterministic policies μ' with delayed parameters $\theta'=[\theta'_1,\dots,\theta'_M]$;

The delayed parameters θ' are updated as follows:

$\theta'_m \leftarrow \tau\theta_m + (1-\tau)\theta'_m$
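The target value and critic loss described above follow the usual multi-agent actor-critic pattern: each target actor maps its own next state to an action, the target critic scores the joint next state and action, and the critic is fitted to $y = r + \gamma Q'(s', a')$ by mean squared error. The Python sketch below shows only this arithmetic; the toy target actors and target critic are hypothetical stand-ins for trained networks, and zeroing the bootstrap term when `done` is set is an assumption.

```python
def target_joint_action(target_policies, next_states):
    """a' = [mu'_1(s'_1), ..., mu'_M(s'_M)]: each target actor sees only its own next state."""
    return [mu(s) for mu, s in zip(target_policies, next_states)]


def td_target(reward, done, next_q, gamma):
    """y = r + gamma * Q'(s', a'), with bootstrapping switched off at episode end."""
    return reward + gamma * (0.0 if done else next_q)


def critic_loss(q_values, targets):
    """Mean squared error between the critic outputs Q(s, a) and the targets y."""
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


# Hypothetical target actors (one per agent) and a hypothetical target critic.
target_policies = [lambda s: 0.5 * s, lambda s: -s]


def target_critic(next_states, next_actions):
    return sum(next_states) + sum(next_actions)


s_next = [1.0, 2.0]
a_next = target_joint_action(target_policies, s_next)  # [0.5, -2.0]
y = td_target(reward=-3.0, done=False, next_q=target_critic(s_next, a_next), gamma=0.9)
```

The target critic receives the joint state and action even though each target actor only sees its own state, which mirrors the centralized-critic, decentralized-actor split.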

wherein τ is a soft update coefficient;
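The soft update rule above blends the online parameters into the delayed ones. A minimal Python sketch, assuming for illustration that parameters are flat lists of floats (real networks would hold tensors):

```python
def soft_update(target_params, online_params, tau):
    """theta'_m <- tau * theta_m + (1 - tau) * theta'_m, applied element-wise."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]


# Example: with tau = 0.05 the target drifts slowly toward fixed online parameters.
target = [0.0, 0.0]
online = [1.0, 2.0]
for _ in range(100):
    target = soft_update(target, online, tau=0.05)
```

A small τ makes the target network change slowly, which stabilizes the target values used in the critic loss; τ = 1 reduces to a hard copy.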

each agent trains an ensemble of U different sub-policies; for agent m, the cumulative expected reward is updated as:

the policy gradient is updated as follows:

As a preferred solution of the joint optimization method for deployment and offloading strategy in a multi-unmanned aerial vehicle assisted MEC system according to the application: optimizing the trajectories of the multiple unmanned aerial vehicles comprises:

the actor policy networks μ and target policy networks μ' are initialized, and a replay memory rpm for storing the agents' experiences is initialized; a random process is generated for action exploration; for each agent m, an action is selected according to the Markov decision process; the states s and actions a of all agents are input to obtain the agents' rewards r and next states s', and (s, a, r, s', done) is stored in the experience pool rpm; a random minibatch is sampled from the experience replay pool, the target value is set, and the loss function is minimized to update the critic network;

the target value is expressed as:

the loss function to be minimized is expressed as:

the gradient used to update the actor policy network is expressed as:

the delayed parameters θ' of the target network of each agent m are soft-updated, and the reward value of each unmanned aerial vehicle reaching its end point is expressed as:

when unmanned aerial vehicle m flies out of the boundary or comes closer to another unmanned aerial vehicle than the safety separation, the reward value is updated as follows:

the above optimization steps are repeated, and the optimal unmanned aerial vehicle trajectories with minimum energy consumption are obtained when the maximum number of iterations is reached.
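The storage and sampling steps of this procedure can be sketched in Python with a bounded `collections.deque` as the experience pool rpm. The class name `ReplayMemory`, the capacity, and uniform minibatch sampling are assumptions for illustration; the source does not specify the buffer size or sampling scheme.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity experience pool (the 'rpm' of the text) holding (s, a, r, s', done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped first

    def append(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng=random):
        """Uniformly sample a minibatch and regroup it field by field."""
        batch = rng.sample(list(self.buffer), batch_size)
        s, a, r, s_next, done = map(list, zip(*batch))
        return s, a, r, s_next, done

    def __len__(self):
        return len(self.buffer)


# Example: store five transitions in a buffer of capacity three, then sample two.
rpm = ReplayMemory(capacity=3)
for t in range(5):
    rpm.append([t], [0.1 * t], -float(t), [t + 1], t == 4)
s, a, r, s_next, done = rpm.sample(2, rng=random.Random(0))
```

Sampling uniformly from past transitions breaks the correlation between consecutive time slots, which is the usual reason for replaying experience rather than training on the latest step only.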

The beneficial effects of the application are as follows: the joint optimization method uses a dynamic multi-unmanned aerial vehicle assisted edge computing system in which an auxiliary unmanned aerial vehicle shares the computing tasks of the main unmanned aerial vehicles, so that system tasks are completed within the tolerable delay of the ground terminals' tasks with greater fairness and lower energy consumption. The optimization method provided by the application converges well, reduces the number of optimization variables, optimizes three-dimensional unmanned aerial vehicle trajectories, and is better suited to real scenarios.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:

fig. 1 is an overall flowchart of a multi-unmanned aerial vehicle assisted MEC system deployment and offloading policy joint optimization method according to an embodiment of the present application;

fig. 2 is a system model diagram of a multi-unmanned aerial vehicle assisted MEC system deployment and offloading policy joint optimization method according to an embodiment of the present application;

fig. 3 is a comparative experimental diagram of a multi-unmanned aerial vehicle assisted MEC system deployment and offloading policy joint optimization method according to an embodiment of the present application.

Detailed Description

So that the manner in which the above recited objects, features and advantages of the present application can be understood in detail, a more particular description of the application, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.

Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

While the embodiments of the present application have been illustrated and described in detail in the drawings, the cross-sectional views of the device structure are, for ease of illustration, not drawn to scale; the drawings are merely exemplary and should not be construed as limiting the scope of the application. In addition, the three dimensions of length, width and depth should be included in actual fabrication.

Also in the description of the present application, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.

Example 1

Referring to fig. 1-2, for a first embodiment of the present application, the embodiment provides a multi-unmanned aerial vehicle assisted MEC system deployment and offloading policy joint optimization method, which is characterized by comprising:

S1: establishing a dynamic multi-unmanned aerial vehicle, multi-ground-terminal edge computing system model for real-time communication and data transmission;

further, as shown in fig. 2, a dynamic multi-unmanned aerial vehicle, multi-ground-terminal edge computing system model for real-time communication and data transmission is established. The model comprises a set of K ground terminals and a set of M unmanned aerial vehicles, and all unmanned aerial vehicles are equipped with small MEC servers for communication and computation;

it is defined that all ground terminals are randomly distributed on the plane $\{X_{size}, Y_{size}, 0\}$, that all unmanned aerial vehicles fly in the three-dimensional space $\{X_{size}, Y_{size}, H\}$, and that all unmanned aerial vehicles complete the tasks generated by their connected ground terminals within a set of T time slots;

It should be noted that there are three unmanned aerial vehicles: main unmanned aerial vehicle 1, main unmanned aerial vehicle 2, and an auxiliary unmanned aerial vehicle. The main unmanned aerial vehicles are responsible for communication and computation with most ground terminals and have fixed starting and end points. The auxiliary unmanned aerial vehicle serves a small number of ground terminals, sharing the load of the main unmanned aerial vehicles and achieving better load fairness among all unmanned aerial vehicles. The main and auxiliary unmanned aerial vehicles have the same structure but differ in the ground terminals they serve and in their flight trajectories.

Further, in each time slot, the drone completes the tasks generated by the connected ground terminals;

in the t-th time slot, the position of unmanned aerial vehicle m is $\{X_m(t), Y_m(t), Z_m(t)\}$;

The position of the ground terminal k is

The data size of the task of ground terminal k is $D_k(t)$;

Further, the data transmission rate between the ground terminal k and the unmanned aerial vehicle m is expressed as:

where B is the channel bandwidth, $P_k$ is the transmit power of each mobile user, $\beta_0$ is the channel gain at the reference distance, $G_0$ is a positive constant, and $N_0$ is the noise power spectral density;
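The rate expression itself is not reproduced in this text. The Python sketch below assumes a common line-of-sight form built from the listed symbols, $r = B\log_2\left(1 + \frac{P_k G_0 \beta_0}{N_0 B d_{k,m}^2(t)}\right)$, where $d_{k,m}(t)$ is the 3D distance between the UAV and the terminal on the ground plane. Treat the formula as an assumption, not the patent's exact expression.

```python
import math


def distance(uav_pos, terminal_pos):
    """Euclidean distance d_{k,m}(t) between a UAV at (X, Y, Z) and a terminal at (x, y, 0)."""
    return math.dist(uav_pos, (terminal_pos[0], terminal_pos[1], 0.0))


def transmission_rate(B, P_k, beta0, G0, N0, d):
    """Assumed LoS rate in bit/s: B * log2(1 + P_k * G0 * beta0 / (N0 * B * d**2))."""
    snr = P_k * G0 * beta0 / (N0 * B * d ** 2)
    return B * math.log2(1.0 + snr)


# Example: a UAV hovering at 12 m height, 5 m horizontally from the terminal.
d = distance((3.0, 4.0, 12.0), (0.0, 0.0))  # 13.0
rate = transmission_rate(B=1e6, P_k=0.1, beta0=1e-3, G0=2.0, N0=1e-17, d=d)
```

Under this model the rate falls as the UAV moves away, which is why the distance $d_{k,m}(t)$ drives both the matching decision and the trajectory design.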

the transmission delay when the ground terminal k communicates with the unmanned aerial vehicle m is expressed as:

where ψ denotes the proportion of the task of ground terminal k that is offloaded to unmanned aerial vehicle m in time slot t;

the transmission energy consumption between ground terminal k and unmanned aerial vehicle m is expressed as:

the computation delay of ground terminal k is expressed as:

where $f_k(t)$ is the computing frequency of ground terminal k;

the computation energy consumption of ground terminal k is expressed as:

where $K_{gd}$ denotes the CPU capacitance coefficient of the ground terminal;

the computation delay of unmanned aerial vehicle m is expressed as:

where $f_{k,m}(t)$ is the computing resource allocated by unmanned aerial vehicle m to the task of ground terminal k;

the computation energy consumption of unmanned aerial vehicle m is expressed as:

where $K_{uav}$ is the CPU capacitance coefficient of the unmanned aerial vehicle;
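Since the delay and energy formulas are not reproduced in this text, the Python sketch below assumes the standard partial-offloading model consistent with the listed symbols: a fraction ψ of the $D_k(t)$ bits is uploaded at rate r, computing one cycle at frequency f costs $K f^2$ joules, and transmission energy is $P_k$ times transmission time. The parallel-execution completion delay at the end is an extra assumption.

```python
def task_costs(D, F, psi, rate, P_k, f_k, f_km, kappa_gd, kappa_uav):
    """Delays (s) and energies (J) of one task split between a terminal and a UAV.

    A fraction psi of the D bits is offloaded; F is CPU cycles per bit.
    """
    cycles_local = (1.0 - psi) * D * F
    cycles_uav = psi * D * F
    t_tr = psi * D / rate                        # uplink transmission delay
    e_tr = P_k * t_tr                            # transmission energy
    t_loc = cycles_local / f_k                   # local computation delay
    e_loc = kappa_gd * f_k ** 2 * cycles_local   # local computation energy
    t_uav = cycles_uav / f_km                    # UAV computation delay
    e_uav = kappa_uav * f_km ** 2 * cycles_uav   # UAV computation energy
    # Assumption: the local and offloaded parts are processed in parallel.
    t_total = max(t_loc, t_tr + t_uav)
    return {"t_tr": t_tr, "e_tr": e_tr, "t_loc": t_loc, "e_loc": e_loc,
            "t_uav": t_uav, "e_uav": e_uav, "t_total": t_total}


# Example with hypothetical parameters: 1 Mbit task, half offloaded.
costs = task_costs(D=1e6, F=100.0, psi=0.5, rate=1e6, P_k=0.1,
                   f_k=1e9, f_km=5e9, kappa_gd=1e-28, kappa_uav=1e-28)
```

Raising ψ shifts work from the terminal to the UAV, which lowers local delay and energy but raises transmission and UAV-side costs; this trade-off is what the offloading ratio optimization balances.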

the flight delay of unmanned aerial vehicle m in time slot t is expressed as:

where $K'_m$ is the total number of ground terminals served by unmanned aerial vehicle m, and $P^{fly}(t)$ is the flight power of the unmanned aerial vehicle;

the flight energy consumption of unmanned aerial vehicle m is expressed as:

the average workload of the ground terminals connected to unmanned aerial vehicle m is expressed as:

the fairness index between unmanned aerial vehicles is expressed as:
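The fairness formula is not reproduced in this text. A natural candidate consistent with the stated property that values closer to 1 are fairer is Jain's fairness index over the UAVs' average workloads, $I(t) = \frac{(\sum_m L_m)^2}{M \sum_m L_m^2}$, sketched below in Python. Both the use of Jain's index and the workload definition (mean of $D_k F_k$ over the connected terminals) are assumptions.

```python
def average_workload(tasks):
    """Hypothetical per-UAV workload: mean of D_k * F_k over the connected terminals."""
    return sum(d * f for d, f in tasks) / len(tasks) if tasks else 0.0


def jain_fairness(loads):
    """Jain's index (sum L)^2 / (M * sum L^2): 1 for equal loads, 1/M when one UAV has all."""
    total = sum(loads)
    squares = sum(x * x for x in loads)
    if squares == 0:
        return 1.0  # no load anywhere: treat as perfectly fair
    return total * total / (len(loads) * squares)


# Example: two main UAVs and one auxiliary UAV with their matched (D_k, F_k) tasks.
loads = [average_workload([(1e6, 100.0), (2e6, 100.0)]),
         average_workload([(1.5e6, 100.0)]),
         average_workload([(1e6, 150.0)])]
index = jain_fairness(loads)
```

Here all three UAVs carry the same average workload, so the index is 1; shifting terminals away from the auxiliary UAV would push it toward 1/3.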

in time slot t, the fairness-based total energy consumption of the unmanned aerial vehicle assisted edge computing system is expressed as:

wherein ω is the weight of the unmanned aerial vehicle flight energy consumption.

S2: constructing an optimal offloading strategy for a given unmanned aerial vehicle flight trajectory;

still further, the fairness-based total system energy consumption optimization problem is expressed as:

s.t.C1:

C2:

C3:

C4:

C5:

where the objective function minimizes the total system energy consumption for the unmanned aerial vehicles to complete their tasks; C1 constrains the movement range of the unmanned aerial vehicles and ground terminals; C2 constrains the data offloading ratio of the offloading strategy; C3 is the safety separation constraint between unmanned aerial vehicles; C4 is the offloading matching decision between ground terminals and unmanned aerial vehicles, under which each ground terminal must be offloaded to and matched with exactly one unmanned aerial vehicle; and C5 is the fairness index constraint among unmanned aerial vehicles, where an index closer to 1 indicates greater fairness.

Further, the optimal offloading strategy is the task offloading ratio, comprising:

where the coefficients are, respectively: a fairness coefficient among the unmanned aerial vehicles; a power coefficient; the energy consumption coefficients for transmission and computation at the ground terminal and for computation at the unmanned aerial vehicle; the flight power coefficient of the unmanned aerial vehicle; and the delay coefficients for transmission and computation at the ground terminal and for computation at the unmanned aerial vehicle;

it should be noted that E(t) is a linear function of ψ, so its minimum is attained at an extreme value; the two component terms are, respectively, an increasing function and a decreasing function of the offloading ratio.
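Because the simplified objective is not reproduced in this text, the Python sketch below only illustrates how a minimizing offloading ratio could be found numerically when the objective is convex in ψ on [0, 1]: a ternary search. The example objective, a small linear term plus the maximum of an increasing and a decreasing term, is hypothetical. For a purely linear objective the search simply converges to the endpoint 0 or 1.

```python
def minimize_convex_on_unit(f, tol=1e-9):
    """Ternary search for a minimizer of a convex function f on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2  # for convex f, a minimizer lies in [lo, m2]
        else:
            lo = m1  # otherwise it lies in [m1, hi]
    return (lo + hi) / 2.0


def example_objective(psi):
    """Hypothetical convex objective: increasing term 2*psi vs decreasing term 1 - psi."""
    return max(2.0 * psi, 1.0 - psi) + 0.1 * psi


psi_star = minimize_convex_on_unit(example_objective)
```

In this example the minimizer sits where the increasing and decreasing terms cross (ψ = 1/3), which is the kind of balance point the note above alludes to.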

When the total energy consumption of the system is minimum,

S3: constructing a ground terminal offloading matching decision according to the given unmanned aerial vehicle flight trajectory and the optimal offloading strategy;

further, the ground terminal offloading matching decision comprises:

it should be noted that the offloading matching decision ensures that every ground terminal can offload its task to a suitable unmanned aerial vehicle, and determines how many ground terminals select unmanned aerial vehicle 1, unmanned aerial vehicle 2, or the auxiliary unmanned aerial vehicle.

Since the number of ground terminals is constant during communication interaction, the total number of elements in the matching sets is constant while the combination of elements can vary, and the Euclidean distance d(t) from a ground terminal to an unmanned aerial vehicle is an important factor affecting the ground terminal offloading matching decisions.

Further, the offloading matching decisions of the K ground terminals are defined as:

it should be noted that, for any ground terminal at Nash equilibrium, if $\Delta_k$ changes while the decisions of the other ground terminals remain fixed, the energy consumption value E does not decrease. This is because, when the offloading matching decisions of the other ground terminals remain unchanged, no single ground terminal can break the Nash equilibrium by changing its own offloading matching decision.

Further, the offloading matching decision of each ground terminal is initialized to the nearest unmanned aerial vehicle, and the following steps are repeated:

for each ground terminal k ($k \in [1, K]$), keeping the offloading matching decisions of the other ground terminals unchanged, calculate the current optimal choice $\Delta_k$ of ground terminal k and the corresponding system energy consumption;

S4: optimizing the trajectories of the multiple unmanned aerial vehicles according to the optimal offloading strategy and the ground terminal offloading matching decision.

Further, each unmanned aerial vehicle is regarded as an agent, and the environment is modeled as a Markov decision process comprising states, actions, transition probabilities, rewards and an initial state;

the state of agent m can be expressed as:

it should be noted that the four state components above change from one time slot to the next, which means the ground terminals move and generate new tasks; this is more consistent with real scenarios.

The actions are defined as the horizontal deflection angle and the vertical deflection angle of each agent, expressed as:

wherein ,

the action of agent m is normalized, expressed as:

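The following minimal sketch shows how a normalized two-angle behavior could move an unmanned aerial vehicle for one time slot. The angle ranges (horizontal in [0, 2π], vertical in [-π/2, π/2]), the linear mapping from [-1, 1], the constant flight speed `speed`, and the slot length `slot` are all assumptions made for illustration, not values given in the source.

```python
import math

def denormalize_action(a_h, a_v):
    """Map normalized behaviors in [-1, 1] to deflection angles (assumed ranges):
    horizontal angle phi in [0, 2*pi], vertical angle psi in [-pi/2, pi/2]."""
    a_h = max(-1.0, min(1.0, a_h))   # clip to the normalized range
    a_v = max(-1.0, min(1.0, a_v))
    phi = (a_h + 1.0) * math.pi      # horizontal deflection angle
    psi = a_v * math.pi / 2.0        # vertical deflection angle
    return phi, psi

def next_position(pos, a_h, a_v, speed=10.0, slot=1.0):
    """Hypothetical position update: fly one slot at constant speed along the
    3-D heading given by the two deflection angles."""
    phi, psi = denormalize_action(a_h, a_v)
    step = speed * slot
    x, y, z = pos
    return (x + step * math.cos(psi) * math.cos(phi),
            y + step * math.cos(psi) * math.sin(phi),
            z + step * math.sin(psi))
```

For example, `next_position((0.0, 0.0, 100.0), 0.0, 0.0)` gives a horizontal angle of π and a vertical angle of 0, so the vehicle moves 10 units in the negative x direction at constant altitude.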

The transition probability is expressed as:

The reward is defined according to the objective function as the sum of the energy consumption of all agents over the T time slots, on the premise of guaranteeing fairness and applying the optimal offloading strategy;

It should be noted that the negative of the energy consumption is used as the reward, so that a larger reward corresponds to lower energy consumption.
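A minimal sketch of a per-slot reward built from negative energy consumption plus a penalty term follows. The source does not state what triggers the penalty reward. The two conditions used here, leaving an assumed box-shaped flight area (cf. C1) and violating an assumed minimum separation `d_min` between unmanned aerial vehicles (cf. C3), and the penalty magnitude are hypothetical choices.

```python
import math
from itertools import combinations

def step_reward(energies, positions, area, d_min, penalty=100.0):
    """Hypothetical per-slot team reward: negative total energy minus penalties.

    energies:  energy consumed by each agent in this slot
    positions: (x, y, z) position of each UAV
    area:      (x_max, y_max, z_min, z_max) flight region (assumed box)
    d_min:     assumed minimum safe distance between UAVs
    """
    reward = -sum(energies)                    # lower energy, higher reward
    x_max, y_max, z_min, z_max = area
    for x, y, z in positions:                  # range constraint (cf. C1)
        if not (0 <= x <= x_max and 0 <= y <= y_max and z_min <= z <= z_max):
            reward -= penalty
    for p, q in combinations(positions, 2):    # safety distance (cf. C3)
        if math.dist(p, q) < d_min:
            reward -= penalty
    return reward
```

Summing this value over the T time slots gives an episode return that decreases as the total energy consumption grows. The penalty makes constraint violations strictly worse than any feasible trajectory with comparable energy.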

The reward is expressed as:

The penalty reward is expressed as:

The initial state is defined as follows: each unmanned aerial vehicle is assumed to fly from its starting point to its end point and then return to the starting point for the next round of training. Training is repeated until the reward converges.

Still further, a framework for centralized training and distributed execution is employed, comprising:

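The transition probability is expressed as:

$$P(s' \mid s, a, \mu) = P(s' \mid s, a) = P(s' \mid s, a, \mu'), \quad \forall \mu \neq \mu'$$

That is, once the joint behavior a of all agents is known, the state transition no longer depends on the agents' policies. The environment therefore remains stationary from each critic's point of view while the policies are being updated.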

The cumulative expected reward of agent m is expressed as:

The policy gradient of the deterministic policy μ is expressed as:

The loss function is expressed as:


The delayed (target-network) parameters θ′ are updated as follows:

$$\theta'_m \leftarrow \tau\theta_m + (1-\tau)\theta'_m$$

wherein τ is a soft update coefficient;
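The soft update above can be sketched in a few lines of plain Python, with each agent's parameters represented as a flat list of floats. This is a simplification: a real implementation would apply the same rule to every weight tensor of the target actor and critic networks. The function names are illustrative.

```python
def soft_update(target, online, tau):
    """theta'_m <- tau * theta_m + (1 - tau) * theta'_m, element-wise."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]

def soft_update_all(targets, onlines, tau):
    """Apply the soft update to the target parameters of all M agents."""
    return [soft_update(t, o, tau) for t, o in zip(targets, onlines)]
```

Applying the rule repeatedly makes θ′ track θ smoothly. For example, with τ = 0.5, starting from θ′ = [0, 0] and θ = [1, 2], three updates give [0.875, 1.75]. A small τ keeps the target values used in the critic loss slowly varying, which stabilizes training.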

An ensemble of U different policies is trained; for agent m, the cumulative expected reward is updated as:

The policy gradient is updated as follows:

It should be noted that during the training phase, the critic network of each agent collects the states and behaviors of all agents and outputs a Q value, whereas the actor network of each agent makes decisions based only on its own partial state. The critic is thereby extended to account for the policies of the other agents, so each agent effectively learns an approximation of the other agents' policies.
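The split between decentralized actors and centralized critics can be made concrete with a minimal sketch. A linear critic and a linear actor with a tanh squashing stand in for the neural networks, and the [-1, 1] output range is assumed to match the normalized behavior. The point shown is the input each network receives, not the network architecture.

```python
import math

def actor_action(weights, own_obs):
    """Decentralized actor: the behavior depends only on this agent's own
    observation; tanh keeps each output in [-1, 1] (assumed normalized range)."""
    return [math.tanh(sum(w * o for w, o in zip(row, own_obs))) for row in weights]

def critic_input(all_obs, all_actions):
    """Centralized critic input: every agent's observation and behavior, concatenated."""
    x = [v for obs in all_obs for v in obs]
    x += [v for act in all_actions for v in act]
    return x

def critic_q(weights, all_obs, all_actions):
    """Hypothetical linear Q(s, a_1, ..., a_M), used only during training."""
    x = critic_input(all_obs, all_actions)
    return sum(w * v for w, v in zip(weights, x))
```

At execution time only `actor_action` is needed, so each unmanned aerial vehicle can act on local information. The critics, which need global information, are discarded after training.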

Still further, optimizing the multi-unmanned aerial vehicle trajectories comprises:

The target value is expressed as:

The loss function to be minimized is expressed as:

The actor policy network is updated with the gradient:

The above optimization steps are repeated. When the maximum number of iterations is reached, the optimal unmanned aerial vehicle trajectories with minimum energy consumption are obtained.
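One iteration of the critic and actor updates can be sketched numerically as follows. The sketch assumes the usual actor-critic form: a bootstrapped target y = r + γQ′(s′, a′) with an assumed discount factor γ, a mean-squared critic loss, and a chain-rule actor gradient for a linear scalar actor a = w·o. These are illustrative assumptions, since the source formulas are not reproduced here. The function names are hypothetical.

```python
def td_targets(rewards, next_q, gamma=0.95, dones=None):
    """y = r + gamma * Q'(s', a') per sample; no bootstrap on terminal steps."""
    if dones is None:
        dones = [False] * len(rewards)
    return [r + (0.0 if d else gamma * q) for r, q, d in zip(rewards, next_q, dones)]

def critic_loss(q_values, targets):
    """Mean-squared error between Q(s, a) and the targets y, minimized by critic m."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)

def actor_gradient(obs_batch, dq_da):
    """Chain rule for a linear scalar actor a = w . o:
    dJ/dw = mean_i( dQ/da_i * o_i ), averaged over the mini-batch."""
    n, dim = len(obs_batch), len(obs_batch[0])
    return [sum(g * o[j] for g, o in zip(dq_da, obs_batch)) / n for j in range(dim)]
```

For example, `td_targets([1.0, -2.0], [10.0, 4.0], gamma=0.5, dones=[False, True])` returns `[6.0, -2.0]`. The actor parameters are then moved along `actor_gradient(...)` (gradient ascent on Q), and the target networks are soft-updated.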

Example 2

Referring to Fig. 3, this embodiment of the present application provides a multi-unmanned aerial vehicle assisted MEC system deployment and offloading policy joint optimization method. To verify the beneficial effects of the present application, a comparative experiment is carried out.

As shown in Fig. 3, compared with existing unmanned aerial vehicle control methods, unmanned aerial vehicles controlled by the deployment and offloading strategy of the present application jointly optimize three things: the offloading decisions of the ground terminals, the proportion of task volume offloaded by each ground terminal, and the flight trajectories of the unmanned aerial vehicles. As a result, they can process more tasks. In practical applications, minimum system energy consumption is achieved using only the deep reinforcement learning optimization algorithm, which gives the method strong practicality.

It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered in the scope of the claims of the present application.

Claims

1. The multi-unmanned aerial vehicle assisted MEC system deployment and unloading strategy joint optimization method is characterized by comprising the following steps of:

2. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 1, wherein: the edge computing system comprises K ground terminals and M unmanned aerial vehicles;

3. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 2, wherein: in each time slot, each unmanned aerial vehicle completes the tasks generated by the ground terminals connected to it;

in the t-th time slot, the position of unmanned aerial vehicle m is

the position of ground terminal k is

the data size of the task of ground terminal k is $D_k(t)$;

4. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 3, wherein: in time slot t, the total system energy consumption of the unmanned aerial vehicle assisted edge computing system, taking fairness into account, is expressed as:

where the terms denote, respectively, the transmission energy consumption between ground terminal k and unmanned aerial vehicle m, the computation energy consumption of ground terminal k, the computation energy consumption of unmanned aerial vehicle m, and the flight energy consumption of unmanned aerial vehicle m; I(t) is the fairness index among the unmanned aerial vehicles, and ω is the weight of the unmanned aerial vehicle flight energy consumption.

5. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 4, wherein: the fairness-based total system energy consumption optimization problem is expressed as:

where the objective function minimizes the total system energy consumption for the unmanned aerial vehicles to complete the tasks; C1 is the movement-range constraint of the unmanned aerial vehicles and the ground terminals; C2 is the data-volume offloading-proportion constraint of the offloading strategy; C3 is the safety-distance constraint between unmanned aerial vehicles; C4 is the offloading matching constraint between ground terminals and unmanned aerial vehicles, requiring each ground terminal to select one unmanned aerial vehicle to offload to; and C5 is the fairness index constraint among the unmanned aerial vehicles, where a value closer to 1 indicates greater fairness.
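Constraint C5 only states that the fairness index is fairer the closer it is to 1. Jain's fairness index has exactly that property: it equals 1 when all values are equal and 1/n when a single value carries everything. A sketch using it is shown below. Two things here are assumptions and are not given in the source: whether the patent's I(t) is Jain's index, and which per-vehicle quantity it is computed over (for example, the data volume each unmanned aerial vehicle processes).

```python
def jain_index(loads):
    """Jain's fairness index (sum x)^2 / (n * sum x^2), which lies in [1/n, 1]."""
    n = len(loads)
    total = sum(loads)
    sq = sum(x * x for x in loads)
    if sq == 0:
        return 1.0  # all-zero loads are treated as perfectly fair
    return total * total / (n * sq)
```

Under this assumption, the index for three vehicles with equal loads is 1.0. If one of two vehicles carries the entire load, the index drops to 0.5.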

6. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 5, wherein: the optimal offloading strategy is the task-volume offloading proportion, and comprises:

where the coefficients in the expression denote, respectively: the fairness coefficient between unmanned aerial vehicles; the power coefficient; the energy consumption coefficients for transmission and computation at the ground terminal and for computation at the unmanned aerial vehicle; the flight power coefficient of the unmanned aerial vehicle; and the delay coefficients for transmission and computation at the ground terminal and for computation at the unmanned aerial vehicle;

when the total system energy consumption is minimized,

7. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 6, wherein: the ground terminal offloading matching decision comprises:

the offloading matching decisions of the K ground terminals are defined as:

8. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 7, wherein: each unmanned aerial vehicle is regarded as an agent, and the environment model is described as a Markov decision process comprising states, behaviors, transition probabilities, rewards and an initial state;

the state of agent m can be expressed as:

wherein ,

the behavior of agent m is normalized, expressed as:

The transition probability is expressed as:

which represents the probability of transitioning from state $s=[s_1,\dots,s_M]$ to the next state $s'=[s'_1,\dots,s'_M]$ under the behavior $a=[a_1,\dots,a_M]$;

the reward is expressed as:

the penalty reward is expressed as:

9. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 8, wherein: a framework for centralized training and distributed execution, comprising:

the transition probability is expressed as:

$$P(s' \mid s, a, \mu) = P(s' \mid s, a) = P(s' \mid s, a, \mu')$$

the cumulative expected reward of agent m is expressed as:

the policy gradient of the deterministic policy μ is expressed as:

The loss function is expressed as:

where, in the target value, $a'=[\mu'_1(s'_1),\dots,\mu'_M(s'_M)]$ is the behavior set of the M agents, and the target network is based on the set of deterministic policies μ′, with delayed parameters $\theta'=[\theta'_1,\dots,\theta'_M]$;

the delayed (target-network) parameters θ′ are updated as follows:

$$\theta'_m \leftarrow \tau\theta_m + (1-\tau)\theta'_m$$

wherein τ is a soft update coefficient;

the policy gradient is updated as follows:

10. The multi-unmanned aerial vehicle assisted MEC system deployment offloading policy joint optimization method of claim 8 or 9, wherein: optimizing the multi-unmanned aerial vehicle trajectories comprises:

the target value is expressed as:

the loss function to be minimized is expressed as:

the actor policy network is updated with the gradient:

the delayed parameters of the target network of each agent m are soft-updated to $\theta'_m$; the reward value for each unmanned aerial vehicle reaching its end point is expressed as:

the optimization steps are repeated, and the optimal unmanned aerial vehicle trajectories with minimum energy consumption are obtained when the maximum number of iterations is reached.