CN114513814A - Edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary node - Google Patents
- Publication number
- CN114513814A (application number CN202210079544.6A)
- Authority
- CN
- China
- Prior art keywords
- aerial vehicle
- unmanned aerial
- user
- network
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04W28/0925 — Network traffic management; load balancing or load distribution; management thereof using policies
- H04W24/02 — Supervisory, monitoring or testing arrangements; arrangements for optimising operational condition
- H04W24/06 — Testing, supervising or monitoring using simulated traffic
- H04W28/0226 — Traffic management, e.g. flow control or congestion control, based on location or mobility
- H04W28/0975 — Quality of Service [QoS] parameters for reducing delays
- H04W4/40 — Services specially adapted for vehicles, e.g. vehicle-to-pedestrians [V2P]
Abstract
The invention discloses a method for dynamically optimizing the computing resources of an edge network based on unmanned aerial vehicle (UAV) auxiliary nodes, belonging to the technical field of communications. Aiming at the problem that bursts of local user traffic in an edge-network cell leave server computing resources insufficient and degrade task-offloading quality, a dynamic computing-resource optimization method based on adaptive cruising of a UAV auxiliary node is provided. According to the position distribution and task-offloading demands of ground users, a deep reinforcement learning method dynamically plans the UAV's cruise trajectory, and a task-offloading scheduling strategy maximizes the utilization of the server resources of the UAV node and the base-station node during the cruise, effectively reducing the task interruption rate of local users and the average task-offloading delay.
Description
Technical Field
The invention belongs to the technical field of communications, and particularly relates to a method for dynamically optimizing edge-network computing resources based on an unmanned aerial vehicle (UAV) auxiliary node.
Background
With the spread and development of mobile networks, new applications such as augmented reality, virtual reality and autonomous driving keep emerging and greatly enrich daily life. These applications, however, are generally delay-sensitive and consume large amounts of computing resources, making it difficult for a mobile terminal to process them quickly and efficiently. Mobile edge computing sinks cloud resources to the edge network, providing users nearby with the computing resources needed for task offloading and effectively shortening the transmission delay between user and cloud server.
However, rapid changes in the distribution of ground users and random traffic bursts from local users can put enormous pressure on the fixed server resources of the edge network, leading to low utilization of computing resources and degraded user experience. Using a low-altitude UAV as an auxiliary node of the edge computing network, providing flexible supplementary resources for ground nodes, has therefore become an important mode of future network construction and development.
Aiming at the shortage of server computing resources and the degraded task-offloading quality caused by local traffic bursts in an edge-network cell, the invention provides a dynamic computing-resource optimization method based on adaptive cruising of a UAV auxiliary node, which effectively reduces the task interruption rate of local users and the average task-offloading delay.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. An edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary nodes is provided. The technical scheme of the invention is as follows:
an edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary nodes comprises the following steps:
101. Construct a discrete time-state model according to a Markov decision process: discretize the UAV's cruise time into time slots, and define the slot variable k, the ground-air network state vector s_k, the UAV three-dimensional action vector a_k and the UAV action reward function r_k, where s_k, a_k and r_k transition and change as the slot index k increases; initialize the slot variable k = 0;
102. Take the UAV controller as the agent and construct a deep reinforcement learning model based on the idea of the twin-delayed deep deterministic policy gradient (TD3) algorithm, comprising a system environment collector, a UAV action policy network π, a UAV state-action value network Q, a task scheduling policy generator, a UAV action reward generator, an experience sample storage area E and a random sample set (Mini-Batch);
103. Let k = k + 1. If the UAV's three-dimensional position has not changed for n consecutive slots, jump to step 106; otherwise determine the user set I_j of UAV j from its effective coverage area and the user set I_o = I − I_j of base station o, where I denotes the complete user set, obtain through the task scheduling policy generator the task-offloading decision variable sets for I_j and I_o, and jump to step 104;
104. Execute the task-offloading requests of each user i according to the decision variable sets, obtain the corresponding reward value r_k from the UAV action reward generator, obtain the slot-k UAV three-dimensional action vector a_k from the UAV action policy network π, compute s_{k+1} from the slot-k ground-air network state vector s_k and the action vector a_k, and store the experience sample [s_k, a_k, r_k, s_{k+1}] in the experience sample storage area E;
105. randomly sampling from an experience sample storage area E to obtain a Mini-Batch sample set, respectively importing the Mini-Batch sample set into an action strategy network pi and a state-action value network Q for training, and jumping to step 103;
106. the algorithm ends.
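The control flow of steps 101-106 can be sketched in Python as follows. This is a minimal, hedged sketch: the random policy, placeholder reward and start position stand in for the networks and reward generator described above, and all names are illustrative rather than taken from the patent.

```python
import random
from collections import deque

def run_episode(n_stationary=3, max_slots=50, batch_size=8, seed=0):
    """Control-flow sketch of steps 101-106: discretize cruise time into
    slots, act, compute a reward, store [s_k, a_k, r_k, s_{k+1}] in E,
    sample a Mini-Batch, and stop once the UAV position is unchanged
    for n consecutive slots."""
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)        # experience sample storage area E
    pos = (0.0, 0.0, 100.0)            # UAV 3-D position (hypothetical start)
    s_k, unchanged, k = pos, 0, 0      # step 101: initialize slot variable k = 0
    while k < max_slots:
        k += 1                         # step 103: k = k + 1
        # stand-in policy: a random horizontal step that shrinks to zero
        step = max(0.0, 1.0 - k * 0.1)
        a_k = (rng.uniform(-step, step), rng.uniform(-step, step), 0.0)
        new_pos = tuple(p + d for p, d in zip(pos, a_k))
        unchanged = unchanged + 1 if new_pos == pos else 0
        if unchanged >= n_stationary:  # step 103 -> step 106: terminate
            break
        r_k = -sum(abs(d) for d in a_k)          # placeholder reward value
        buffer.append((s_k, a_k, r_k, new_pos))  # step 104: store sample in E
        if len(buffer) >= batch_size:            # step 105: Mini-Batch sampling
            mini_batch = rng.sample(list(buffer), batch_size)
            # (the policy and value networks would be trained on mini_batch here)
        pos, s_k = new_pos, new_pos
    return k, buffer
```

With the shrinking step size, the position stops changing from slot 10 onward, so the loop terminates via the n-consecutive-slots condition rather than the slot limit.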
Further, in step 101 a discrete time-state model is constructed according to a Markov decision process, in which the slot-k ground-air network state vector s_k, the UAV three-dimensional action vector a_k and the UAV action reward function r_k are as shown in formulas (1), (2) and (3):
In formula (1), [x_j^k, y_j^k, z_j^k] denotes the three-dimensional position of UAV j in slot k, and [x_i^k, y_i^k] the two-dimensional position of user i in slot k. In formula (2), φ_j^k denotes the horizontal movement direction of UAV j in slot k, and d_j^k its vertical movement distance. In formula (3), ω ∈ (0, 1) denotes the weighting factor of the UAV action reward function, Δt the slot size, and D̄_i^k the average unit-task delay of user i in slot k, as shown in formula (4); the indicator of formula (5) equals 1 when the average unit-task delay of user i in slot k meets the average unit-task tolerance delay τ_i, and 0 otherwise:
In formula (4), α_{i,j}^k denotes the connection state of user i and UAV j: α_{i,j}^k = 1 if user i offloads its task to UAV j in slot k, otherwise α_{i,j}^k = 0; α_{i,o}^k denotes the connection state of user i and base station o: α_{i,o}^k = 1 if user i offloads its task to base station o in slot k, otherwise α_{i,o}^k = 0. In slot k a user may connect to at most one UAV or base station, i.e. α_{i,j}^k + α_{i,o}^k ≤ 1. u_{i,j}^k denotes the amount of tasks user i offloads to UAV j in slot k, u_{i,o}^k the amount of tasks user i offloads to base station o in slot k, and τ_i the average unit-task tolerance delay of user i.
Further, step 102 constructs the deep reinforcement learning model based on the idea of the twin-delayed deep deterministic policy gradient (TD3) algorithm, comprising a system environment collector, a UAV action policy network π, a UAV state-action value network Q, a task scheduling policy generator, a UAV action reward generator, an experience sample storage area E and a random sample set Mini-Batch, specifically:
The system environment collector collects, in the slot-k ground-air network, the two-dimensional positions of the ground users, the users' task-offloading requests, the UAV's three-dimensional position and the UAV's remaining available computing resources. The UAV action policy network π generates the UAV three-dimensional action vector a_k under the slot-k ground-air network state s_k. The UAV state-action value network Q generates the action evaluation value q of executing the action a_k in state s_k. The task scheduling policy generator generates the slot-k user offloading policy and obtains the task-offloading decision variable sets for I_j and I_o. The UAV action reward generator produces the action reward value r_k after UAV j completes the offloading tasks of slot k. After the UAV executes action a_k, the ground-air network state transitions from s_k to s_{k+1}. The slot-k experience sample [s_k, a_k, r_k, s_{k+1}] is added to the experience sample storage area E. The random sample set Mini-Batch is formed by randomly extracting a fixed number of samples from E. Both the UAV action policy network π and the UAV state-action value network Q are neural networks, each comprising several hidden layers, and each hidden layer comprising several neurons.
Further, the task scheduling policy generator of step 103 decides the user task-offloading variable sets as follows:
1) Each user i inside the effective coverage area of UAV j joins the UAV service set I_j; the base-station service set is I_o = I − I_j; the users in I_j and I_o are each arranged in descending order;
2) According to user i's task-offloading delay, compute the task amount u_{i,j}^k that each user i in I_j offloads to UAV j;
3) According to user i's task-offloading delay, compute the task amount u_{i,o}^k that each user i in I_o offloads to base station o.
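Step 1) above can be sketched as a coverage-based partition of the user set. This is an assumption-laden sketch: the conical-footprint coverage radius and the descending sort key (task load is used here as a stand-in, since the exact metric is elided in the extraction) are illustrative.

```python
import math

def partition_users(users, uav_xy, uav_alt, coverage_angle_deg=60.0):
    """Split the user set I into the UAV service set I_j (users inside the
    UAV's effective coverage disc) and the base-station set I_o = I - I_j.
    Each user is a tuple (user_id, x, y, task_load)."""
    # assumed conical footprint: radius grows with altitude
    radius = uav_alt * math.tan(math.radians(coverage_angle_deg) / 2)
    I_j, I_o = [], []
    for u in users:
        dist = math.hypot(u[1] - uav_xy[0], u[2] - uav_xy[1])
        (I_j if dist <= radius else I_o).append(u)
    # descending order (task load is a stand-in for the elided sort key)
    I_j.sort(key=lambda u: u[3], reverse=True)
    I_o.sort(key=lambda u: u[3], reverse=True)
    return I_j, I_o
```

For a UAV at altitude 100 m with a 60° cone, the coverage radius is roughly 57.7 m, so nearby users fall into I_j and distant ones into I_o.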
Further, the task amount u_{i,j}^k offloaded by user i to UAV j in step 2) is calculated as shown in formulas (6) and (7):
where C_{i,j}^k denotes the computing resources allocated by UAV j to user i in slot k, C_j the total computing resources of UAV j, R_{i,j}^k the uplink transmission rate from user i to UAV j in slot k, F the unit-task size, and c_i the task complexity of user i.
Further, the task amount u_{i,o}^k offloaded by user i to base station o in step 3) is calculated as shown in formulas (8) and (9):
where C_{i,o}^k denotes the computing resources allocated by base station o to user i in slot k, C_o the total computing resources of base station o, and R_{i,o}^k the uplink transmission rate from user i to base station o in slot k.
Further, the task-offloading delay of user i in steps 2) and 3) is shown in formula (10), and the task-offloading delay constraint is shown in formula (11):
In formula (10), because the task's computation result is far smaller than the task itself, only the uplink transmission delay of the offloaded task and its computation delay are considered, and the downlink transmission delay of the result is ignored. D_i^k denotes the total task-offloading delay of user i in slot k, D_i^{trans,k} the transmission delay of user i's offloaded task, as shown in formula (12), and D_i^{comp,k} the computation delay of the offloaded task, as shown in formula (13):

D_i^{trans,k} = α_{i,j}^k u_{i,j}^k F / R_{i,j}^k + α_{i,o}^k u_{i,o}^k F / R_{i,o}^k    (12)

D_i^{comp,k} = α_{i,j}^k u_{i,j}^k F c_i / C_{i,j}^k + α_{i,o}^k u_{i,o}^k F c_i / C_{i,o}^k    (13)

In formula (12), R_{i,j}^k and R_{i,o}^k denote the uplink transmission rates from user i to UAV j and to base station o in slot k, as shown in formulas (14) and (15):

R_{i,j}^k = W log2(1 + p_i g_{i,j}^k / σ²)    (14)

R_{i,o}^k = W log2(1 + p_i g_{i,o}^k / σ²)    (15)

In formulas (14) and (15), W is the user channel bandwidth, p_i the user transmit power, σ² the noise power, and g_{i,j}^k, g_{i,o}^k the communication channel gains from user i to UAV j and to base station o in slot k.
Further, in step 104 the slot-k UAV three-dimensional action vector a_k is obtained from the UAV action policy network π, and s_{k+1} is computed from the slot-k ground-air network state vector s_k and the action vector a_k, specifically:
The slot-k ground-air network state vector s_k is input into the UAV action policy network π; forward propagation through the neurons of each layer of the π network yields the three-dimensional action vector [φ_j^k, d_j^k] of UAV j, from which the slot-(k+1) UAV position is computed as x_j^{k+1} = x_j^k + L cos φ_j^k, y_j^{k+1} = y_j^k + L sin φ_j^k, z_j^{k+1} = z_j^k + d_j^k, where L is the horizontal movement distance of UAV j in slot k.
Further, in step 105 a Mini-Batch sample data set is obtained from the experience sample storage area E by random sampling, and the state-action value networks and the action policy network are optimized as follows:
To address the instability during learning of the state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q) and of the action policy network π(s_k | θ^π), target networks are defined: Q'(s_k, a_k | θ_1^{Q'}) and Q'(s_k, a_k | θ_2^{Q'}) as the targets of the value networks, and π'(s_k | θ^{π'}) as the target of π(s_k | θ^π).
The state-action value network parameters θ_m^Q (m = 1, 2) are updated by gradient descent as shown in formula (16):
where μ^Q is the learning rate and θ_m^Q denotes the network structure parameters; the loss function is shown in formula (17):
where a'_{k+1} = a_{k+1} + ε, ε ~ clip(N(0, σ), −κ, κ); clip(·) denotes the clipping function, N Gaussian noise with mean 0 and variance σ, κ the clipping parameter, γ the discount factor, and X = {x_k} with x_k = [s_k, a_k, r_k, s_{k+1}] the sample set randomly drawn from E.
The network parameters θ^π of the action policy network π(s_k | θ^π) are updated as shown in formula (18):
where μ^π is the learning rate of π(s_k | θ^π), θ^π denotes its network structure parameters, and the policy gradient of π(s_k | θ^π) is shown in formula (19):
The parameters θ_m^{Q'} and θ^{π'} of the target networks Q'(s_k, a_k | θ_m^{Q'}) and π'(s_k | θ^{π'}) are updated as shown in formulas (20) and (21): θ_m^{Q'} ← ρ θ_m^Q + (1 − ρ) θ_m^{Q'} and θ^{π'} ← ρ θ^π + (1 − ρ) θ^{π'}, where ρ is the update factor.
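Two numeric building blocks of the update above can be sketched directly: the clipped double-Q target used inside the loss, and the Polyak (soft) target-network update of formulas (20)-(21). Parameters are plain lists of floats here for illustration; in practice they are the weights of Q'_1, Q'_2 and π'.

```python
def td3_target(r_k, gamma, q1_next, q2_next):
    """Clipped double-Q target used in the loss of formula (17):
    y = r_k + gamma * min(Q'_1, Q'_2), where Q'_1 and Q'_2 are assumed to
    have been evaluated at the smoothed action a'_{k+1}."""
    return r_k + gamma * min(q1_next, q2_next)

def soft_update(target_params, online_params, rho):
    """Polyak averaging of formulas (20)-(21):
    theta' <- rho * theta + (1 - rho) * theta'."""
    return [rho * o + (1 - rho) * t
            for t, o in zip(target_params, online_params)]
```

Taking the minimum of the two target critics counters the overestimation bias of a single critic, and a small ρ keeps the targets slowly moving, which is the stabilizing idea of TD3.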
The invention has the following advantages and beneficial effects:
the invention discloses a dynamic optimization method for computing resources of an edge network based on unmanned aerial vehicle auxiliary nodes. The existing problem of task unloading of the edge network based on unmanned aerial vehicle assistance mostly focuses on reducing task unloading time delay of ground users through optimized deployment of unmanned aerial vehicle resources, but neglects the situation that local area user traffic is sudden possibly in an actual scene. The invention provides a dynamic optimization method of computing resources based on unmanned aerial vehicle auxiliary node adaptive cruise, aiming at the problems of shortage of edge server resources and deterioration of task unloading quality caused by burst of local traffic in a cell. According to the position distribution and task unloading requirements of ground users, a deep reinforcement learning method is adopted to dynamically plan the cruising track of the unmanned aerial vehicle, and the utilization rate of server resources of unmanned aerial vehicle nodes and base station nodes in the cruising process is maximized through a task unloading scheduling strategy, so that the task interruption rate of local users is effectively reduced, and the average task unloading delay is reduced.
Drawings
Fig. 1 is a flowchart of a method for dynamically optimizing edge network computing resources based on an auxiliary node of an unmanned aerial vehicle according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the concepts and models involved in the present disclosure are as follows.
1. System model:
assuming that users in an edge network cell are randomly distributed, an edge server can provide task unloading service for the users in the cell through a cell base station. An unmanned aerial vehicle auxiliary edge node is configured in the cell, and task unloading service can be provided for users in the effective coverage range of the unmanned aerial vehicle auxiliary edge node. When local user traffic in a cell is sudden, the unmanned aerial vehicle node can optimize the distribution state of computing resources and task unloading scheduling through self-adaptive cruise, the task interruption rate of local users is reduced, and the average task unloading delay is reduced.
2. Other symbols used in the invention are described below:
s_k: ground-air network state vector
a_k: UAV three-dimensional action vector
r_k: UAV action reward function
π(s_k | θ^π): UAV action policy network
θ: neural network parameters
Δt: time slot size
D^comp: computation delay of user i's offloaded task
C_j: total computing resources of UAV j
C_o: total computing resources of base station o
W: user channel bandwidth
p_i: transmit power of user i
σ²: noise power
F: unit task size
The technical scheme of the invention is explained as follows:
1. task offload latency and constraints thereof
Task offload latency for user iAs shown in equation (1), the task latency constraint is shown in equation (2):
in the formula (1)The transmission delay of the unloading task of the k-slot user i is shown, as shown in formula (3),the calculation time delay of the user i for unloading the task is shown as formula (4):
in the formula (3), F represents the unit task size,respectively representing the uplink transmission rates from k time slot user i to unmanned aerial vehicle j and base station o, in formula (4)Indicating the task complexity of the user i,indicating the computing resources allocated by drone j for user i in k time slots,indicating the computational resources allocated by base station o for user i in k time slots.As shown in formulas (5) and (6):
in equations (5) and (6), W is the user channel bandwidth, piFor user transmit power, σ2In order to be able to measure the power of the noise,andrepresenting the communication channel gains for k slot user i to drone j and base station o, respectively.
2. State vector, action vector and reward function of Markov decision model
The ground-air network state vector, the UAV three-dimensional action vector and the UAV action reward function are shown in formulas (7), (8) and (9), respectively:
In formula (7), [x_j^k, y_j^k, z_j^k] denotes the three-dimensional position of UAV j in slot k, and [x_i^k, y_i^k] the two-dimensional position of user i in slot k. In formula (8), φ_j^k denotes the horizontal movement direction of UAV j in slot k, and d_j^k its vertical movement distance. In formula (9), ω ∈ (0, 1) denotes the weighting factor of the UAV action reward function, and D̄_i^k the average unit-task delay of user i in slot k, as shown in formula (10); the indicator of formula (11) equals 1 when the average unit-task delay of user i in slot k meets the average unit-task tolerance delay, and 0 otherwise:
In formula (10), α_{i,j}^k denotes the connection state of user i and UAV j: α_{i,j}^k = 1 if user i offloads its task to UAV j in slot k, otherwise α_{i,j}^k = 0; α_{i,o}^k denotes the connection state of user i and base station o: α_{i,o}^k = 1 if user i offloads its task to base station o in slot k, otherwise α_{i,o}^k = 0. In slot k a user may connect to at most one UAV or base station, i.e. α_{i,j}^k + α_{i,o}^k ≤ 1. Δt denotes the slot size, u_{i,j}^k the amount of tasks user i offloads to UAV j in slot k, u_{i,o}^k the amount of tasks user i offloads to base station o in slot k, and τ_i the average unit-task tolerance delay of user i.
3. Deep reinforcement learning model constructed based on the twin-delayed deep deterministic policy gradient (TD3) algorithm
According to the Markov decision process, the UAV's cruise time is divided into several equal-sized time slots; within any slot k ∈ K, the relative positions and connection states of the UAV and the ground users remain unchanged.
A UAV controller deployed in the base-station control center serves as the agent, and a deep reinforcement learning model is constructed based on the idea of the twin-delayed deep deterministic policy gradient algorithm, which originates from Fujimoto S., van Hoof H., Meger D., "Addressing Function Approximation Error in Actor-Critic Methods," 35th International Conference on Machine Learning (ICML 2018), July 10-15, 2018. The deep reinforcement learning model comprises a system environment collector, a neural-network-based UAV action policy network π(s_k | θ^π) and UAV state-action value network Q(s_k, a_k | θ^Q), a task scheduling policy generator, a UAV action reward generator, an experience sample storage area E and a random sample set Mini-Batch.
The system environment collector collects, in the slot-k ground-air network, the two-dimensional positions of the ground users, the users' task-offloading requests, the UAV's three-dimensional position and its remaining available computing resources. The UAV action policy network π(s_k | θ^π) generates the UAV three-dimensional action vector a_k under the slot-k ground-air network state vector s_k; π(s_k | θ^π) may use two hidden layers of 256 neurons each, with the ReLU activation function. The UAV state-action value network Q(s_k, a_k | θ^Q) generates the action evaluation value q of executing action a_k in state s_k; it may consist of two identically structured networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q), each with three hidden layers of 256 neurons and the ReLU activation function.
The task scheduling policy generator generates the slot-k user offloading policy, obtaining the user set I_j of UAV j, the user set I_o of base station o, and their task-offloading decision variable sets. The UAV action reward generator produces the action reward value r_k after UAV j completes the offloading tasks of slot k. After the UAV executes action a_k, the ground-air network state transitions from s_k to s_{k+1}. The slot-k experience sample [s_k, a_k, r_k, s_{k+1}] is added to the experience sample storage area E. The random sample set Mini-Batch is formed by randomly extracting a fixed number of samples from E; the Mini-Batch set is imported into the action policy network π(s_k | θ^π) and the state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q) for training, updating the neural network parameters θ^π, θ_1^Q and θ_2^Q.
4. Method of computing s_{k+1} from the ground-air network state vector s_k and the action vector a_k
The slot-k ground-air network state vector s_k is input into the UAV action policy network π; forward propagation through the neurons of each layer of the π network yields the three-dimensional action vector [φ_j^k, d_j^k] of UAV j, from which the slot-(k+1) UAV position is computed as x_j^{k+1} = x_j^k + L cos φ_j^k, y_j^{k+1} = y_j^k + L sin φ_j^k, z_j^{k+1} = z_j^k + d_j^k, where L is the horizontal movement distance of UAV j in slot k.
5. Calculation of the task amount offloaded by a user to the UAV
The task amount offloaded by a user to the UAV is calculated as shown in formulas (12) and (13):
where C_{i,j}^k denotes the computing resources allocated by UAV j to user i in slot k, and C_j the total computing resources of UAV j.
6. Calculation of the task amount offloaded by a user to the base station
The task amount offloaded by a user to the base station is calculated as shown in formulas (14) and (15):
where C_{i,o}^k denotes the computing resources allocated by base station o to user i in slot k, and C_o the total computing resources of base station o.
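The bodies of formulas (12)-(15) are elided in this extraction. One plausible reading, offered only as an assumption, is that the task amount a user can offload within a slot is the slot budget divided by the per-unit offload delay (uplink transmission of one unit plus its computation):

```python
def units_offloadable(delta_t, F, rate, c_i, C_alloc):
    """Hedged reconstruction: number of task units offloadable in one slot
    of length delta_t, given a per-unit delay of F/rate (uplink) plus
    F*c_i/C_alloc (computation). All names are illustrative."""
    per_unit_delay = F / rate + F * c_i / C_alloc
    return delta_t / per_unit_delay
```

With a 1 s slot, 1 Mbit units, a 10 Mbit/s uplink, 100 cycles/bit complexity and 1 GHz allocated, each unit costs 0.2 s, so about 5 units fit in the slot.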
7. User offloading-task scheduling method
1) Each user i inside the effective coverage area of UAV j joins the service set I_j of UAV j; the service set of base station o is I_o = I − I_j; the users in I_j and I_o are each arranged in descending order;
2) Compute, according to formulas (12) and (13), the task amount that each user i in I_j offloads to UAV j;
3) Compute, according to formulas (14) and (15), the task amount that each user i in I_o offloads to base station o.
8. State-action value network and action policy network update method
To address the instability during learning of the state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q) and of the action policy network π(s_k | θ^π), target networks are defined: Q'(s_k, a_k | θ_1^{Q'}) and Q'(s_k, a_k | θ_2^{Q'}) as the targets of the value networks, and π'(s_k | θ^{π'}) as the target of π(s_k | θ^π).
The state-action value network parameters θ_m^Q (m = 1, 2) are updated by gradient descent as shown in formula (16), with the loss function as shown in formula (17):
where a'_{k+1} = a_{k+1} + ε, ε ~ clip(N(0, σ), −κ, κ); clip(·) denotes the clipping function, N Gaussian noise with mean 0 and variance σ, κ the clipping parameter, γ the discount factor, and X = {x_k} with x_k = [s_k, a_k, r_k, s_{k+1}] the sample set randomly drawn from E.
The parameters θ^π of the action policy network π(s_k | θ^π) are updated as shown in formula (18):
where μ^π is the learning rate of π(s_k | θ^π), and the policy gradient of π(s_k | θ^π) is shown in formula (19).
The parameters θ_m^{Q'} and θ^{π'} of the target networks Q'(s_k, a_k | θ_m^{Q'}) and π'(s_k | θ^{π'}) are updated as shown in formulas (20) and (21): θ_m^{Q'} ← ρ θ_m^Q + (1 − ρ) θ_m^{Q'} and θ^{π'} ← ρ θ^π + (1 − ρ) θ^{π'}, where ρ is the update factor.
The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node is specifically implemented by the following steps:
Step 1: Construct a discrete time-state model according to a Markov decision process: discretize the UAV's cruise time into time slots, and define the slot variable k, the ground-air network state vector s_k, the UAV three-dimensional action vector a_k and the UAV action reward function r_k, where s_k, a_k and r_k transition and change as the slot index k increases; initialize the slot variable k = 0;
Step 2: Take the UAV controller as the agent and construct a deep reinforcement learning model according to the TD3 algorithm idea, establishing the UAV action policy network π(s_k | θ^π) and the UAV state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q);
Step 3: Let k = k + 1. If the UAV's three-dimensional position has not changed for n consecutive slots, jump to step 6; otherwise determine the user set I_j of UAV j from its effective coverage area and the user set I_o = I − I_j of base station o, where I denotes the complete user set, obtain through the task scheduling policy generator the task-offloading decision variable sets for I_j and I_o, and jump to step 4;
Step 4: Execute the task offloading of each user i according to the decision variable sets, obtain the corresponding reward value r_k from the UAV action reward generator, obtain the slot-k UAV three-dimensional action vector a_k from the UAV action policy network π, compute s_{k+1} from the slot-k ground-air network state vector s_k and the action vector a_k, and store the experience sample [s_k, a_k, r_k, s_{k+1}] in the experience sample storage area E;
Step 5: Randomly sample a Mini-Batch sample set from the storage area E, import it into the action policy network π(s_k | θ^π) and the state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q) for training, and jump to step 3;
step 6: the algorithm ends.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading this specification, a person skilled in the art may make various changes or modifications to the invention; such equivalent changes and modifications likewise fall within the scope of the invention defined by the claims.
Claims (9)
1. A method for dynamically optimizing edge-network computing resources based on an unmanned aerial vehicle (UAV) auxiliary node, characterized by comprising the following steps:
101. constructing a discrete time-state model according to a Markov decision process: discretizing the UAV cruising time into time slots, and defining a time-slot variable $k$, a ground-air network state vector $s_k$, a UAV three-dimensional action vector $a_k$, and a UAV action reward function $r_k$, wherein $s_k$, $a_k$, $r_k$ transition and change as the slot index $k$ increases, and initializing the time-slot variable $k=0$;
102. taking the UAV controller as the agent and constructing a deep reinforcement learning model based on the idea of the twin-delayed deep deterministic policy gradient algorithm, comprising establishing a system environment collector, a UAV action policy network $\pi$, a UAV state-action value network $Q$, a task scheduling policy generator, a UAV action reward generator, an experience sample storage area $E$, and a random sample set Mini-Batch;
103. if the three-dimensional coordinate position of the UAV has not changed for $n$ consecutive time slots, jumping to step 106; otherwise, determining the user object set $I_j$ of UAV $j$ according to the effective coverage of UAV $j$ and the user object set $I_o=I-I_j$ of base station $o$, wherein $I$ denotes the whole user object set, obtaining the task offloading decision variable sets for $I_j$ and $I_o$ through the task scheduling policy generator, and jumping to step 104;
104. executing the task offloading request of each user $i$ according to the decision variable sets, obtaining the corresponding reward value $r_k$ through the UAV action reward generator, obtaining the $k$-slot UAV three-dimensional action vector $a_k$ through the UAV action policy network $\pi$, computing $s_{k+1}$ from the $k$-slot ground-air network state vector $s_k$ and the action vector $a_k$, and storing $[s_k,a_k,r_k,s_{k+1}]$ into the experience sample storage area $E$;
105. randomly sampling a Mini-Batch sample set from the experience sample storage area $E$, importing it into the action policy network $\pi$ and the state-action value network $Q$ respectively for training, and jumping to step 103;
106. ending the algorithm.
2. The method according to claim 1, wherein in the discrete time-state model constructed in step 101 according to a Markov decision process, the $k$-slot ground-air network state vector $s_k$, the UAV three-dimensional action vector $a_k$, and the UAV action reward function $r_k$ are as shown in equations (1), (2) and (3):
In equation (1), the state components are the three-dimensional coordinate position of UAV $j$ and the two-dimensional coordinate position of each user $i$ in slot $k$. In equation (2), the action components are the horizontal movement direction and the vertical movement distance of UAV $j$ in slot $k$. In equation (3), $\omega\in(0,1)$ denotes the weighting factor of the UAV action reward function, $\Delta t$ denotes the slot size, the average unit-task delay of user $i$ in slot $k$ is given by equation (4), and the associated indicator takes one value when the average unit-task delay of user $i$ in slot $k$ meets the average unit-task tolerance delay $\tau_i$ and the other value otherwise, as shown in equation (5):
In equation (4), the connection-state variable of user $i$ and UAV $j$ equals 1 if user $i$ offloads its task to UAV $j$ for execution in slot $k$ and 0 otherwise; likewise, the connection-state variable of user $i$ and base station $o$ equals 1 if user $i$ offloads its task to base station $o$ for execution in slot $k$ and 0 otherwise. In any slot $k$, user $i$ may be connected to at most one UAV or base station, i.e. the two connection-state variables sum to at most 1. The remaining quantities in equation (4) are the task amount user $i$ offloads to UAV $j$ in slot $k$, the task amount user $i$ offloads to base station $o$ in slot $k$, and the average unit-task tolerance delay $\tau_i$ of user $i$.
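The single-association constraint of claim 2 — each user is served by at most one node per slot — can be checked directly. The sketch below is an illustrative helper only; the variable names are assumptions, since the patent's symbols for the connection-state variables appear only in the figures:

```python
def valid_connections(uav_conn, bs_conn):
    """Each argument is a per-user list of 0/1 connection indicators for
    slot k (UAV link and base-station link). Per claim 2, a user may be
    connected to at most one serving node in a given slot, so the two
    indicators must sum to at most 1 for every user."""
    return all(u + b <= 1 for u, b in zip(uav_conn, bs_conn))

ok = valid_connections([1, 0, 0], [0, 1, 0])   # every user has <= 1 link
bad = valid_connections([1, 0], [1, 0])        # user 0 double-connected
```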
3. The method according to claim 1, wherein the deep reinforcement learning model constructed in step 102 based on the idea of the twin-delayed deep deterministic policy gradient algorithm comprises a system environment collector, a UAV action policy network $\pi$, a UAV state-action value network $Q$, a task scheduling policy generator, a UAV action reward generator, an experience sample storage area $E$, and a random sample set Mini-Batch, and specifically:
The system environment collector collects, for the $k$-slot ground-air network, the two-dimensional coordinate positions of the ground users, the users' task offloading requests, the three-dimensional coordinate position of the UAV, and the UAV's remaining available computing resources. The UAV action policy network $\pi$ generates the UAV three-dimensional action vector $a_k$ under the $k$-slot ground-air network state $s_k$. The UAV state-action value network $Q$ generates the action evaluation value $q$ of executing the UAV three-dimensional action vector $a_k$ under the $k$-slot ground-air network state $s_k$. The task scheduling policy generator generates the $k$-slot user offloading policy and obtains the task offloading decision variable sets for $I_j$ and $I_o$. The UAV action reward generator generates the action reward value $r_k$ of UAV $j$ after completing the offloading task in slot $k$. After the UAV executes $a_k$, the ground-air network state transitions from $s_k$ to $s_{k+1}$. The $k$-slot experience sample $[s_k,a_k,r_k,s_{k+1}]$ is added to the experience sample storage area $E$. The random sample set Mini-Batch consists of a fixed number of samples drawn at random from the experience sample storage area $E$. The UAV action policy network $\pi$ and the UAV state-action value network $Q$ are both neural networks, each comprising several hidden layers, and each hidden layer comprising several neurons.
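The experience storage area $E$ and the random Mini-Batch sampling described above can be realized with a bounded buffer. A minimal sketch, where the capacity and batch size are illustrative choices rather than values from the patent:

```python
import random
from collections import deque

class ExperienceBuffer:
    """Storage area E holding samples [s_k, a_k, r_k, s_{k+1}];
    Mini-Batch sets are drawn uniformly at random. A bounded deque
    evicts the oldest samples once capacity is reached."""
    def __init__(self, capacity=10000):
        self.E = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.E.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform random Mini-Batch, never larger than the buffer
        return random.sample(list(self.E), min(batch_size, len(self.E)))

buf = ExperienceBuffer(capacity=100)
for k in range(150):            # the 50 oldest samples get evicted
    buf.store(k, 0, 0.0, k + 1)
batch = buf.sample(32)
```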
4. The method according to claim 1, wherein the task scheduling policy generator in step 103 decides the task offloading variable sets for the users through the following steps:
1) adding each user $i$ within the effective coverage of UAV $j$ to the UAV $j$ service object set $I_j$, and letting the base station $o$ service object set be $I_o=I-I_j$; arranging the users $i$ of $I_j$ and of $I_o$ respectively in descending order;
2) calculating, from the task offloading delay of user $i$, the task amount that each user $i$ in $I_j$ offloads to UAV $j$;
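Step 1) of the scheduling generator — partitioning users by coverage and ordering each set — can be sketched as follows. The circular-coverage test and the per-user `priority` sort key are assumptions for illustration, since the exact ordering quantity appears only in the patent's figures:

```python
import math

def partition_users(users, uav_pos, radius):
    """Split the user set I into the UAV service set I_j (users inside
    the UAV's effective coverage radius) and the base-station set
    I_o = I - I_j, then order each set in descending order of an
    assumed per-user priority value."""
    I_j = [u for u in users if math.dist(u["pos"], uav_pos) <= radius]
    I_o = [u for u in users if u not in I_j]
    key = lambda u: u["priority"]
    return (sorted(I_j, key=key, reverse=True),
            sorted(I_o, key=key, reverse=True))

users = [{"pos": (0, 0), "priority": 2.0},
         {"pos": (3, 4), "priority": 5.0},
         {"pos": (30, 40), "priority": 1.0}]
I_j, I_o = partition_users(users, uav_pos=(0, 0), radius=10)
```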
5. The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node according to claim 4, wherein the task amount offloaded by user $i$ to UAV $j$ in step 2) is calculated as shown in equations (6) and (7):
where the quantities are: the computing resources allocated by UAV $j$ to user $i$ in slot $k$; the total amount of computing resources $C_j$ of UAV $j$; the uplink transmission rate from user $i$ to UAV $j$ in slot $k$; the task unit size $F$; and the task complexity of user $i$.
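Combining the quantities of claim 5 with the delay model of claim 7 (uplink transmission plus computation, tolerance $\tau_i$), a closed-form offload amount that exactly meets the tolerance delay can be sketched. Equations (6)–(7) themselves appear only in the figures, so this is an assumed reconstruction, not the patent's exact formula:

```python
def max_offload_units(tau_i, F, rate, complexity, alloc_compute):
    """Largest task amount d (in units of size F bits) whose uplink
    transmission delay d*F/rate plus computation delay
    d*F*complexity/alloc_compute stays within the tolerance tau_i.
    Assumed reconstruction in the spirit of equations (6)-(7)."""
    per_unit_delay = F / rate + F * complexity / alloc_compute
    return tau_i / per_unit_delay

# 1 Mbit units, 10 Mbit/s uplink, 100 cycles/bit, 1 GHz allocated
d = max_offload_units(tau_i=1.0, F=1e6, rate=1e7, complexity=100,
                      alloc_compute=1e9)
```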
6. The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node according to claim 5, wherein the task amount offloaded by user $i$ to base station $o$ in step 3) is calculated as shown in equations (8) and (9):
7. The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node according to claim 4, wherein the task offloading delay of user $i$ in step 2) and step 3) is as shown in equation (10), and the task offloading delay constraint is as shown in equation (11):
In equation (10), because the size of a task's computation result is much smaller than that of the task itself, only the uplink transmission delay of the user's task offloading and the computation delay of the task are considered, and the downlink transmission delay of the computation result is neglected. The total task offloading delay of user $i$ in slot $k$ is the sum of the transmission delay of the offloaded task, shown in equation (12), and the computation delay of the offloaded task, shown in equation (13):
In equation (12), the uplink transmission rates from user $i$ to UAV $j$ and to base station $o$ in slot $k$ are given by equations (14) and (15), respectively:
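Equations (10)–(13) sum a transmission and a computation term; equations (14)–(15) are rate formulas given only in the figures, so the Shannon-capacity form below is an assumption of the typical channel model, not the patent's exact expression:

```python
import math

def uplink_rate(bandwidth, tx_power, channel_gain, noise_power):
    """Assumed Shannon-form uplink rate R = B * log2(1 + SNR),
    standing in for equations (14)-(15)."""
    return bandwidth * math.log2(1 + tx_power * channel_gain / noise_power)

def offload_delay(task_units, F, rate, complexity, alloc_compute):
    """Total offloading delay (eq. 10): transmission delay (eq. 12)
    plus computation delay (eq. 13). The downlink delay of the result
    is neglected, as stated in claim 7."""
    d_tx = task_units * F / rate
    d_cp = task_units * F * complexity / alloc_compute
    return d_tx + d_cp

R = uplink_rate(bandwidth=1e6, tx_power=0.1, channel_gain=1e-4,
                noise_power=1e-9)            # SNR = 1e4
D = offload_delay(task_units=2, F=1e6, rate=R, complexity=100,
                  alloc_compute=1e9)
```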
8. The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node according to claim 1, wherein obtaining the $k$-slot UAV three-dimensional action vector $a_k$ through the UAV action policy network $\pi$ and computing $s_{k+1}$ from the $k$-slot ground-air network state vector $s_k$ and the action vector $a_k$ in step 104 specifically comprises:
inputting the $k$-slot ground-air network state vector $s_k$ into the UAV action policy network $\pi$; obtaining, through forward propagation of the neurons in each layer of the $\pi$ network, the three-dimensional action vector of UAV $j$, from which the UAV's next position is calculated, where $L$ is the horizontal movement distance of UAV $j$ in slot $k$.
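The mapping from the policy-network output (a horizontal direction, a horizontal distance $L$, and a vertical distance) to the UAV's next position can be sketched as follows. The radian-angle parameterization is an assumption, since the exact mapping formula is shown only in the figure:

```python
import math

def next_position(pos, heading, L, dz):
    """Move the UAV horizontally by distance L along `heading`
    (radians) and vertically by dz, per the action vector a_k."""
    x, y, z = pos
    return (x + L * math.cos(heading),
            y + L * math.sin(heading),
            z + dz)

# fly 10 m due "north" and descend 5 m from altitude 100 m
p = next_position((0.0, 0.0, 100.0), heading=math.pi / 2, L=10.0, dz=-5.0)
```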
9. The method according to claim 1, wherein in step 105 a Mini-Batch sample data set is obtained from the experience sample storage area $E$ by random sampling, and the state-action value network and the action policy network are optimized as follows:
To address the instability of the learning process of the state-action value network $Q$, which comprises $Q(s_k,a_k\mid\theta_1^{Q})$ and $Q(s_k,a_k\mid\theta_2^{Q})$, and of the action policy network $\pi(s_k\mid\theta^{\pi})$, target networks are defined: the target network of $Q(s_k,a_k\mid\theta_1^{Q})$ is $Q'(s_k,a_k\mid\theta_1^{Q'})$, the target network of $Q(s_k,a_k\mid\theta_2^{Q})$ is $Q'(s_k,a_k\mid\theta_2^{Q'})$, and the target network of $\pi(s_k\mid\theta^{\pi})$ is $\pi'(s_k\mid\theta^{\pi'})$.
The parameters $\theta_1^{Q}$ and $\theta_2^{Q}$ of the state-action value networks are updated by the gradient descent method, as shown in equation (16):
where the first factor is the learning rate of the state-action value network, $\theta_1^{Q}$ and $\theta_2^{Q}$ denote its network structure parameters, and the loss function is as shown in equation (17):
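The loss of equation (17) uses the clipped double-Q target of TD3: the target action is perturbed with clipped Gaussian noise and evaluated by the minimum of the two target critics. A toy scalar sketch (the real networks are neural; the stand-ins here are plain functions, and the hyperparameter values are illustrative):

```python
import random

def td3_target(r, s_next, pi_target, q1_target, q2_target,
               gamma=0.99, sigma=0.2, kappa=0.5):
    """y = r + gamma * min(Q1', Q2')(s', a'), with the smoothed target
    action a' = pi'(s') + clip(N(0, sigma), -kappa, kappa), matching the
    a'_{k+1} definition that accompanies equation (17)."""
    eps = max(-kappa, min(kappa, random.gauss(0.0, sigma)))
    a_next = pi_target(s_next) + eps
    return r + gamma * min(q1_target(s_next, a_next),
                           q2_target(s_next, a_next))

random.seed(0)
y = td3_target(r=1.0, s_next=2.0,
               pi_target=lambda s: 0.1 * s,
               q1_target=lambda s, a: s + a,
               q2_target=lambda s, a: s - a)
```

Taking the minimum of the two critics is what counters the value overestimation that motivates the twin-critic design.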
where $a'_{k+1}=a_{k+1}+\epsilon$, $\epsilon\sim\mathrm{clip}(\mathcal{N}(0,\sigma),-\kappa,\kappa)$, $\mathrm{clip}(\cdot)$ denotes the clipping function, $\mathcal{N}(0,\sigma)$ denotes Gaussian noise with mean 0 and variance $\sigma$, $\kappa$ denotes the clipping parameter, $\gamma$ denotes the discount factor, and $X$ denotes the sample set randomly sampled from $E$, $X=\{x_k\}$, $x_k=[s_k,a_k,r_k,s_{k+1}]$.
The network parameter $\theta^{\pi}$ of the action policy network $\pi(s_k\mid\theta^{\pi})$ is updated as shown in equation (18):
where $\mu_{\pi}$ is the learning rate of $\pi(s_k\mid\theta^{\pi})$, $\theta^{\pi}$ denotes the network structure parameters of $\pi(s_k\mid\theta^{\pi})$, and the policy gradient of $\pi(s_k\mid\theta^{\pi})$ is as shown in equation (19):
The network parameters $\theta_1^{Q'}$, $\theta_2^{Q'}$ and $\theta^{\pi'}$ of the target networks $Q'(s_k,a_k\mid\theta_1^{Q'})$, $Q'(s_k,a_k\mid\theta_2^{Q'})$ and $\pi'(s_k\mid\theta^{\pi'})$ are updated as shown in equations (20) and (21), governed by a soft-update factor.
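Equations (20)–(21) are the standard soft (Polyak) target-network update. With an update factor `rho`, applied element-wise to each parameter:

```python
def soft_update(theta, theta_target, rho):
    """theta' <- rho * theta + (1 - rho) * theta'  (eqs. 20-21),
    blending a small fraction of the online parameters into the
    target parameters so the targets drift slowly."""
    return [rho * t + (1 - rho) * tt for t, tt in zip(theta, theta_target)]

new_target = soft_update([1.0, 2.0], [0.0, 0.0], rho=0.1)
```

A small `rho` keeps the targets nearly frozen between updates, which is what stabilizes the bootstrapped value estimates.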
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210079544.6A CN114513814A (en) | 2022-01-24 | 2022-01-24 | Edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary node |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114513814A true CN114513814A (en) | 2022-05-17 |
Family
ID=81549326
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116257361A (en) * | 2023-03-15 | 2023-06-13 | 北京信息科技大学 | Unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method
CN116257361B (en) * | 2023-03-15 | 2023-11-10 | 北京信息科技大学 | Unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||