CN114513814A - Edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary node - Google Patents
- Publication number
- CN114513814A (application number CN202210079544.6A)
- Authority
- CN
- China
- Prior art keywords
- aerial vehicle
- unmanned aerial
- user
- network
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04W28/0925 — Network traffic management; load balancing or load distribution; management thereof using policies
- H04W24/02 — Supervisory, monitoring or testing arrangements; arrangements for optimising operational condition
- H04W24/06 — Testing, supervising or monitoring using simulated traffic
- H04W28/0226 — Traffic management, e.g. flow control or congestion control, based on location or mobility
- H04W28/0975 — Quality of Service [QoS] parameters for reducing delays
- H04W4/40 — Services specially adapted for vehicles, e.g. vehicle-to-pedestrians [V2P]
Abstract
The invention discloses a method for dynamically optimizing the computing resources of an edge network based on unmanned aerial vehicle (UAV) auxiliary nodes, belonging to the technical field of communications. Aiming at the problem that bursts of local user traffic in an edge-network cell leave server computing resources insufficient and degrade task-offloading quality, a dynamic computing-resource optimization method based on adaptive cruising of a UAV auxiliary node is provided. According to the position distribution and task-offloading demands of ground users, a deep reinforcement learning method dynamically plans the UAV's cruise trajectory, and a task-offloading scheduling strategy maximizes the utilization of the server resources of the UAV node and the base-station node during the cruise, effectively reducing the task interruption rate of local users and the average task-offloading delay.
Description
Technical Field
The invention belongs to the technical field of communications, and particularly relates to a method for dynamically optimizing edge-network computing resources based on an unmanned aerial vehicle (UAV) auxiliary node.
Background
With the spread and development of mobile networks, new applications such as augmented reality, virtual reality and autonomous driving keep emerging and greatly enrich daily life. These applications, however, are generally delay-sensitive and consume large amounts of computing resources, making it difficult for a mobile terminal to process them quickly and efficiently. Mobile edge computing sinks cloud resources to the edge network, providing users nearby with the computing resources needed for task offloading and effectively shortening the transmission delay between user and cloud server.
However, rapid changes in the distribution of ground users and random traffic bursts from local users can put enormous pressure on the fixed server resources of the edge network, leading to low utilization of computing resources and degraded user experience. Using a low-altitude UAV as an auxiliary node of the edge computing network, providing flexible supplementary resources for ground nodes, has therefore become an important mode of future network construction and development.
Aiming at the shortage of server computing resources and the degraded task-offloading quality caused by local traffic bursts in an edge-network cell, the invention provides a dynamic computing-resource optimization method based on adaptive cruising of a UAV auxiliary node, which effectively reduces the task interruption rate of local users and the average task-offloading delay.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. An edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary nodes is provided. The technical scheme of the invention is as follows:
an edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary nodes comprises the following steps:
101. Construct a discrete time-state model according to a Markov decision process: discretize the UAV's cruise time into time slots, and define the slot variable k, the ground-air network state vector s_k, the UAV three-dimensional action vector a_k and the UAV action reward function r_k, where s_k, a_k and r_k transition and change as the slot index k increases; initialize the slot variable k = 0;
102. Take the UAV controller as the agent and construct a deep reinforcement learning model based on the idea of the twin-delayed deep deterministic policy gradient (TD3) algorithm, comprising a system environment collector, a UAV action policy network π, a UAV state-action value network Q, a task scheduling policy generator, a UAV action reward generator, an experience sample storage area E and a random sample set (Mini-Batch);
103. Let k = k + 1. If the UAV's three-dimensional position has not changed for n consecutive slots, jump to step 106; otherwise determine the user set I_j of UAV j from its effective coverage area and the user set I_o = I − I_j of base station o, where I denotes the complete user set, obtain through the task scheduling policy generator the task-offloading decision variable sets for I_j and I_o, and jump to step 104;
104. Execute the task-offloading requests of each user i according to the decision variable sets, obtain the corresponding reward value r_k from the UAV action reward generator, obtain the slot-k UAV three-dimensional action vector a_k from the UAV action policy network π, compute s_{k+1} from the slot-k ground-air network state vector s_k and the action vector a_k, and store the experience sample [s_k, a_k, r_k, s_{k+1}] in the experience sample storage area E;
105. randomly sampling from an experience sample storage area E to obtain a Mini-Batch sample set, respectively importing the Mini-Batch sample set into an action strategy network pi and a state-action value network Q for training, and jumping to step 103;
106. the algorithm ends.
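The control flow of steps 101-106 can be sketched in Python as follows. This is a minimal, hedged sketch: the random policy, placeholder reward and start position stand in for the networks and reward generator described above, and all names are illustrative rather than taken from the patent.

```python
import random
from collections import deque

def run_episode(n_stationary=3, max_slots=50, batch_size=8, seed=0):
    """Control-flow sketch of steps 101-106: discretize cruise time into
    slots, act, compute a reward, store [s_k, a_k, r_k, s_{k+1}] in E,
    sample a Mini-Batch, and stop once the UAV position is unchanged
    for n consecutive slots."""
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)        # experience sample storage area E
    pos = (0.0, 0.0, 100.0)            # UAV 3-D position (hypothetical start)
    s_k, unchanged, k = pos, 0, 0      # step 101: initialize slot variable k = 0
    while k < max_slots:
        k += 1                         # step 103: k = k + 1
        # stand-in policy: a random horizontal step that shrinks to zero
        step = max(0.0, 1.0 - k * 0.1)
        a_k = (rng.uniform(-step, step), rng.uniform(-step, step), 0.0)
        new_pos = tuple(p + d for p, d in zip(pos, a_k))
        unchanged = unchanged + 1 if new_pos == pos else 0
        if unchanged >= n_stationary:  # step 103 -> step 106: terminate
            break
        r_k = -sum(abs(d) for d in a_k)          # placeholder reward value
        buffer.append((s_k, a_k, r_k, new_pos))  # step 104: store sample in E
        if len(buffer) >= batch_size:            # step 105: Mini-Batch sampling
            mini_batch = rng.sample(list(buffer), batch_size)
            # (the policy and value networks would be trained on mini_batch here)
        pos, s_k = new_pos, new_pos
    return k, buffer
```

With the shrinking step size, the position stops changing from slot 10 onward, so the loop terminates via the n-consecutive-slots condition rather than the slot limit.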
Further, in step 101 a discrete time-state model is constructed according to a Markov decision process, in which the slot-k ground-air network state vector s_k, the UAV three-dimensional action vector a_k and the UAV action reward function r_k are as shown in formulas (1), (2) and (3):
In formula (1), [x_j^k, y_j^k, z_j^k] denotes the three-dimensional position of UAV j in slot k, and [x_i^k, y_i^k] the two-dimensional position of user i in slot k. In formula (2), φ_j^k denotes the horizontal movement direction of UAV j in slot k, and d_j^k its vertical movement distance. In formula (3), ω ∈ (0, 1) denotes the weighting factor of the UAV action reward function, Δt the slot size, and D̄_i^k the average unit-task delay of user i in slot k, as shown in formula (4); the indicator of formula (5) equals 1 when the average unit-task delay of user i in slot k meets the average unit-task tolerance delay τ_i, and 0 otherwise:
In formula (4), α_{i,j}^k denotes the connection state of user i and UAV j: α_{i,j}^k = 1 if user i offloads its task to UAV j in slot k, otherwise α_{i,j}^k = 0; α_{i,o}^k denotes the connection state of user i and base station o: α_{i,o}^k = 1 if user i offloads its task to base station o in slot k, otherwise α_{i,o}^k = 0. In slot k a user may connect to at most one UAV or base station, i.e. α_{i,j}^k + α_{i,o}^k ≤ 1. u_{i,j}^k denotes the amount of tasks user i offloads to UAV j in slot k, u_{i,o}^k the amount of tasks user i offloads to base station o in slot k, and τ_i the average unit-task tolerance delay of user i.
Further, step 102 constructs the deep reinforcement learning model based on the idea of the twin-delayed deep deterministic policy gradient (TD3) algorithm, comprising a system environment collector, a UAV action policy network π, a UAV state-action value network Q, a task scheduling policy generator, a UAV action reward generator, an experience sample storage area E and a random sample set Mini-Batch, specifically:
The system environment collector collects, in the slot-k ground-air network, the two-dimensional positions of the ground users, the users' task-offloading requests, the UAV's three-dimensional position and the UAV's remaining available computing resources. The UAV action policy network π generates the UAV three-dimensional action vector a_k under the slot-k ground-air network state s_k. The UAV state-action value network Q generates the action evaluation value q of executing the action a_k in state s_k. The task scheduling policy generator generates the slot-k user offloading policy and obtains the task-offloading decision variable sets for I_j and I_o. The UAV action reward generator produces the action reward value r_k after UAV j completes the offloading tasks of slot k. After the UAV executes action a_k, the ground-air network state transitions from s_k to s_{k+1}. The slot-k experience sample [s_k, a_k, r_k, s_{k+1}] is added to the experience sample storage area E. The random sample set Mini-Batch is formed by randomly extracting a fixed number of samples from E. Both the UAV action policy network π and the UAV state-action value network Q are neural networks, each comprising several hidden layers, and each hidden layer comprising several neurons.
Further, the task scheduling policy generator of step 103 decides the user task-offloading variable sets as follows:
1) Each user i inside the effective coverage area of UAV j joins the UAV service set I_j; the base-station service set is I_o = I − I_j; the users in I_j and I_o are each arranged in descending order;
2) According to user i's task-offloading delay, compute the task amount u_{i,j}^k that each user i in I_j offloads to UAV j;
3) According to user i's task-offloading delay, compute the task amount u_{i,o}^k that each user i in I_o offloads to base station o.
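Step 1) above can be sketched as a coverage-based partition of the user set. This is an assumption-laden sketch: the conical-footprint coverage radius and the descending sort key (task load is used here as a stand-in, since the exact metric is elided in the extraction) are illustrative.

```python
import math

def partition_users(users, uav_xy, uav_alt, coverage_angle_deg=60.0):
    """Split the user set I into the UAV service set I_j (users inside the
    UAV's effective coverage disc) and the base-station set I_o = I - I_j.
    Each user is a tuple (user_id, x, y, task_load)."""
    # assumed conical footprint: radius grows with altitude
    radius = uav_alt * math.tan(math.radians(coverage_angle_deg) / 2)
    I_j, I_o = [], []
    for u in users:
        dist = math.hypot(u[1] - uav_xy[0], u[2] - uav_xy[1])
        (I_j if dist <= radius else I_o).append(u)
    # descending order (task load is a stand-in for the elided sort key)
    I_j.sort(key=lambda u: u[3], reverse=True)
    I_o.sort(key=lambda u: u[3], reverse=True)
    return I_j, I_o
```

For a UAV at altitude 100 m with a 60° cone, the coverage radius is roughly 57.7 m, so nearby users fall into I_j and distant ones into I_o.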
Further, the task amount u_{i,j}^k offloaded by user i to UAV j in step 2) is calculated as shown in formulas (6) and (7):
where C_{i,j}^k denotes the computing resources allocated by UAV j to user i in slot k, C_j the total computing resources of UAV j, R_{i,j}^k the uplink transmission rate from user i to UAV j in slot k, F the unit-task size, and c_i the task complexity of user i.
Further, the task amount u_{i,o}^k offloaded by user i to base station o in step 3) is calculated as shown in formulas (8) and (9):
where C_{i,o}^k denotes the computing resources allocated by base station o to user i in slot k, C_o the total computing resources of base station o, and R_{i,o}^k the uplink transmission rate from user i to base station o in slot k.
Further, the task-offloading delay of user i in steps 2) and 3) is shown in formula (10), and the task-offloading delay constraint is shown in formula (11):
In formula (10), because the task's computation result is far smaller than the task itself, only the uplink transmission delay of the offloaded task and its computation delay are considered, and the downlink transmission delay of the result is ignored. D_i^k denotes the total task-offloading delay of user i in slot k, D_i^{trans,k} the transmission delay of user i's offloaded task, as shown in formula (12), and D_i^{comp,k} the computation delay of the offloaded task, as shown in formula (13):

D_i^{trans,k} = α_{i,j}^k u_{i,j}^k F / R_{i,j}^k + α_{i,o}^k u_{i,o}^k F / R_{i,o}^k    (12)

D_i^{comp,k} = α_{i,j}^k u_{i,j}^k F c_i / C_{i,j}^k + α_{i,o}^k u_{i,o}^k F c_i / C_{i,o}^k    (13)

In formula (12), R_{i,j}^k and R_{i,o}^k denote the uplink transmission rates from user i to UAV j and to base station o in slot k, as shown in formulas (14) and (15):

R_{i,j}^k = W log2(1 + p_i g_{i,j}^k / σ²)    (14)

R_{i,o}^k = W log2(1 + p_i g_{i,o}^k / σ²)    (15)

In formulas (14) and (15), W is the user channel bandwidth, p_i the user transmit power, σ² the noise power, and g_{i,j}^k, g_{i,o}^k the communication channel gains from user i to UAV j and to base station o in slot k.
Further, in step 104 the slot-k UAV three-dimensional action vector a_k is obtained from the UAV action policy network π, and s_{k+1} is computed from the slot-k ground-air network state vector s_k and the action vector a_k, specifically:
The slot-k ground-air network state vector s_k is input into the UAV action policy network π; forward propagation through the neurons of each layer of the π network yields the three-dimensional action vector [φ_j^k, d_j^k] of UAV j, from which the slot-(k+1) UAV position is computed as x_j^{k+1} = x_j^k + L cos φ_j^k, y_j^{k+1} = y_j^k + L sin φ_j^k, z_j^{k+1} = z_j^k + d_j^k, where L is the horizontal movement distance of UAV j in slot k.
Further, in step 105 a Mini-Batch sample data set is obtained from the experience sample storage area E by random sampling, and the state-action value networks and the action policy network are optimized as follows:
To address the instability during learning of the state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q) and of the action policy network π(s_k | θ^π), target networks are defined: Q'(s_k, a_k | θ_1^{Q'}) and Q'(s_k, a_k | θ_2^{Q'}) as the targets of the value networks, and π'(s_k | θ^{π'}) as the target of π(s_k | θ^π).
The state-action value network parameters θ_m^Q (m = 1, 2) are updated by gradient descent as shown in formula (16):
where μ^Q is the learning rate and θ_m^Q denotes the network structure parameters; the loss function is shown in formula (17):
where a'_{k+1} = a_{k+1} + ε, ε ~ clip(N(0, σ), −κ, κ); clip(·) denotes the clipping function, N Gaussian noise with mean 0 and variance σ, κ the clipping parameter, γ the discount factor, and X = {x_k} with x_k = [s_k, a_k, r_k, s_{k+1}] the sample set randomly drawn from E.
The network parameters θ^π of the action policy network π(s_k | θ^π) are updated as shown in formula (18):
where μ^π is the learning rate of π(s_k | θ^π), θ^π denotes its network structure parameters, and the policy gradient of π(s_k | θ^π) is shown in formula (19):
The parameters θ_m^{Q'} and θ^{π'} of the target networks Q'(s_k, a_k | θ_m^{Q'}) and π'(s_k | θ^{π'}) are updated as shown in formulas (20) and (21): θ_m^{Q'} ← ρ θ_m^Q + (1 − ρ) θ_m^{Q'} and θ^{π'} ← ρ θ^π + (1 − ρ) θ^{π'}, where ρ is the update factor.
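Two numeric building blocks of the update above can be sketched directly: the clipped double-Q target used inside the loss, and the Polyak (soft) target-network update of formulas (20)-(21). Parameters are plain lists of floats here for illustration; in practice they are the weights of Q'_1, Q'_2 and π'.

```python
def td3_target(r_k, gamma, q1_next, q2_next):
    """Clipped double-Q target used in the loss of formula (17):
    y = r_k + gamma * min(Q'_1, Q'_2), where Q'_1 and Q'_2 are assumed to
    have been evaluated at the smoothed action a'_{k+1}."""
    return r_k + gamma * min(q1_next, q2_next)

def soft_update(target_params, online_params, rho):
    """Polyak averaging of formulas (20)-(21):
    theta' <- rho * theta + (1 - rho) * theta'."""
    return [rho * o + (1 - rho) * t
            for t, o in zip(target_params, online_params)]
```

Taking the minimum of the two target critics counters the overestimation bias of a single critic, and a small ρ keeps the targets slowly moving, which is the stabilizing idea of TD3.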
The invention has the following advantages and beneficial effects:
the invention discloses a dynamic optimization method for computing resources of an edge network based on unmanned aerial vehicle auxiliary nodes. The existing problem of task unloading of the edge network based on unmanned aerial vehicle assistance mostly focuses on reducing task unloading time delay of ground users through optimized deployment of unmanned aerial vehicle resources, but neglects the situation that local area user traffic is sudden possibly in an actual scene. The invention provides a dynamic optimization method of computing resources based on unmanned aerial vehicle auxiliary node adaptive cruise, aiming at the problems of shortage of edge server resources and deterioration of task unloading quality caused by burst of local traffic in a cell. According to the position distribution and task unloading requirements of ground users, a deep reinforcement learning method is adopted to dynamically plan the cruising track of the unmanned aerial vehicle, and the utilization rate of server resources of unmanned aerial vehicle nodes and base station nodes in the cruising process is maximized through a task unloading scheduling strategy, so that the task interruption rate of local users is effectively reduced, and the average task unloading delay is reduced.
Drawings
Fig. 1 is a flowchart of a method for dynamically optimizing edge network computing resources based on an auxiliary node of an unmanned aerial vehicle according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the concepts and models involved in the present disclosure are as follows.
1. System model:
assuming that users in an edge network cell are randomly distributed, an edge server can provide task unloading service for the users in the cell through a cell base station. An unmanned aerial vehicle auxiliary edge node is configured in the cell, and task unloading service can be provided for users in the effective coverage range of the unmanned aerial vehicle auxiliary edge node. When local user traffic in a cell is sudden, the unmanned aerial vehicle node can optimize the distribution state of computing resources and task unloading scheduling through self-adaptive cruise, the task interruption rate of local users is reduced, and the average task unloading delay is reduced.
2. Other symbols used in the invention are described below:
s_k: ground-air network state vector
a_k: UAV three-dimensional action vector
r_k: UAV action reward function
π(s_k | θ^π): UAV action policy network
θ: neural network parameters
Δt: time slot size
D^comp: computation delay of user i's offloaded task
C_j: total computing resources of UAV j
C_o: total computing resources of base station o
W: user channel bandwidth
p_i: transmit power of user i
σ²: noise power
F: unit task size
The technical scheme of the invention is explained as follows:
1. task offload latency and constraints thereof
Task offload latency for user iAs shown in equation (1), the task latency constraint is shown in equation (2):
in the formula (1)The transmission delay of the unloading task of the k-slot user i is shown, as shown in formula (3),the calculation time delay of the user i for unloading the task is shown as formula (4):
in the formula (3), F represents the unit task size,respectively representing the uplink transmission rates from k time slot user i to unmanned aerial vehicle j and base station o, in formula (4)Indicating the task complexity of the user i,indicating the computing resources allocated by drone j for user i in k time slots,indicating the computational resources allocated by base station o for user i in k time slots.As shown in formulas (5) and (6):
in equations (5) and (6), W is the user channel bandwidth, piFor user transmit power, σ2In order to be able to measure the power of the noise,andrepresenting the communication channel gains for k slot user i to drone j and base station o, respectively.
2. State vector, action vector and reward function of Markov decision model
The ground-air network state vector, the UAV three-dimensional action vector and the UAV action reward function are shown in formulas (7), (8) and (9), respectively:
In formula (7), [x_j^k, y_j^k, z_j^k] denotes the three-dimensional position of UAV j in slot k, and [x_i^k, y_i^k] the two-dimensional position of user i in slot k. In formula (8), φ_j^k denotes the horizontal movement direction of UAV j in slot k, and d_j^k its vertical movement distance. In formula (9), ω ∈ (0, 1) denotes the weighting factor of the UAV action reward function, and D̄_i^k the average unit-task delay of user i in slot k, as shown in formula (10); the indicator of formula (11) equals 1 when the average unit-task delay of user i in slot k meets the average unit-task tolerance delay, and 0 otherwise:
In formula (10), α_{i,j}^k denotes the connection state of user i and UAV j: α_{i,j}^k = 1 if user i offloads its task to UAV j in slot k, otherwise α_{i,j}^k = 0; α_{i,o}^k denotes the connection state of user i and base station o: α_{i,o}^k = 1 if user i offloads its task to base station o in slot k, otherwise α_{i,o}^k = 0. In slot k a user may connect to at most one UAV or base station, i.e. α_{i,j}^k + α_{i,o}^k ≤ 1. Δt denotes the slot size, u_{i,j}^k the amount of tasks user i offloads to UAV j in slot k, u_{i,o}^k the amount of tasks user i offloads to base station o in slot k, and τ_i the average unit-task tolerance delay of user i.
3. Deep reinforcement learning model constructed based on the twin-delayed deep deterministic policy gradient (TD3) algorithm
According to the Markov decision process, the UAV's cruise time is divided into several equal-sized time slots; within any slot k ∈ K, the relative positions and connection states of the UAV and the ground users remain unchanged.
A UAV controller deployed in the base-station control center serves as the agent, and a deep reinforcement learning model is constructed based on the idea of the twin-delayed deep deterministic policy gradient algorithm, which originates from Fujimoto S., van Hoof H., Meger D., "Addressing Function Approximation Error in Actor-Critic Methods," 35th International Conference on Machine Learning (ICML 2018), July 10-15, 2018. The deep reinforcement learning model comprises a system environment collector, a neural-network-based UAV action policy network π(s_k | θ^π) and UAV state-action value network Q(s_k, a_k | θ^Q), a task scheduling policy generator, a UAV action reward generator, an experience sample storage area E and a random sample set Mini-Batch.
The system environment collector collects, in the slot-k ground-air network, the two-dimensional positions of the ground users, the users' task-offloading requests, the UAV's three-dimensional position and its remaining available computing resources. The UAV action policy network π(s_k | θ^π) generates the UAV three-dimensional action vector a_k under the slot-k ground-air network state vector s_k; π(s_k | θ^π) may use two hidden layers of 256 neurons each, with the ReLU activation function. The UAV state-action value network Q(s_k, a_k | θ^Q) generates the action evaluation value q of executing action a_k in state s_k; it may consist of two identically structured networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q), each with three hidden layers of 256 neurons and the ReLU activation function.
The task scheduling policy generator generates the slot-k user offloading policy, obtaining the user set I_j of UAV j, the user set I_o of base station o, and their task-offloading decision variable sets. The UAV action reward generator produces the action reward value r_k after UAV j completes the offloading tasks of slot k. After the UAV executes action a_k, the ground-air network state transitions from s_k to s_{k+1}. The slot-k experience sample [s_k, a_k, r_k, s_{k+1}] is added to the experience sample storage area E. The random sample set Mini-Batch is formed by randomly extracting a fixed number of samples from E; the Mini-Batch set is imported into the action policy network π(s_k | θ^π) and the state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q) for training, updating the neural network parameters θ^π, θ_1^Q and θ_2^Q.
4. Method of computing s_{k+1} from the ground-air network state vector s_k and the action vector a_k
The slot-k ground-air network state vector s_k is input into the UAV action policy network π; forward propagation through the neurons of each layer of the π network yields the three-dimensional action vector [φ_j^k, d_j^k] of UAV j, from which the slot-(k+1) UAV position is computed as x_j^{k+1} = x_j^k + L cos φ_j^k, y_j^{k+1} = y_j^k + L sin φ_j^k, z_j^{k+1} = z_j^k + d_j^k, where L is the horizontal movement distance of UAV j in slot k.
5. Calculation of the task amount offloaded by a user to the UAV
The task amount offloaded by a user to the UAV is calculated as shown in formulas (12) and (13):
where C_{i,j}^k denotes the computing resources allocated by UAV j to user i in slot k, and C_j the total computing resources of UAV j.
6. Calculation of the task amount offloaded by a user to the base station
The task amount offloaded by a user to the base station is calculated as shown in formulas (14) and (15):
where C_{i,o}^k denotes the computing resources allocated by base station o to user i in slot k, and C_o the total computing resources of base station o.
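The bodies of formulas (12)-(15) are elided in this extraction. One plausible reading, offered only as an assumption, is that the task amount a user can offload within a slot is the slot budget divided by the per-unit offload delay (uplink transmission of one unit plus its computation):

```python
def units_offloadable(delta_t, F, rate, c_i, C_alloc):
    """Hedged reconstruction: number of task units offloadable in one slot
    of length delta_t, given a per-unit delay of F/rate (uplink) plus
    F*c_i/C_alloc (computation). All names are illustrative."""
    per_unit_delay = F / rate + F * c_i / C_alloc
    return delta_t / per_unit_delay
```

With a 1 s slot, 1 Mbit units, a 10 Mbit/s uplink, 100 cycles/bit complexity and 1 GHz allocated, each unit costs 0.2 s, so about 5 units fit in the slot.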
7. User offloading-task scheduling method
1) Each user i inside the effective coverage area of UAV j joins the service set I_j of UAV j; the service set of base station o is I_o = I − I_j; the users in I_j and I_o are each arranged in descending order;
2) Compute, according to formulas (12) and (13), the task amount that each user i in I_j offloads to UAV j;
3) Compute, according to formulas (14) and (15), the task amount that each user i in I_o offloads to base station o.
8. State-action value network and action policy network update method
To address the instability during learning of the state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q) and of the action policy network π(s_k | θ^π), target networks are defined: Q'(s_k, a_k | θ_1^{Q'}) and Q'(s_k, a_k | θ_2^{Q'}) as the targets of the value networks, and π'(s_k | θ^{π'}) as the target of π(s_k | θ^π).
The state-action value network parameters θ_m^Q (m = 1, 2) are updated by gradient descent as shown in formula (16), with the loss function as shown in formula (17):
where a'_{k+1} = a_{k+1} + ε, ε ~ clip(N(0, σ), −κ, κ); clip(·) denotes the clipping function, N Gaussian noise with mean 0 and variance σ, κ the clipping parameter, γ the discount factor, and X = {x_k} with x_k = [s_k, a_k, r_k, s_{k+1}] the sample set randomly drawn from E.
The parameters θ^π of the action policy network π(s_k | θ^π) are updated as shown in formula (18):
where μ^π is the learning rate of π(s_k | θ^π), and the policy gradient of π(s_k | θ^π) is shown in formula (19).
The parameters θ_m^{Q'} and θ^{π'} of the target networks Q'(s_k, a_k | θ_m^{Q'}) and π'(s_k | θ^{π'}) are updated as shown in formulas (20) and (21): θ_m^{Q'} ← ρ θ_m^Q + (1 − ρ) θ_m^{Q'} and θ^{π'} ← ρ θ^π + (1 − ρ) θ^{π'}, where ρ is the update factor.
The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node is specifically implemented by the following steps:
Step 1: Construct a discrete time-state model according to a Markov decision process: discretize the UAV's cruise time into time slots, and define the slot variable k, the ground-air network state vector s_k, the UAV three-dimensional action vector a_k and the UAV action reward function r_k, where s_k, a_k and r_k transition and change as the slot index k increases; initialize the slot variable k = 0;
Step 2: Take the UAV controller as the agent and construct a deep reinforcement learning model according to the TD3 algorithm idea, establishing the UAV action policy network π(s_k | θ^π) and the UAV state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q);
Step 3: Let k = k + 1. If the UAV's three-dimensional position has not changed for n consecutive slots, jump to step 6; otherwise determine the user set I_j of UAV j from its effective coverage area and the user set I_o = I − I_j of base station o, where I denotes the complete user set, obtain through the task scheduling policy generator the task-offloading decision variable sets for I_j and I_o, and jump to step 4;
Step 4: Execute the task offloading of each user i according to the decision variable sets, obtain the corresponding reward value r_k from the UAV action reward generator, obtain the slot-k UAV three-dimensional action vector a_k from the UAV action policy network π, compute s_{k+1} from the slot-k ground-air network state vector s_k and the action vector a_k, and store the experience sample [s_k, a_k, r_k, s_{k+1}] in the experience sample storage area E;
Step 5: Randomly sample a Mini-Batch sample set from the storage area E, import it into the action policy network π(s_k | θ^π) and the state-action value networks Q(s_k, a_k | θ_1^Q) and Q(s_k, a_k | θ_2^Q) for training, and jump to step 3;
step 6: the algorithm ends.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading this specification, a person skilled in the art may make various changes or modifications to the invention; such equivalent changes and modifications likewise fall within the scope of the invention defined by the claims.
Claims (9)
1. A method for dynamically optimizing edge-network computing resources based on an unmanned aerial vehicle (UAV) auxiliary node, characterized by comprising the following steps:
101. constructing a discrete time-state model according to a Markov decision process: discretizing the UAV cruising time into time slots, and defining a time-slot variable $k$, a ground-air network state vector $s_k$, a UAV three-dimensional action vector $a_k$, and a UAV action reward function $r_k$, wherein $s_k$, $a_k$, $r_k$ transition and change as the slot index $k$ increases, and initializing the time-slot variable $k=0$;
102. taking the UAV controller as the agent and constructing a deep reinforcement learning model based on the idea of the twin-delayed deep deterministic policy gradient algorithm, comprising establishing a system environment collector, a UAV action policy network $\pi$, a UAV state-action value network $Q$, a task scheduling policy generator, a UAV action reward generator, an experience sample storage area $E$, and a random sample set Mini-Batch;
103. if the three-dimensional coordinate position of the UAV has not changed for $n$ consecutive time slots, jumping to step 106; otherwise, determining the user object set $I_j$ of UAV $j$ according to the effective coverage of UAV $j$ and the user object set $I_o=I-I_j$ of base station $o$, wherein $I$ denotes the whole user object set, obtaining the task offloading decision variable sets for $I_j$ and $I_o$ through the task scheduling policy generator, and jumping to step 104;
104. executing the task offloading request of each user $i$ according to the decision variable sets, obtaining the corresponding reward value $r_k$ through the UAV action reward generator, obtaining the $k$-slot UAV three-dimensional action vector $a_k$ through the UAV action policy network $\pi$, computing $s_{k+1}$ from the $k$-slot ground-air network state vector $s_k$ and the action vector $a_k$, and storing $[s_k,a_k,r_k,s_{k+1}]$ into the experience sample storage area $E$;
105. randomly sampling a Mini-Batch sample set from the experience sample storage area $E$, importing it into the action policy network $\pi$ and the state-action value network $Q$ respectively for training, and jumping to step 103;
106. ending the algorithm.
2. The method according to claim 1, wherein in the discrete time-state model constructed in step 101 according to a Markov decision process, the $k$-slot ground-air network state vector $s_k$, the UAV three-dimensional action vector $a_k$, and the UAV action reward function $r_k$ are as shown in equations (1), (2) and (3):
In equation (1), the state components are the three-dimensional coordinate position of UAV $j$ and the two-dimensional coordinate position of each user $i$ in slot $k$. In equation (2), the action components are the horizontal movement direction and the vertical movement distance of UAV $j$ in slot $k$. In equation (3), $\omega\in(0,1)$ denotes the weighting factor of the UAV action reward function, $\Delta t$ denotes the slot size, the average unit-task delay of user $i$ in slot $k$ is given by equation (4), and the associated indicator takes one value when the average unit-task delay of user $i$ in slot $k$ meets the average unit-task tolerance delay $\tau_i$ and the other value otherwise, as shown in equation (5):
In equation (4), the connection-state variable of user $i$ and UAV $j$ equals 1 if user $i$ offloads its task to UAV $j$ for execution in slot $k$ and 0 otherwise; likewise, the connection-state variable of user $i$ and base station $o$ equals 1 if user $i$ offloads its task to base station $o$ for execution in slot $k$ and 0 otherwise. In any slot $k$, user $i$ may be connected to at most one UAV or base station, i.e. the two connection-state variables sum to at most 1. The remaining quantities in equation (4) are the task amount user $i$ offloads to UAV $j$ in slot $k$, the task amount user $i$ offloads to base station $o$ in slot $k$, and the average unit-task tolerance delay $\tau_i$ of user $i$.
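The single-association constraint of claim 2 — each user is served by at most one node per slot — can be checked directly. The sketch below is an illustrative helper only; the variable names are assumptions, since the patent's symbols for the connection-state variables appear only in the figures:

```python
def valid_connections(uav_conn, bs_conn):
    """Each argument is a per-user list of 0/1 connection indicators for
    slot k (UAV link and base-station link). Per claim 2, a user may be
    connected to at most one serving node in a given slot, so the two
    indicators must sum to at most 1 for every user."""
    return all(u + b <= 1 for u, b in zip(uav_conn, bs_conn))

ok = valid_connections([1, 0, 0], [0, 1, 0])   # every user has <= 1 link
bad = valid_connections([1, 0], [1, 0])        # user 0 double-connected
```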
3. The method according to claim 1, wherein the deep reinforcement learning model constructed in step 102 based on the idea of the twin-delayed deep deterministic policy gradient algorithm comprises a system environment collector, a UAV action policy network $\pi$, a UAV state-action value network $Q$, a task scheduling policy generator, a UAV action reward generator, an experience sample storage area $E$, and a random sample set Mini-Batch, and specifically:
The system environment collector collects, for the $k$-slot ground-air network, the two-dimensional coordinate positions of the ground users, the users' task offloading requests, the three-dimensional coordinate position of the UAV, and the UAV's remaining available computing resources. The UAV action policy network $\pi$ generates the UAV three-dimensional action vector $a_k$ under the $k$-slot ground-air network state $s_k$. The UAV state-action value network $Q$ generates the action evaluation value $q$ of executing the UAV three-dimensional action vector $a_k$ under the $k$-slot ground-air network state $s_k$. The task scheduling policy generator generates the $k$-slot user offloading policy and obtains the task offloading decision variable sets for $I_j$ and $I_o$. The UAV action reward generator generates the action reward value $r_k$ of UAV $j$ after completing the offloading task in slot $k$. After the UAV executes $a_k$, the ground-air network state transitions from $s_k$ to $s_{k+1}$. The $k$-slot experience sample $[s_k,a_k,r_k,s_{k+1}]$ is added to the experience sample storage area $E$. The random sample set Mini-Batch consists of a fixed number of samples drawn at random from the experience sample storage area $E$. The UAV action policy network $\pi$ and the UAV state-action value network $Q$ are both neural networks, each comprising several hidden layers, and each hidden layer comprising several neurons.
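The experience storage area $E$ and the random Mini-Batch sampling described above can be realized with a bounded buffer. A minimal sketch, where the capacity and batch size are illustrative choices rather than values from the patent:

```python
import random
from collections import deque

class ExperienceBuffer:
    """Storage area E holding samples [s_k, a_k, r_k, s_{k+1}];
    Mini-Batch sets are drawn uniformly at random. A bounded deque
    evicts the oldest samples once capacity is reached."""
    def __init__(self, capacity=10000):
        self.E = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.E.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform random Mini-Batch, never larger than the buffer
        return random.sample(list(self.E), min(batch_size, len(self.E)))

buf = ExperienceBuffer(capacity=100)
for k in range(150):            # the 50 oldest samples get evicted
    buf.store(k, 0, 0.0, k + 1)
batch = buf.sample(32)
```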
4. The method according to claim 1, wherein the task scheduling policy generator in step 103 decides the task offloading variable sets for the users through the following steps:
1) adding each user $i$ within the effective coverage of UAV $j$ to the UAV $j$ service object set $I_j$, and letting the base station $o$ service object set be $I_o=I-I_j$; arranging the users $i$ of $I_j$ and of $I_o$ respectively in descending order;
2) calculating, from the task offloading delay of user $i$, the task amount that each user $i$ in $I_j$ offloads to UAV $j$;
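Step 1) of the scheduling generator — partitioning users by coverage and ordering each set — can be sketched as follows. The circular-coverage test and the per-user `priority` sort key are assumptions for illustration, since the exact ordering quantity appears only in the patent's figures:

```python
import math

def partition_users(users, uav_pos, radius):
    """Split the user set I into the UAV service set I_j (users inside
    the UAV's effective coverage radius) and the base-station set
    I_o = I - I_j, then order each set in descending order of an
    assumed per-user priority value."""
    I_j = [u for u in users if math.dist(u["pos"], uav_pos) <= radius]
    I_o = [u for u in users if u not in I_j]
    key = lambda u: u["priority"]
    return (sorted(I_j, key=key, reverse=True),
            sorted(I_o, key=key, reverse=True))

users = [{"pos": (0, 0), "priority": 2.0},
         {"pos": (3, 4), "priority": 5.0},
         {"pos": (30, 40), "priority": 1.0}]
I_j, I_o = partition_users(users, uav_pos=(0, 0), radius=10)
```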
5. The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node according to claim 4, wherein the task amount offloaded by user $i$ to UAV $j$ in step 2) is calculated as shown in equations (6) and (7):
where the quantities are: the computing resources allocated by UAV $j$ to user $i$ in slot $k$; the total amount of computing resources $C_j$ of UAV $j$; the uplink transmission rate from user $i$ to UAV $j$ in slot $k$; the task unit size $F$; and the task complexity of user $i$.
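Combining the quantities of claim 5 with the delay model of claim 7 (uplink transmission plus computation, tolerance $\tau_i$), a closed-form offload amount that exactly meets the tolerance delay can be sketched. Equations (6)–(7) themselves appear only in the figures, so this is an assumed reconstruction, not the patent's exact formula:

```python
def max_offload_units(tau_i, F, rate, complexity, alloc_compute):
    """Largest task amount d (in units of size F bits) whose uplink
    transmission delay d*F/rate plus computation delay
    d*F*complexity/alloc_compute stays within the tolerance tau_i.
    Assumed reconstruction in the spirit of equations (6)-(7)."""
    per_unit_delay = F / rate + F * complexity / alloc_compute
    return tau_i / per_unit_delay

# 1 Mbit units, 10 Mbit/s uplink, 100 cycles/bit, 1 GHz allocated
d = max_offload_units(tau_i=1.0, F=1e6, rate=1e7, complexity=100,
                      alloc_compute=1e9)
```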
6. The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node according to claim 5, wherein the task amount offloaded by user $i$ to base station $o$ in step 3) is calculated as shown in equations (8) and (9):
7. The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node according to claim 4, wherein the task offloading delay of user $i$ in step 2) and step 3) is as shown in equation (10), and the task offloading delay constraint is as shown in equation (11):
In equation (10), because the size of a task's computation result is much smaller than that of the task itself, only the uplink transmission delay of the user's task offloading and the computation delay of the task are considered, and the downlink transmission delay of the computation result is neglected. The total task offloading delay of user $i$ in slot $k$ is the sum of the transmission delay of the offloaded task, shown in equation (12), and the computation delay of the offloaded task, shown in equation (13):
In equation (12), the uplink transmission rates from user $i$ to UAV $j$ and to base station $o$ in slot $k$ are given by equations (14) and (15), respectively:
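Equations (10)–(13) sum a transmission and a computation term; equations (14)–(15) are rate formulas given only in the figures, so the Shannon-capacity form below is an assumption of the typical channel model, not the patent's exact expression:

```python
import math

def uplink_rate(bandwidth, tx_power, channel_gain, noise_power):
    """Assumed Shannon-form uplink rate R = B * log2(1 + SNR),
    standing in for equations (14)-(15)."""
    return bandwidth * math.log2(1 + tx_power * channel_gain / noise_power)

def offload_delay(task_units, F, rate, complexity, alloc_compute):
    """Total offloading delay (eq. 10): transmission delay (eq. 12)
    plus computation delay (eq. 13). The downlink delay of the result
    is neglected, as stated in claim 7."""
    d_tx = task_units * F / rate
    d_cp = task_units * F * complexity / alloc_compute
    return d_tx + d_cp

R = uplink_rate(bandwidth=1e6, tx_power=0.1, channel_gain=1e-4,
                noise_power=1e-9)            # SNR = 1e4
D = offload_delay(task_units=2, F=1e6, rate=R, complexity=100,
                  alloc_compute=1e9)
```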
8. The method for dynamically optimizing edge-network computing resources based on a UAV auxiliary node according to claim 1, wherein obtaining the $k$-slot UAV three-dimensional action vector $a_k$ through the UAV action policy network $\pi$ and computing $s_{k+1}$ from the $k$-slot ground-air network state vector $s_k$ and the action vector $a_k$ in step 104 specifically comprises:
inputting the $k$-slot ground-air network state vector $s_k$ into the UAV action policy network $\pi$; obtaining, through forward propagation of the neurons in each layer of the $\pi$ network, the three-dimensional action vector of UAV $j$, from which the UAV's next position is calculated, where $L$ is the horizontal movement distance of UAV $j$ in slot $k$.
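The mapping from the policy-network output (a horizontal direction, a horizontal distance $L$, and a vertical distance) to the UAV's next position can be sketched as follows. The radian-angle parameterization is an assumption, since the exact mapping formula is shown only in the figure:

```python
import math

def next_position(pos, heading, L, dz):
    """Move the UAV horizontally by distance L along `heading`
    (radians) and vertically by dz, per the action vector a_k."""
    x, y, z = pos
    return (x + L * math.cos(heading),
            y + L * math.sin(heading),
            z + dz)

# fly 10 m due "north" and descend 5 m from altitude 100 m
p = next_position((0.0, 0.0, 100.0), heading=math.pi / 2, L=10.0, dz=-5.0)
```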
9. The method according to claim 1, wherein in step 105 a Mini-Batch sample data set is obtained from the experience sample storage area $E$ by random sampling, and the state-action value network and the action policy network are optimized as follows:
To address the instability of the learning process of the state-action value network $Q$, which comprises $Q(s_k,a_k\mid\theta_1^{Q})$ and $Q(s_k,a_k\mid\theta_2^{Q})$, and of the action policy network $\pi(s_k\mid\theta^{\pi})$, target networks are defined: the target network of $Q(s_k,a_k\mid\theta_1^{Q})$ is $Q'(s_k,a_k\mid\theta_1^{Q'})$, the target network of $Q(s_k,a_k\mid\theta_2^{Q})$ is $Q'(s_k,a_k\mid\theta_2^{Q'})$, and the target network of $\pi(s_k\mid\theta^{\pi})$ is $\pi'(s_k\mid\theta^{\pi'})$.
The parameters $\theta_1^{Q}$ and $\theta_2^{Q}$ of the state-action value networks are updated by the gradient descent method, as shown in equation (16):
where the first factor is the learning rate of the state-action value network, $\theta_1^{Q}$ and $\theta_2^{Q}$ denote its network structure parameters, and the loss function is as shown in equation (17):
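The loss of equation (17) uses the clipped double-Q target of TD3: the target action is perturbed with clipped Gaussian noise and evaluated by the minimum of the two target critics. A toy scalar sketch (the real networks are neural; the stand-ins here are plain functions, and the hyperparameter values are illustrative):

```python
import random

def td3_target(r, s_next, pi_target, q1_target, q2_target,
               gamma=0.99, sigma=0.2, kappa=0.5):
    """y = r + gamma * min(Q1', Q2')(s', a'), with the smoothed target
    action a' = pi'(s') + clip(N(0, sigma), -kappa, kappa), matching the
    a'_{k+1} definition that accompanies equation (17)."""
    eps = max(-kappa, min(kappa, random.gauss(0.0, sigma)))
    a_next = pi_target(s_next) + eps
    return r + gamma * min(q1_target(s_next, a_next),
                           q2_target(s_next, a_next))

random.seed(0)
y = td3_target(r=1.0, s_next=2.0,
               pi_target=lambda s: 0.1 * s,
               q1_target=lambda s, a: s + a,
               q2_target=lambda s, a: s - a)
```

Taking the minimum of the two critics is what counters the value overestimation that motivates the twin-critic design.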
where $a'_{k+1}=a_{k+1}+\epsilon$, $\epsilon\sim\mathrm{clip}(\mathcal{N}(0,\sigma),-\kappa,\kappa)$, $\mathrm{clip}(\cdot)$ denotes the clipping function, $\mathcal{N}(0,\sigma)$ denotes Gaussian noise with mean 0 and variance $\sigma$, $\kappa$ denotes the clipping parameter, $\gamma$ denotes the discount factor, and $X$ denotes the sample set randomly sampled from $E$, $X=\{x_k\}$, $x_k=[s_k,a_k,r_k,s_{k+1}]$.
The network parameter $\theta^{\pi}$ of the action policy network $\pi(s_k\mid\theta^{\pi})$ is updated as shown in equation (18):
where $\mu_{\pi}$ is the learning rate of $\pi(s_k\mid\theta^{\pi})$, $\theta^{\pi}$ denotes the network structure parameters of $\pi(s_k\mid\theta^{\pi})$, and the policy gradient of $\pi(s_k\mid\theta^{\pi})$ is as shown in equation (19):
The network parameters $\theta_1^{Q'}$, $\theta_2^{Q'}$ and $\theta^{\pi'}$ of the target networks $Q'(s_k,a_k\mid\theta_1^{Q'})$, $Q'(s_k,a_k\mid\theta_2^{Q'})$ and $\pi'(s_k\mid\theta^{\pi'})$ are updated as shown in equations (20) and (21), governed by a soft-update factor.
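Equations (20)–(21) are the standard soft (Polyak) target-network update. With an update factor `rho`, applied element-wise to each parameter:

```python
def soft_update(theta, theta_target, rho):
    """theta' <- rho * theta + (1 - rho) * theta'  (eqs. 20-21),
    blending a small fraction of the online parameters into the
    target parameters so the targets drift slowly."""
    return [rho * t + (1 - rho) * tt for t, tt in zip(theta, theta_target)]

new_target = soft_update([1.0, 2.0], [0.0, 0.0], rho=0.1)
```

A small `rho` keeps the targets nearly frozen between updates, which is what stabilizes the bootstrapped value estimates.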
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210079544.6A CN114513814A (en) | 2022-01-24 | 2022-01-24 | Edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary node |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114513814A true CN114513814A (en) | 2022-05-17 |
Family
ID=81549326
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116257361A (en) * | 2023-03-15 | 2023-06-13 | 北京信息科技大学 | Unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method
CN116257361B (en) * | 2023-03-15 | 2023-11-10 | 北京信息科技大学 | Unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||