CN114698125A - Method, device and system for optimizing computation offload of mobile edge computing network - Google Patents
Method, device and system for optimizing computation offloading of a mobile edge computing network
- Publication number
- CN114698125A (application CN202210619336.0A)
- Authority
- CN
- China
- Prior art keywords
- model
- mobile
- reward
- determining
- edge computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/53—Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44594—Unloading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
Abstract
The invention provides a computation offloading optimization method, device, and system for a mobile edge computing network. Based on a distributed-execution, centralized-training framework built on deep reinforcement learning, the method reduces the computational time complexity of solving the original target optimization problem and avoids the curse of dimensionality that traditional numerical optimization algorithms may face in large-scale heterogeneous mobile edge computing networks. By defining a loss function and an advantage function and applying a multi-agent reinforcement learning algorithm, the data sampling efficiency and model training speed are improved, the average system cost in the network is reduced, and the quality of service of computation-intensive applications is improved.
Description
Technical Field
The invention relates to the technical field of edge computing, and in particular to a computation offloading optimization method, device, and system for a mobile edge computing network.
Background
With the explosive growth of computation-intensive mobile applications such as online gaming, autonomous driving, and virtual reality, it is increasingly imperative that mobile devices provide low-latency services for these applications. However, mobile devices typically have very limited computational resources and energy reserves, which poses significant challenges to meeting the latency and computational requirements of such applications. Thanks to the advent of 5G technology, mobile edge computing (MEC) is considered a promising way to address these challenges by offloading computation-intensive, delay-sensitive tasks to nearby edge nodes. However, in a conventional MEC system, the edge server is usually deployed on a ground base station at a fixed location, which is costly to deploy and inflexible, and is ill-suited to scenarios with dynamically changing demands, such as live event relay, traffic management, and emergency rescue. Therefore, heterogeneous mobile edge computing networks assisted by ground vehicles and unmanned aerial vehicles are receiving growing attention from both academia and industry.
Owing to the high mobility of ground vehicles and unmanned aerial vehicles and their ease of deployment, a heterogeneous mobile edge computing network can adapt to a rapidly changing network environment and serve hotspots or emergency rescue activities on demand. However, this high mobility and dynamic variability also confront the heterogeneous mobile edge computing network with hard problems such as real-time decision making, large-scale user association, and resource allocation under strict scheduling constraints.
In existing research and inventions, some methods are based on traditional numerical optimization: for example, convex optimization and heuristic search algorithms have been used to solve the task offloading and resource allocation problems in multi-server mobile edge computing networks, and coordinate descent has been used to maximize the computation rate of wireless-powered mobile edge computing networks. Others are based on deep learning, such as online incremental learning with a deep neural network to solve the computation offloading and resource management problems of a dynamic heterogeneous mobile edge computing network.
Although traditional numerical optimization methods can obtain approximate solutions, they usually require a large number of iterations to reach a reasonably good local optimum, the computational complexity of solving the problem is high, and they are unsuitable for dynamically changing environments. Most deep learning based methods, in turn, suffer from low data sampling efficiency and slow model convergence.
Disclosure of Invention
To solve the above problems, an embodiment of the present invention provides a computation offloading optimization method for a mobile edge computing network, where the mobile edge computing network includes ground vehicles and unmanned aerial vehicles, and the method includes: constructing a system model of the mobile edge computing network and determining an optimization objective function of the model based on minimizing the average system cost; converting the optimization objective function based on average-system-cost minimization into an optimization objective function based on average-reward maximization according to the state, action, and reward elements of a Markov decision model; determining a distributed-execution, centralized-training framework for multi-agent deep reinforcement learning and determining the loss function and advantage function used in training; and training the system model according to a multi-agent reinforcement learning algorithm.
Optionally, the building of a system model of the mobile edge computing network includes: establishing a network model comprising a plurality of ground vehicles, unmanned aerial vehicles, and mobile devices; establishing a communication model according to the network model, where the communication model comprises a mobile device-ground vehicle channel model and a mobile device-unmanned aerial vehicle channel model; and establishing a computation model according to the communication model, where the computation model covers the local computation cost, the ground vehicle edge computation cost, and the unmanned aerial vehicle edge computation cost.
Optionally, the determining of an optimization objective function of the model based on average-system-cost minimization comprises: determining the average system cost of all mobile devices over a plurality of time slices according to the local computation cost, the ground vehicle edge computation cost, and the unmanned aerial vehicle edge computation cost; and jointly optimizing the offloading decision variables of the mobile devices so that the average system cost is minimized, yielding the optimization objective function.
Optionally, the converting of the optimization objective function based on average-system-cost minimization into the optimization objective function based on average-reward maximization according to the state, action, and reward elements of the Markov decision model includes: determining the trajectories of the mobile devices over a plurality of time slices according to the state, action, and reward elements of the Markov decision model, and calculating the occurrence probability of each trajectory and its total reward, where the state comprises the task information, channel state, and battery level of a mobile device, and the action comprises the offloading indication, transmission power, and allocated computing capacity of a mobile device; and calculating the average reward from the trajectory occurrence probabilities and total rewards and determining the optimization objective function based on average-reward maximization.
Optionally, the determining of a distributed-execution, centralized-training framework for multi-agent deep reinforcement learning and of the loss function and advantage function used in training comprises: constructing the distributed-execution, centralized-training framework for multi-agent deep reinforcement learning based on an Actor-Critic algorithm; determining the advantage function using generalized advantage estimation in place of the total reward; and determining the loss function using an off-policy approach in place of an on-policy one.
Optionally, the training of the system model according to a multi-agent reinforcement learning algorithm comprises: each mobile device interacting with the mobile edge computing network based on its observed local state to generate batches of learning experience; training a shared policy on the batched experience using generalized advantage estimation and importance sampling; and each mobile device using the shared policy to interact with the mobile edge computing network.
An embodiment of the invention provides a computation offloading optimization device for a mobile edge computing network, where the mobile edge computing network comprises ground vehicles and unmanned aerial vehicles, and the device comprises: a model construction module for constructing a system model of the mobile edge computing network and determining an optimization objective function of the model based on average-system-cost minimization; a Markov decision conversion module for converting the optimization objective function based on average-system-cost minimization into an optimization objective function based on average-reward maximization according to the state, action, and reward elements of a Markov decision model; a determining module for determining a distributed-execution, centralized-training framework for multi-agent deep reinforcement learning and the loss function and advantage function used in training; and a training module for training the system model according to a multi-agent reinforcement learning algorithm.
An embodiment of the invention provides a computation offloading optimization system for a mobile edge computing network, which is used to execute the above computation offloading optimization method for a mobile edge computing network.
The embodiments of the invention are based on a distributed-execution, centralized-training framework built on deep reinforcement learning, which reduces the computational time complexity of solving the original target optimization problem and avoids the curse of dimensionality that traditional numerical optimization algorithms may face in large-scale heterogeneous mobile edge computing networks; by defining a loss function and an advantage function and applying a multi-agent reinforcement learning algorithm, the data sampling efficiency and model training speed are improved, the average system cost in the network is reduced, and the quality of service of computation-intensive applications is improved.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a computation offloading optimization method for a mobile edge computing network according to an embodiment of the present invention;
FIG. 2 is a system model diagram of a heterogeneous mobile edge computing network according to an embodiment of the present invention;
FIG. 3 is a diagram of a distributed-execution, centralized-training framework provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computation offloading optimization apparatus for a mobile edge computing network according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
To solve the task offloading and resource allocation problems of a ground-vehicle and unmanned-aerial-vehicle assisted heterogeneous mobile edge computing network, an embodiment of the invention provides a computation offloading optimization algorithm based on deep reinforcement learning for a multi-user, multi-edge-node scenario.
Referring to fig. 1, a flow diagram of a computation offload optimization method for a mobile edge computing network is shown, where the mobile edge computing network includes a ground vehicle and an unmanned aerial vehicle, and the method includes the following steps:
s102, a system model of the mobile edge computing network is constructed, and an optimization objective function of the model based on average system cost minimization is determined.
The mobile edge computing network comprises a plurality of ground vehicles, unmanned aerial vehicles, and mobile devices.
Illustratively, a system model of a mobile edge computing network may be constructed in the following manner, including: firstly, establishing a network model comprising a plurality of ground vehicles, unmanned aerial vehicles and mobile equipment; secondly, establishing a communication model according to the network model, wherein the communication model can comprise a mobile device-ground vehicle channel model and a mobile device-unmanned aerial vehicle channel model, and each channel model comprises channel gain and unloading transmission rate; then, a calculation model is established according to the communication model, wherein the calculation model can comprise the calculation of local calculation cost, ground vehicle edge calculation cost and unmanned aerial vehicle edge calculation cost, and each calculation cost comprises the delay and the energy consumption for executing tasks.
Illustratively, the optimization objective function of the above model based on average-system-cost minimization may be determined as follows: first, the average system cost of all mobile devices over a plurality of time slices is determined according to the local computation cost, the ground vehicle edge computation cost, and the unmanned aerial vehicle edge computation cost; second, the offloading decision variables of the mobile devices are jointly optimized so that the average system cost is minimized, yielding the optimization objective function. The offloading decision variables include: the variable deciding the task execution position (local, ground vehicle, or unmanned aerial vehicle), the transmit power, and the computing resources of the local device, the ground vehicle, and the unmanned aerial vehicle.
S104, the optimization objective function based on average-system-cost minimization is converted into an optimization objective function based on average-reward maximization according to the state, action, and reward elements of the Markov decision model.
The three elements of a Markov decision model, namely states, actions, and rewards, are defined for the optimization problem of the preceding step, which is thereby converted into an optimization objective function based on average-reward maximization.
Due to the high coupling between the offloading decision variables and several limiting factors in the system, the optimization problem is NP-hard, so traditional numerical optimization methods usually face high computational time complexity and the curse of dimensionality.
In order to avoid the above problems, the embodiment of the present invention defines three elements based on the markov decision model as follows: status, action, reward. Specifically, the state of each mobile device includes its task information, channel state, and power information; the actions of each mobile device include offloading indications, transmission power, and allocated computing power; the average system cost is minimized by optimizing the unloading decision and the resource allocation in the embodiment of the invention.
Based on this, the above steps may include: determining the tracks of the mobile equipment in a plurality of time slices according to the state, the action and the reward elements of the Markov decision model, and calculating the probability of track occurrence and the total reward; then, an average reward is calculated according to the probability of the track occurrence and the total reward, and an optimization objective function based on the maximization of the average reward is determined.
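The conversion described above can be sketched numerically. Assuming the per-slice reward is defined as the negative system cost (a sign convention consistent with cost minimization becoming reward maximization, not notation taken from the patent), maximizing the average reward over a trajectory of time slices is equivalent to minimizing the average system cost:

```python
import numpy as np

def average_reward(rewards):
    """Average reward over a trajectory of N time slices."""
    return float(np.mean(rewards))

# Reward = negative per-slice system cost (assumed convention), so the
# lower-cost trajectory earns the higher average reward.
costs_a = [2.0, 1.5, 1.0]
costs_b = [3.0, 3.0, 3.0]
r_a = average_reward([-c for c in costs_a])
r_b = average_reward([-c for c in costs_b])
print(r_a > r_b)  # True: trajectory A has lower average cost
```

This is only an illustration of the objective reformulation; the patent's trajectory probabilities and total rewards are computed from the policy as described above.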
And S106, determining a distributed execution and centralized training framework of the multi-agent deep reinforcement learning, and determining a loss function and an advantage function of the training.
In the embodiment of the invention, a distributed-execution, centralized-training framework based on deep reinforcement learning is designed, and the loss function and advantage function used in training are determined. For the optimization problem based on maximizing the average reward, deep reinforcement learning is adopted to train the network model. In view of the needs of large-scale user association and real-time decision making, the embodiment can adopt an Actor-Critic based algorithm to build a distributed-execution, centralized-training framework for computation offloading and resource allocation scheduling.
The embodiment of the invention uses a deep reinforcement learning algorithm suited to multi-agent settings, and greatly reduces the computational complexity of problem solving through distributed execution and centralized training.
Optionally, the distributed-execution, centralized-training framework of multi-agent deep reinforcement learning can be built on an Actor-Critic algorithm; the advantage function is then determined using generalized advantage estimation in place of the total reward, and the loss function is determined using an off-policy approach in place of an on-policy one. The Actor is responsible for generating actions and interacting with the environment based on the policy function; the Critic is responsible for assessing the Actor's performance and guiding the Actor's actions in the next stage.
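The generalized advantage estimation mentioned above is a standard construction from the reinforcement learning literature; a minimal sketch follows. The discount and trace parameters `gamma` and `lam` are assumed defaults, not values taken from the patent:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation:
    A_t = sum_l (gamma*lam)^l * delta_{t+l},
    with the TD residual delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    `values` carries one extra entry for the bootstrap value V(s_T)."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(deltas)
    running = 0.0
    # Accumulate the exponentially weighted TD residuals backwards in time.
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `gamma = lam = 1` and a zero value function, the advantage reduces to the reward-to-go, which is the "total reward" quantity the estimator replaces.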
And S108, training the system model according to the multi-agent reinforcement learning algorithm.
Based on the above steps, the embodiment of the present invention can be implemented with a reinforcement learning method suited to multiple agents, such as Shared Multi-Agent Proximal Policy Optimization (SMAPPO), the Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG), the QMIX algorithm, and the like.
Illustratively, the embodiment of the present invention employs SMAPPO built on a centralized-training, distributed-execution framework; the overall framework can be divided into three parts: distributed execution, data collection, and centralized training.
Based on the centralized-training, distributed-execution framework described above, the training process may proceed as follows: each mobile device interacts with the mobile edge computing network based on its observed local state to generate batches of learning experience; a shared policy is trained on the batched experience using generalized advantage estimation and importance sampling; and each mobile device then uses the shared policy to interact with the mobile edge computing network. Using a proximal policy optimization algorithm that introduces an advantage function and importance sampling further improves the utilization of experience and the convergence speed of the model.
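The importance-sampling and proximal update described above can be sketched with PPO's standard clipped surrogate loss; `clip_eps = 0.2` is an assumed default rather than a value from the patent:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss: the importance ratio
    r = exp(logp_new - logp_old) reweights experience gathered under the
    old (behavior) policy, and clipping r to [1-eps, 1+eps] keeps the
    shared-policy update conservative."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Negative because optimizers minimize; the surrogate is maximized.
    return -np.mean(np.minimum(unclipped, clipped))
```

The pessimistic `minimum` is what lets batches of off-policy experience be reused safely, which is the data-efficiency benefit the text attributes to importance sampling.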
The computation offloading optimization method for a mobile edge computing network provided by the embodiment of the invention is based on a distributed-execution, centralized-training framework built on deep reinforcement learning; it reduces the computational time complexity of solving the original target optimization problem and avoids the curse of dimensionality that traditional numerical optimization algorithms may face in large-scale heterogeneous mobile edge computing networks. By defining a loss function and an advantage function and applying a multi-agent reinforcement learning algorithm, the data sampling efficiency and model training speed are improved, the average system cost in the network is reduced, and the quality of service of computation-intensive applications is improved.
Furthermore, the method can solve the computation offloading and resource allocation problems in a heterogeneous mobile edge computing network assisted by ground vehicles and unmanned aerial vehicles. Through the deep-reinforcement-learning-based distributed-execution, centralized-training framework of steps S104 and S106, the computational time complexity of solving the original target optimization problem is reduced and the curse of dimensionality that traditional numerical optimization algorithms may face in large-scale heterogeneous mobile edge computing networks is avoided; through the loss function and advantage function defined in step S106 and the importance sampling introduced in step S108, the data sampling efficiency and model training speed are improved, the average system cost in the network is greatly reduced, and the quality of service of computation-intensive applications is improved.
Exemplary processes of the above steps are described in detail below.
(1) Constructing a system model of the ground-vehicle and unmanned-aerial-vehicle assisted heterogeneous mobile edge computing network and giving the optimization objective function based on average-system-cost minimization.
Constructing the system model of the ground-vehicle and unmanned-aerial-vehicle assisted heterogeneous mobile edge computing network comprises the following steps:
1. establishing a network model
Referring to FIG. 2, a system model diagram of a heterogeneous mobile edge computing network is shown. The ground-vehicle and unmanned-aerial-vehicle assisted mobile edge computing network contains M mobile devices, V ground vehicles, and U drones. The ground vehicles and the drones are represented by the sets V = {1, 2, …, V} and U = {1, 2, …, U}, respectively. The mobile devices are randomly distributed over the ground and represented by the set M = {1, 2, …, M}. The overall system time is divided equally into N time slices, represented by the set N = {1, 2, …, N}. Mobile device i randomly generates a task in time slice n, expressed as T_i^n = (D_i^n, c_i^n, τ_i^n), where D_i^n represents the input data size, c_i^n indicates the number of clock cycles required to complete a 1-bit task, and τ_i^n indicates the deadline for completing task T_i^n.
In this embodiment, a full offloading strategy is adopted: a generated task is either executed locally on the mobile device or offloaded in its entirety to an edge node (i.e., a ground vehicle or a drone) for remote execution. The offloading decision of mobile device i in time slice n is denoted by the variable x_i^n, where x_i^n = 0 represents local computation, x_i^n = j ∈ V represents ground vehicle edge computation, and x_i^n = k ∈ U represents drone edge computation.
2. Establishing a communication model
1) Mobile device-ground vehicle channel model
The channel gain between mobile device i and ground vehicle j in time slice n is expressed as h_{i,j}^n = g_0 (d_{i,j}^n)^{-θ}, where g_0 is the reference channel gain, θ is the path-loss exponent, and d_{i,j}^n represents the distance between mobile device i and ground vehicle j in time slice n, expressed as d_{i,j}^n = ‖q_i^n − q_j^n‖, where q_i^n represents the coordinates of the position of mobile device i in time slice n and q_j^n represents the coordinates of the position of ground vehicle j in time slice n.
According to the Shannon formula, the offload transmission rate between mobile device i and ground vehicle j in time slice n can be expressed as r_{i,j}^n = W_j log_2(1 + p_i^n h_{i,j}^n / σ^2), where p_i^n represents the transmit power of mobile device i in time slice n, W_j represents the channel bandwidth between mobile device i and ground vehicle j, and σ^2 represents the noise power in the channel.
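The Shannon-formula rate above can be checked with a few lines of Python (variable names are illustrative, not the patent's notation):

```python
import math

def offload_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Offload transmission rate r = W * log2(1 + p * h / sigma^2)
    from the Shannon formula, in bits per second."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)

# With SNR = 3, each hertz of bandwidth carries log2(4) = 2 bits/s.
print(offload_rate(1e6, 1.0, 3.0, 1.0))  # 2000000.0
```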
2) Mobile device-unmanned aerial vehicle channel model
The channel gain between mobile device i and drone k in time slice n is expressed as h_{i,k}^n = g_0 (d_{i,k}^n)^{-θ} / [P_LoS ζ_LOS + (1 − P_LoS) ζ_NLOS], where ζ_LOS and ζ_NLOS respectively represent the excess losses of the line-of-sight and non-line-of-sight links, P_LoS is the line-of-sight probability, and d_{i,k}^n represents the distance between mobile device i and drone k in time slice n, calculated as d_{i,k}^n = ‖q_i^n − q_k^n‖, where q_k^n represents the position of drone k in time slice n.
According to the Shannon formula, the offload transmission rate between mobile device i and drone k in time slice n can be expressed as r_{i,k}^n = W_k log_2(1 + p_i^n h_{i,k}^n / σ^2), where W_k represents the channel bandwidth between mobile device i and drone k.
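One common air-to-ground formulation consistent with the description above weights the LoS and NLoS excess losses by the line-of-sight probability; the following sketch assumes that form (the patent's exact formula is not reproduced in this text, so every parameter here is an assumption):

```python
def uav_channel_gain(distance_m, p_los, zeta_los, zeta_nlos,
                     g0=1.0, path_loss_exp=2.0):
    """Assumed average air-to-ground gain: a free-space-like term
    g0 * d^(-alpha) divided by the probability-weighted excess loss
    of the LoS and NLoS links."""
    excess_loss = p_los * zeta_los + (1.0 - p_los) * zeta_nlos
    return g0 * distance_m ** (-path_loss_exp) / excess_loss
```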
3. Building a computational model
1) Local computation: when the offloading decision variable selects local execution, the task is executed locally. The latency of executing the task locally is expressed as:
where the frequency term denotes the local computing resource of mobile device i in time slice n. The delay should satisfy the following condition:
Accordingly, the energy consumption of local computation can be expressed as:
where κ is the effective switched capacitance, which depends on the chip architecture, and ζ denotes the energy consumption exponent; empirically, ζ = 3 is typically taken. In summary, the weighted cost of local computation can be expressed as:
where the two coefficients denote the delay weight and the energy consumption weight of local computation, respectively.
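As a hedged sketch of the standard local-computation model described above (the symbols D for input data size, c for cycles per bit, f for the allocated CPU frequency, and w_T, w_E for the delay and energy weights are assumptions, since the original formulas are not reproduced here), the delay, energy, and weighted cost take the form:

```latex
T_i^{\mathrm{loc},n} = \frac{c_i \, D_i^n}{f_i^n}, \qquad
E_i^{\mathrm{loc},n} = \kappa \,(f_i^n)^{\zeta-1}\, c_i \, D_i^n, \qquad
\varphi_i^{\mathrm{loc},n} = w_T \, T_i^{\mathrm{loc},n} + w_E \, E_i^{\mathrm{loc},n}
```

With ζ = 3 the energy term reduces to κ f² c D, the familiar dynamic-power CPU model.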
2) Ground vehicle edge computation: when the offloading decision variable selects ground vehicle j, the task is offloaded to ground vehicle j for execution. The transmission delay of offloading the task to ground vehicle j can be expressed as:
Accordingly, the transmission energy consumption can be expressed as:
The computation delay of the task on ground vehicle j can be expressed as:
Accordingly, the computation energy consumption can be expressed as:
where the power term denotes the operating power of ground vehicle j. In summary, the weighted cost of ground vehicle edge computation can be expressed as:
where the two coefficients denote the delay weight and the energy consumption weight of ground vehicle edge computation, respectively.
3) Unmanned aerial vehicle edge computation: when the offloading decision variable selects unmanned aerial vehicle k, the task is offloaded to unmanned aerial vehicle k for execution. The transmission delay of offloading the task to unmanned aerial vehicle k can be expressed as:
Accordingly, the transmission energy consumption can be expressed as:
The computation delay of the task on unmanned aerial vehicle k can be expressed as:
Accordingly, the computation energy consumption can be expressed as:
where the power term denotes the operating power of unmanned aerial vehicle k. In summary, the weighted cost of unmanned aerial vehicle edge computation can be expressed as:
where the two coefficients denote the delay weight and the energy consumption weight of unmanned aerial vehicle edge computation, respectively.
From the above, the system cost of mobile device i in time slice n can be expressed as:
Thus, the average system cost of all mobile devices over N time slices can be expressed as:
Based on the established system model, the offloading decision variables of all mobile devices are jointly optimized to minimize the average system cost of the ground vehicle and unmanned aerial vehicle assisted mobile edge computing network. The optimization objective function P is therefore:
where C1 is the offloading indicator constraint; C2 is the transmission power constraint; C3, C4, and C5 denote the allocated computing capability constraints of the mobile devices, the ground vehicles, and the unmanned aerial vehicles, respectively; C6, C7, and C8 indicate that the delay to complete a task should not exceed its maximum tolerable delay; C9 indicates that the total energy consumption of a mobile device from the start time to the current time should be less than its maximum available energy budget; C10 indicates that the total energy consumption of a ground vehicle from the start time to the current time should be within its maximum available energy budget; and C11 indicates that the total energy consumption of an unmanned aerial vehicle for offloaded tasks from the start time to the current time should not exceed its maximum available energy budget.
Step 2: for the optimization problem in step 1, define the three elements of a Markov decision model, namely state, action, and reward, and convert the problem into an optimization objective function based on average reward maximization.
Because the offloading decision variables are discrete, the optimization problem is NP-hard; conventional numerical optimization methods therefore typically face high computational time complexity and the curse of dimensionality. To avoid these problems, the present invention defines the three elements of the Markov decision model as follows:
State. The state of each mobile device includes its task information, channel state, and battery information. Thus, the state of mobile device i in time slice n can be expressed as:
where the last term denotes the current remaining battery energy of mobile device i in time slice n.
Action. The action of each mobile device includes the offloading indicator, transmission power, and allocated computing capability. The action of mobile device i in time slice n can be expressed as:
Reward. The average system cost is minimized by optimizing the offloading decisions and resource allocation. Thus, the reward of mobile device i in time slice n can be expressed as:
where the cost term denotes the weighted cost of mobile device i in time slice n.
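Since maximizing cumulative reward must align with minimizing the weighted cost, a common choice (a sketch under that assumption, with φ denoting the weighted cost) is simply the negative cost:

```latex
r_i^n = -\,\varphi_i^n
```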
Based on the definitions of these three elements of the Markov decision model, the trajectory of mobile device i over N time slices can be represented as:
Accordingly, the probability of the trajectory occurring and the total reward can be expressed as:
where θ is the network parameter of the Actor and the remaining term denotes the probability of a state occurring.
The average reward can be expressed as:
Thus, the original optimization problem can be converted into an optimization objective function based on maximizing the average reward, as follows:
and 3, designing a distributed execution and centralized training framework based on deep reinforcement learning aiming at the Markov decision problem in the step 2, and determining a loss function and an advantage function of training.
For problem P1, the present embodiment employs deep reinforcement learning to train the network model. In view of the needs of large-scale user association and real-time decision, a distributed execution and centralized training framework is built for computation unloading and resource allocation scheduling by adopting an Actor-Critic-based algorithm. For the above optimization problem, the gradient of the objective function can be expressed as:
wherein the content of the first and second substances,Bis the small batch size per sample. To add a benchmark and add a suitable confidence level, the present embodiment introduces a generalized dominance estimate instead of the total reward. The merit function is defined as follows:
wherein the content of the first and second substances,indicating a stateIn the form of a desired reward for the user,γa discount factor that represents a future reward,mobile deviceiIn time slicen ’ The prize of (1). Thus, the gradientIt can become:
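The generalized advantage estimate described above can be sketched in Python; this is an illustrative stand-in (the λ parameter and the zero bootstrap value at the end of the trajectory are assumptions, since the patent's own formula is not reproduced here):

```python
import numpy as np

def generalized_advantage_estimate(rewards, values, gamma=0.99, lam=0.95):
    """GAE over one trajectory; assumes the episode terminates at the last
    step, so the bootstrap value after the final time slice is zero."""
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # exponentially weighted sum of future residuals
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With γ = 1 and λ = 1 this degenerates to the total remaining reward minus the value baseline, which makes the role of the discount factor easy to check.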
To improve the efficiency of data sampling, an off-policy approach is chosen in place of the on-policy approach, and the loss function of the Actor can be expressed as:
where θ′ is the Actor network parameter on each mobile device, θ is the Actor network parameter to be trained, ε denotes the clipping factor (a fraction between 0 and 1), and clip denotes the clipping function, defined as follows:
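The clipped importance-sampling surrogate described here follows the standard PPO form; the following is a minimal NumPy sketch (function and variable names are illustrative, not the patent's):

```python
import numpy as np

def ppo_actor_loss(log_prob_new, log_prob_old, advantages, eps=0.2):
    """Clipped surrogate loss: the importance ratio compares the policy being
    trained (theta) with the behavior policy on each device (theta')."""
    ratio = np.exp(log_prob_new - log_prob_old)      # importance sampling ratio
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)   # the clip function
    # pessimistic bound: take the smaller of the two surrogate objectives
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the two policies coincide the ratio is 1, and the loss reduces to the negative mean advantage.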
Furthermore, the loss function of Critic can be expressed as:
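A conventional form of this Critic loss, sketched here as a mean-squared error between the estimated return and the value prediction over a mini-batch (the symbols R̂, V, s, and B are assumed names, not taken from the patent's formula):

```latex
L(\Phi) = \frac{1}{B} \sum_{b=1}^{B} \left( \hat{R}_b - V_{\Phi}(s_b) \right)^2
```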
and 4, defining a shared multi-agent near-end strategy optimization algorithm and an execution process aiming at the distributed execution and centralized training framework in the step 3.
On the basis of step 3, a shared multi-agent near-end optimization algorithm based on centralized training and distributed execution frameworks is provided.
Referring to the schematic diagram of the distributed execution-centralized training framework shown in fig. 3, the entire framework can be divided into three parts from bottom to top: distributed execution, data collection, and centralized training.
(1) First, each user device interacts with the heterogeneous mobile edge computing network based on its locally observed state, generating batches of learning experience.
(2) These learning experiences are then used to train a shared policy and value function by employing generalized advantage estimation and importance sampling.
(3) Finally, each mobile device uses the shared trained policy to continue interacting with the environment.
Illustratively, the shared multi-agent proximal policy optimization algorithm is executed as follows:
1: Initialize Actor π and Critic V with parameters θ′ ← θ and Φ′ ← Φ; initialize the experience pool.
2: for episode e = 1 to E do
3: for time slice n = 1 to N do
4: for mobile device i = 1 to M do
6: end for
7: end for
8: for update step t = 1 to T do
9: for sampling step s = 1 to S do
10: Randomly select B experience tuples
11: Compute the advantage function, the Actor loss, and the Critic loss
12: Compute the gradients ▽θ and ▽Φ with gradient descent by the Adam optimizer;
13: Update Actor π and Critic V with parameters θ′ ← θ and Φ′ ← Φ;
14: end for
15: end for
16: end for
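The distributed-execution / centralized-training loop above can be sketched as follows; the environment, action space, and dimensions are toy stand-ins for the patent's system model, not its actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, E, STATE_DIM = 3, 5, 2, 4   # devices, time slices, episodes (toy sizes)

def env_step(state, action):
    """Placeholder environment: in the real system the reward would be the
    negative weighted cost of the chosen target (local / vehicle / UAV)."""
    next_state = rng.normal(size=STATE_DIM)
    reward = -abs(rng.normal())
    return next_state, reward

experience_pool = []
for e in range(E):                         # episode loop (line 2)
    states = [rng.normal(size=STATE_DIM) for _ in range(M)]
    for n in range(N):                     # time-slice loop (line 3)
        for i in range(M):                 # distributed execution per device (line 4)
            action = int(rng.integers(3))  # stand-in for (offload flag, power, resources)
            next_state, reward = env_step(states[i], action)
            experience_pool.append((states[i], action, reward, next_state))
            states[i] = next_state

# Centralized training (lines 8-15) would repeatedly sample B tuples from
# the shared pool, compute advantages and the Actor/Critic losses, and
# update the shared parameters with Adam; only the sampling step is shown.
B = 4
batch_idx = rng.choice(len(experience_pool), size=B, replace=False)
batch = [experience_pool[j] for j in batch_idx]
```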
The embodiment of the invention can solve the computation offloading and resource allocation problems in a ground vehicle and unmanned aerial vehicle assisted heterogeneous mobile edge computing network. Through the deep-reinforcement-learning-based distributed execution and centralized training framework of steps 2 and 3, the computational time complexity of solving the original target optimization problem is reduced, and the curse of dimensionality that conventional numerical optimization algorithms may face in a large-scale heterogeneous mobile edge computing network is avoided. By means of the loss function and advantage function defined in step 3 and the importance sampling method introduced in step 4, the data sampling efficiency and model training speed are improved, the average system cost in the network is greatly reduced, and the quality of service of computation-intensive applications is improved.
Fig. 4 is a schematic structural diagram of a computation offload optimization apparatus of a mobile edge computing network in an embodiment of the present invention, where the mobile edge computing network includes a ground vehicle and an unmanned aerial vehicle, and the apparatus includes:
a model building module 401, configured to build a system model of the mobile edge computing network and determine an optimization objective function of the model based on average system cost minimization;
a Markov decision transformation module 402, configured to transform the optimization objective function based on average system cost minimization into an optimization objective function based on average reward maximization according to the state, action, and reward elements of a Markov decision model;
a determining module 403, configured to determine a distributed execution and centralized training framework of multi-agent deep reinforcement learning, and determine a loss function and an advantage function of training;
a training module 404 for performing training of the system model according to a multi-agent reinforcement learning algorithm.
The embodiment of the invention, based on a deep-reinforcement-learning distributed execution and centralized training framework, reduces the computational time complexity of solving the original target optimization problem and avoids the curse of dimensionality that conventional numerical optimization algorithms may face in a large-scale heterogeneous mobile edge computing network; by defining the loss function, advantage function, and multi-agent reinforcement learning algorithm, the data sampling efficiency and model training speed are improved, the average system cost in the network is reduced, and the quality of service of computation-intensive applications is improved.
The embodiment of the invention provides a computation offloading optimization system of a mobile edge computing network, the system being configured to execute the above computation offloading optimization method of a mobile edge computing network.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by instructing a control device to implement the methods, and the programs may be stored in a computer-readable storage medium, and when executed, the programs may include the processes of the above method embodiments, where the storage medium may be a memory, a magnetic disk, an optical disk, and the like.
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for optimizing computation offload of a mobile edge computing network, wherein the mobile edge computing network comprises a ground vehicle and an unmanned aerial vehicle, and the method comprises the following steps:
constructing a system model of the mobile edge computing network, and determining an optimization objective function of the model based on minimizing an average system cost;
converting the optimization objective function based on the average system cost minimization into an optimization objective function based on the average reward maximization according to the state, action and reward elements of a Markov decision model;
determining a distributed execution and centralized training framework of multi-agent deep reinforcement learning, and determining a loss function and an advantage function of training;
performing training of the system model according to a multi-agent reinforcement learning algorithm.
2. The method of claim 1, wherein constructing the system model of the mobile edge computing network comprises:
establishing a network model comprising a plurality of ground vehicles, unmanned aerial vehicles and mobile equipment;
establishing a communication model according to the network model, wherein the communication model comprises a mobile device-ground vehicle channel model and a mobile device-unmanned aerial vehicle channel model;
and establishing a calculation model according to the communication model, wherein the calculation model comprises the calculation of local calculation cost, ground vehicle edge calculation cost and unmanned aerial vehicle edge calculation cost.
3. The method of claim 2, wherein determining an optimization objective function for the model based on an average system cost minimization comprises:
determining an average system cost of all mobile devices in a plurality of time slices according to the local calculated cost, the ground vehicle edge calculated cost and the unmanned aerial vehicle edge calculated cost;
and jointly optimizing the offloading decision variables of the mobile devices so that the average system cost is minimized, thereby obtaining the optimization objective function.
4. The method of claim 1, wherein transforming the optimization objective function based on average system cost minimization into an optimization objective function based on average reward maximization according to state, action and reward elements of a Markov decision model comprises:
determining the track of the mobile equipment in a plurality of time slices according to the state, action and reward elements of the Markov decision model, and calculating the probability of the track and the total reward; the state comprises task information, channel state and electric quantity information of the mobile equipment, and the action comprises unloading indication, transmission power and distributed computing capacity of the mobile equipment;
calculating an average reward according to the probability of the track occurrence and the total reward, and determining an optimization objective function based on the maximization of the average reward.
5. The method of any of claims 1-4, wherein determining a distributed execution and centralized training framework for multi-agent deep reinforcement learning, and determining a loss function and a merit function for training, comprises:
constructing a distributed execution and centralized training framework of multi-agent deep reinforcement learning based on an Actor-Critic algorithm;
determining the advantage function by using generalized advantage estimation in place of the total reward, and determining the loss function by using an off-policy approach in place of the on-policy approach.
6. The method of any one of claims 1-4, wherein said performing training of said system model according to a multi-agent reinforcement learning algorithm comprises:
each mobile device interacts with the mobile edge computing network based on the observed local state to generate batch learning experience;
training a sharing strategy based on the batch learning experience according to generalized advantage estimation and importance sampling;
and each mobile device shares the sharing strategy to interact with the mobile edge computing network.
7. The method of claim 4, wherein the state of mobile device i in time slice n is expressed as:
wherein the state components denote, respectively, the input data size, the number of clock cycles required to complete a 1-bit task, the maximum tolerable delay to complete the task, the current remaining battery energy of mobile device i in time slice n, the channel gain between mobile device i and the ground vehicle in time slice n, and the channel gain between mobile device i and unmanned aerial vehicle k in time slice n;
the action of mobile device i in time slice n is represented as:
wherein the action components denote, respectively, the offloading decision variable, the transmission power, the local computing resource, the ground vehicle computing resource, and the unmanned aerial vehicle computing resource of mobile device i in time slice n;
the reward of mobile device i in time slice n is expressed as:
wherein the cost term denotes the system cost of mobile device i in time slice n;
the trajectory of mobile device i over N time slices is represented as:
the probability of the trajectory occurring and the total reward are expressed as:
wherein one term denotes the probability of a state occurring, and the other denotes the network parameter of the Actor;
the average reward is expressed as:
wherein E denotes the expectation;
the optimization objective function based on maximizing the average reward is expressed as:
8. The method of claim 7, wherein the gradient of the optimization objective function is expressed as:
the advantage function is expressed as:
wherein the value term denotes the expected reward of a state, γ denotes the discount factor for future rewards, and the reward term denotes the reward of mobile device i in time slice n′;
the loss function of Actor is expressed as:
wherein θ′ denotes the Actor network parameter on each mobile device, θ denotes the Actor network parameter to be trained, and the clip function is calculated as follows:
the loss function for Critic is expressed as:
9. a computational offload optimization apparatus for a mobile edge computing network, the mobile edge computing network comprising a ground vehicle, a drone, the apparatus comprising:
a model construction module for constructing a system model of the moving edge computing network and determining an optimization objective function of the model based on an average system cost minimization;
the Markov decision transformation module is used for transforming the optimization objective function based on average system cost minimization into an optimization objective function based on average reward maximization according to the state, action and reward elements of the Markov decision model;
the system comprises a determining module, a judging module and a judging module, wherein the determining module is used for determining a distributed execution and centralized training framework of multi-agent deep reinforcement learning and determining a loss function and an advantage function of training;
a training module for performing training of the system model according to a multi-agent reinforcement learning algorithm.
10. A computing offload optimization system of a mobile edge computing network, the system being configured to perform the computing offload optimization method of the mobile edge computing network according to any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210619336.0A CN114698125A (en) | 2022-06-02 | 2022-06-02 | Method, device and system for optimizing computation offload of mobile edge computing network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114698125A true CN114698125A (en) | 2022-07-01 |
Family
ID=82131080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210619336.0A Pending CN114698125A (en) | 2022-06-02 | 2022-06-02 | Method, device and system for optimizing computation offload of mobile edge computing network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114698125A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115499440A (en) * | 2022-09-14 | 2022-12-20 | 广西大学 | Server-free edge task unloading method based on experience sharing deep reinforcement learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200186964A1 (en) * | 2018-12-07 | 2020-06-11 | T-Mobile Usa, Inc. | Uav supported vehicle-to-vehicle communication |
CN112118601A (en) * | 2020-08-18 | 2020-12-22 | 西北工业大学 | Method for reducing task unloading delay of 6G digital twin edge computing network |
CN112929849A (en) * | 2021-01-27 | 2021-06-08 | 南京航空航天大学 | Reliable vehicle-mounted edge calculation unloading method based on reinforcement learning |
CN113346944A (en) * | 2021-06-28 | 2021-09-03 | 上海交通大学 | Time delay minimization calculation task unloading method and system in air-space-ground integrated network |
CN114116061A (en) * | 2021-11-26 | 2022-03-01 | 内蒙古大学 | Workflow task unloading method and system in mobile edge computing environment |
CN114169234A (en) * | 2021-11-30 | 2022-03-11 | 广东工业大学 | Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge calculation |
Non-Patent Citations (2)
Title |
---|
JINNA CHEN et al.: "UAV-Assisted Vehicular Edge Computing for the 6G Internet of Vehicles: Architecture, Intelligence, and Challenges", IEEE Communications Standards Magazine * |
WANG Yunpeng: "Research on Resource Optimization Methods for Mobile Edge Computing Based on Deep Reinforcement Learning", China Master's Theses Full-text Database, Information Science and Technology * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113543176B (en) | Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance | |
Chen et al. | Efficiency and fairness oriented dynamic task offloading in internet of vehicles | |
Faraci et al. | Fog in the clouds: UAVs to provide edge computing to IoT devices | |
US11831708B2 (en) | Distributed computation offloading method based on computation-network collaboration in stochastic network | |
CN115640131A (en) | Unmanned aerial vehicle auxiliary computing migration method based on depth certainty strategy gradient | |
EP4024212B1 (en) | Method for scheduling inference workloads on edge network resources | |
CN115175217A (en) | Resource allocation and task unloading optimization method based on multiple intelligent agents | |
CN113645637B (en) | Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium | |
Hajiakhondi-Meybodi et al. | Deep reinforcement learning for trustworthy and time-varying connection scheduling in a coupled UAV-based femtocaching architecture | |
CN113573363A (en) | MEC calculation unloading and resource allocation method based on deep reinforcement learning | |
CN117499867A (en) | Method for realizing high-energy-efficiency calculation and unloading through strategy gradient algorithm in multi-unmanned plane auxiliary movement edge calculation | |
CN113946423B (en) | Multi-task edge computing, scheduling and optimizing method based on graph attention network | |
CN116489708A (en) | Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method | |
CN113821346B (en) | Edge computing unloading and resource management method based on deep reinforcement learning | |
CN114698125A (en) | Method, device and system for optimizing computation offload of mobile edge computing network | |
Henna et al. | Distributed and collaborative high-speed inference deep learning for mobile edge with topological dependencies | |
CN115514769B (en) | Satellite elastic Internet resource scheduling method, system, computer equipment and medium | |
CN116455903A (en) | Method for optimizing dependency task unloading in Internet of vehicles by deep reinforcement learning | |
CN116204319A (en) | Yun Bianduan collaborative unloading method and system based on SAC algorithm and task dependency relationship | |
CN114980160A (en) | Unmanned aerial vehicle-assisted terahertz communication network joint optimization method and device | |
CN114217881A (en) | Task unloading method and related device | |
CN111813539B (en) | Priority and collaboration-based edge computing resource allocation method | |
CN117891532B (en) | Terminal energy efficiency optimization unloading method based on attention multi-index sorting | |
Zhang et al. | Cooperative optimisation strategy of computation offloading in multi‐UAVs‐assisted edge computing networks | |
Hevesli et al. | Task Offloading Optimization in Digital Twin Assisted MEC-Enabled Air-Ground IIoT 6 G Networks |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20220701 |