CN115529604A - Joint resource allocation and multi-task offloading method based on server cooperation - Google Patents

Info

Publication number
CN115529604A
CN115529604A CN202110705792.2A CN202110705792A CN115529604A CN 115529604 A CN115529604 A CN 115529604A CN 202110705792 A CN202110705792 A CN 202110705792A CN 115529604 A CN115529604 A CN 115529604A
Authority
CN
China
Prior art keywords
task
micro base
base station
time period
mobile device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110705792.2A
Other languages
Chinese (zh)
Inventor
张红霞
杨勇进
王登岳
肖军弼
王琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority to CN202110705792.2A
Publication of CN115529604A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/08 Load balancing or load distribution
    • H04W28/09 Management thereof
    • H04W28/0925 Management thereof using policies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/10 Flow control between communication endpoints
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/16 Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/16 Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H04W28/18 Negotiating wireless communication parameters
    • H04W28/20 Negotiating bandwidth
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/16 Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H04W28/18 Negotiating wireless communication parameters
    • H04W28/22 Negotiating communication rate

Abstract

The invention discloses a joint resource allocation and multi-task offloading method based on server cooperation. The method comprises the following steps: first, establishing a mobile edge computing system model for a scenario with multiple mobile devices and multiple edge servers; then designing a multi-edge-server cooperation model, a communication model, a computation model and an energy consumption model; then formulating a joint optimization problem model for resource allocation and multi-task cooperative offloading, together with its objective function; and finally, for that objective function, designing a joint resource allocation and multi-task cooperative offloading method that combines a genetic algorithm with a deep reinforcement learning algorithm, so that the best quality of service for users is pursued within the cost a network operator can bear, and the problems of server overload and long-term system performance are addressed.

Description

Joint resource allocation and multi-task offloading method based on server cooperation
Technical Field
The invention belongs to the technical field of mobile edge computing, and particularly relates to a joint resource allocation and multi-task offloading method based on server cooperation.
Background
In recent years, computation-intensive and delay-sensitive mobile applications such as online video, real-time gaming, and augmented reality have become widely used. However, the limited battery life and computing power of smart mobile devices make it difficult to execute these applications locally. Mobile edge computing over densely deployed cellular networks is regarded as a promising solution. Its key technology is computation offloading, which migrates computing tasks to edge servers for execution and, compared with cloud computing, effectively reduces backbone-network congestion and communication delay.
The development of 5G technology has paved the way for the rise of mobile edge computing. Small base stations equipped with edge servers, such as home base stations and enterprise-class small base stations, provide cloud-like computing and storage capabilities and are regarded as key enablers of mobile edge computing. However, small base stations have limited computing and communication resources compared with cloud centers. To implement mobile edge computing in resource-constrained service networks, a growing body of research focuses on the joint optimization of task offloading and resource allocation.
To achieve more reasonable computation offloading in resource-constrained mobile edge computing scenarios, making full use of edge servers has become an urgent problem. Because the computing tasks that reach edge servers can be highly dynamic and heterogeneous, overloaded edge servers struggle to provide consistently satisfactory computing services. Cooperation among multiple edge servers is therefore an effective way to address inefficient server utilization. For example, an edge server cluster can migrate computing tasks from an overloaded edge server to several lightly loaded peer edge servers, providing better service to mobile users.
Because users' task requests are random in time and space, the immediate performance of a mobile edge computing system cannot be pursued at the expense of long-term performance. However, time-varying communication networks make it challenging to guarantee the long-term performance of the system.
To achieve the best quality of service for users within the commercial cost that a network operator can bear, a joint resource allocation and multi-task offloading method based on server cooperation is needed.
Disclosure of Invention
The invention aims to provide a joint resource allocation and multi-task offloading method based on server cooperation, which can solve the server overload problem and achieve the best quality of service for users while ensuring the long-term performance of the system.
The technical solution for realizing the purpose of the invention is as follows: a joint resource allocation and multi-task offloading method based on server cooperation comprises the following steps:
step 1, establishing a mobile edge computing system model for a scenario with multiple mobile devices and multiple edge servers;
step 2, designing a multi-edge-server cooperation model;
step 3, designing a communication model;
step 4, designing a computation model;
step 5, designing an energy consumption model;
step 6, formulating a joint optimization problem model for resource allocation and multi-task cooperative offloading, together with its objective function;
step 7, for the objective function, designing a joint resource allocation and multi-task cooperative offloading method that combines a genetic algorithm with a deep reinforcement learning algorithm, so as to pursue the best quality of service for users within the cost a network operator can bear and to address the problems of server overload and long-term system performance.
Further, the mobile edge computing system model for the scenario with multiple mobile devices and multiple edge servers in step 1 is established as follows:
The mobile edge computing system is set to have M densely distributed micro base stations
Figure BDA0003131191270000011
The micro base stations can communicate with each other through 5G wireless microwave links, and each base station is equipped with an edge server that provides computing services for mobile devices; the edge server
Figure BDA0003131191270000012
has a computing capability (clock frequency) of f_i; the set of mobile devices in the mobile edge computing system is
Figure BDA0003131191270000013
Each mobile device can wirelessly communicate with the micro base station closest to the mobile device;
The mobile devices are set to generate G types of computation-intensive, ultra-low-delay computing tasks. A task of type t ∈ G is expressed as k_t = {d_t, c_t, β_t}, where d_t is the data size of the computing task (bits), c_t is the number of CPU cycles required to process each bit of task data, and β_t (with Σ_{t∈G} β_t = 1) is the probability that a mobile device generates task t. To better reflect reality, time is discretized into a number of consecutive time periods, and each mobile device generates one computing task in each time period; for example, the task generated by mobile device u in each time period is defined as
Figure BDA0003131191270000014
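As a rough illustration of the task model above, the following sketch draws one task type per mobile device for a single time period according to the probabilities β_t. The names TaskType and generate_tasks and all numeric values are hypothetical and do not come from the patent.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskType:
    d: float     # data size in bits (d_t)
    c: float     # CPU cycles needed per bit (c_t)
    beta: float  # probability that a device generates this type (beta_t)


def generate_tasks(task_types, num_devices, rng):
    """Draw one task type per mobile device for a single time period."""
    weights = [t.beta for t in task_types]
    # The text requires the generation probabilities to sum to 1.
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("beta_t must sum to 1")
    return rng.choices(task_types, weights=weights, k=num_devices)


# Hypothetical values, chosen only for illustration.
TASK_TYPES = [TaskType(d=2e6, c=500, beta=0.7), TaskType(d=8e6, c=1000, beta=0.3)]
tasks = generate_tasks(TASK_TYPES, num_devices=5, rng=random.Random(0))
```

Passing the random generator explicitly keeps the draw reproducible across simulated time periods.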
Further, the multi-edge-server cooperation model in step 2 is designed as follows:
In the mobile edge computing system, the invention designs a two-layer cooperative computation offloading framework, in which the first layer is offloading from mobile devices to edge servers and the second layer is cooperative offloading among edge servers;
In the first-layer offloading from mobile devices to edge servers, each mobile device is set to offload its computing task as a whole to the edge server of the micro base station with which it can communicate; setting
Figure BDA0003131191270000021
indicates that mobile device u is associated with micro base station i; otherwise
Figure BDA0003131191270000022
Figure BDA0003131191270000023
The present invention therefore defines a mobile device association policy as
Figure BDA0003131191270000024
Since each mobile device communicates with at most one micro base station per time period, the association policy satisfies the constraint
Figure BDA0003131191270000025
In the second-layer cooperative offloading among edge servers, the invention sets that each task can be arbitrarily divided into M parts, which are transmitted through inter-base-station microwave links to the edge servers of the corresponding M micro base stations; setting
Figure BDA0003131191270000026
to denote the size of the data of the task generated by mobile device u that is transmitted to target server i within a time period, where the transmission link is the shortest communication path from the initial base station associated with mobile device u to target base station i; the invention sets each edge server to have a task queue, in which arriving tasks are first stored and then scheduled and executed under a first-come, first-served policy; the invention sets the multi-task offloading policy as
Figure BDA0003131191270000027
which satisfies the constraint
Figure BDA0003131191270000028
To ensure that the tasks generated by the mobile device u are fully executed;
Under the multi-edge-server cooperation model, the main problem addressed by the method is achieving the best quality of user experience within the cost that a network operator can bear; the invention defines the quality of user experience as the average completion delay of all tasks in each time period, and defines the cost borne by the network operator as the energy consumption of all edge servers together with the overall data carrying state among the second-layer micro base stations; the data carrying state among the second-layer micro base stations is the total size of the data transmitted in each time period, which depends on the sizes of the task parts and on the shortest communication paths, and is defined as follows:
Figure BDA0003131191270000029
where Hops (u, i) represents the number of Hops in the shortest communication path from the initial base station associated with mobile device u to the target base station i;
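A minimal sketch of how the data carrying state could be evaluated, assuming it takes the form of a sum, over all mobile devices u and target servers i, of the partial task size multiplied by Hops(u, i). The exact formula is in the figure above and is not reproduced here; the breadth-first search, the function names and the example topology are hypothetical.

```python
from collections import deque


def hops(adjacency, src, dst):
    """Number of hops on a shortest path between two micro base stations (BFS)."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("base stations are not connected")


def carrying_state(adjacency, home, split):
    """Assumed form: sum over u and i of split[u][i] * Hops(u, i).

    home[u] is the initial base station of device u; split[u][i] is the
    number of bits of u's task sent to target server i.
    """
    return sum(bits * hops(adjacency, home[u], i)
               for u, parts in split.items()
               for i, bits in parts.items())


# Hypothetical line topology 0 - 1 - 2 with a single device attached to station 0.
ADJ = {0: [1], 1: [0, 2], 2: [1]}
zeta = carrying_state(ADJ, home={"u1": 0}, split={"u1": {0: 4.0, 1: 3.0, 2: 1.0}})
```

In this example the part kept at station 0 travels zero hops, so only the 3 bits sent one hop and the 1 bit sent two hops contribute to the total.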
To solve this main problem, the invention also needs to quantify several elements of the mobile edge computing system: the task upload delay from a mobile device to its initially associated micro base station, the task transmission delay between micro base stations, the task waiting delay on an edge server, the task computing delay on an edge server, the computing energy consumption of the edge servers, and the transmission energy consumption between micro base stations.
Further, the communication model in step 3 is designed as follows:
In the communication model, the invention defines the task upload delay from a mobile device to its initially associated micro base station and the task transmission delay between micro base stations;
The invention uses orthogonal frequency division multiple access as the basis of communication between mobile devices and base stations: mobile devices communicating with the same base station are allocated orthogonal spectrum, micro base stations exchange data over microwave links, and signal interference in second-layer communication between micro base stations is ignored, so the invention considers only the inter-cell interference that arises when first-layer mobile devices communicate with micro base stations;
In the communication model of the mobile edge computing system, the invention sets the channel set as C = {1, 2, …, C}, and the bandwidth of each channel is w; the channel allocation policy is set as
Figure BDA00031311912700000210
which indicates whether channel k is allocated to mobile device u; in channel allocation, each mobile device is set to occupy only one channel, so policy B satisfies the constraint
Figure BDA00031311912700000211
Meanwhile, for micro base station
Figure BDA00031311912700000212
Figure BDA00031311912700000213
the number of allocated channels cannot exceed the number of channels that the station itself owns, so policy B also satisfies the constraint
Figure BDA00031311912700000214
When the mobile device u occupies the channel k and is associated with the micro base station i for communication, the received inter-cell signal interference is defined as:
Figure BDA00031311912700000215
where P_u′ is the transmit power of mobile device u′, and
Figure BDA00031311912700000216
is the channel gain on channel k between mobile device u′ and micro base station i; the upload rate from mobile device u to micro base station i is therefore:
Figure BDA00031311912700000217
where N_0 is the variance of the white Gaussian channel noise; in the first-layer offloading from mobile devices to micro base station edge servers, the upload delay for mobile device u to offload its computing task to its initially associated micro base station i is defined as:
Figure BDA00031311912700000218
In the second-layer cooperative offloading among micro base stations, the microwave transmission rates between micro base stations are set to be equal and the waiting delay during data transmission between micro base stations is ignored, so the transmission delay between micro base stations depends only on the task size and on the number of micro base station hops along the shortest path; the invention sets the data transmission rate between micro base stations to α, so that for the partial task
Figure BDA0003131191270000031
of mobile device u, the transmission delay from the associated initial base station to target base station i is:
Figure BDA0003131191270000032
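The communication model can be sketched as follows, assuming the usual signal-to-interference-plus-noise form of the Shannon rate and a relay delay equal to data size times hop count divided by α. These forms are inferred from the surrounding definitions rather than copied from the figures, and all function names and data layouts are hypothetical.

```python
import math


def interference(channel_users, assoc, power, gain, u, k, i):
    """Inter-cell interference seen by station i on channel k for device u.

    Only devices on the same channel that are served by other stations
    contribute, since devices within one cell use orthogonal spectrum.
    """
    return sum(power[v] * gain[(v, k, i)]
               for v in channel_users[k]
               if v != u and assoc[v] != i)


def upload_rate(w, p_u, h_uki, n0, interf):
    """Assumed Shannon rate (bit/s) on one channel of bandwidth w (Hz)."""
    return w * math.log2(1.0 + p_u * h_uki / (n0 + interf))


def upload_delay(d_bits, rate):
    """Time to upload the whole task to the initially associated station."""
    return d_bits / rate


def relay_delay(part_bits, n_hops, alpha):
    """Assumed inter-station delay: part size times hop count over the rate alpha."""
    return part_bits * n_hops / alpha


# Hypothetical numbers: devices a and c are in cell 0, device b is in cell 1.
interf = interference({0: ["a", "b", "c"]}, {"a": 0, "b": 1, "c": 0},
                      {"a": 2.0, "b": 2.0, "c": 2.0}, {("b", 0, 0): 0.5},
                      u="a", k=0, i=0)
rate = upload_rate(w=1e6, p_u=1.0, h_uki=3.0, n0=1.0, interf=0.0)
```

With zero interference the example SINR is 3, so the rate is w·log2(4), that is, 2 Mbit/s for a 1 MHz channel.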
Further, the computation model in step 4 is designed as follows:
In the computation model, the invention defines the computing delay and the waiting delay of a task on an edge server;
When the partial task
Figure BDA0003131191270000033
reaches edge server i at the target base station and is executed, the required computing delay is defined as:
Figure BDA0003131191270000034
the invention sets the state of the task queue of the edge server i at the beginning of the time period l as
Figure BDA0003131191270000035
which is the number of CPU cycles required to process the tasks waiting in the task queue; in the mobile edge computing system, the total delay for the partial task
Figure BDA0003131191270000036
to reach target server i, denoted
Figure BDA0003131191270000037
consists of two parts: the upload delay from the mobile device to its initially associated base station
Figure BDA0003131191270000038
and the transmission delay of the partial task from the initial base station to the target base station
Figure BDA0003131191270000039
The detailed definition is as follows:
Figure BDA00031311912700000310
Based on the total delay
Figure BDA00031311912700000311
the invention defines a function Sort(u, i, l) that returns the set of mobile devices whose partial tasks reach target server i earlier than the partial task
Figure BDA00031311912700000312
of mobile device u in time period l; the waiting delay of the partial task
Figure BDA00031311912700000313
at target server i can therefore be defined as:
Figure BDA00031311912700000314
where
Figure BDA00031311912700000315
denotes the data size of a partial task that reaches target server i earlier than the partial task
Figure BDA00031311912700000316
within time period l;
Based on the computation model, the state of the task queue of edge server i at the beginning of time period l+1 is:
Figure BDA00031311912700000317
where
Figure BDA00031311912700000318
denotes the size of the data received by edge server i during time period l, and
Figure BDA00031311912700000323
is the length of each time period.
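The computation model can be sketched as below. The computing delay follows directly from the definitions of c_t and f_i; the waiting delay and the queue update are assumed forms consistent with a first-come, first-served queue measured in CPU cycles, since the exact formulas are in the figures. All function names are hypothetical.

```python
def compute_delay(part_bits, c_t, f_i):
    """Seconds needed by server i (clock frequency f_i) to execute a partial task."""
    return part_bits * c_t / f_i


def waiting_delay(backlog_cycles, earlier_cycles, f_i):
    """Assumed FCFS wait: backlog at the start of the period plus the work of
    partial tasks that reached the server earlier, divided by the CPU frequency."""
    return (backlog_cycles + sum(earlier_cycles)) / f_i


def next_backlog(backlog_cycles, received_cycles, f_i, period):
    """Assumed queue update: add the newly received work, subtract what the CPU
    can finish within one time period, and never go below zero."""
    return max(0.0, backlog_cycles + received_cycles - f_i * period)
```

For instance, a 1 Mbit partial task at 100 cycles per bit on a 1 GHz server takes 0.1 s to compute under this sketch.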
Further, the energy consumption model in step 5 is designed as follows:
In the energy consumption model, the invention defines the computing energy consumption of the edge servers and the transmission energy consumption between micro base stations;
The invention sets the energy consumed by edge server i to process one CPU cycle as e_i; the energy consumed when edge server i finishes processing the tasks received in time period l is then defined as:
Figure BDA00031311912700000319
the invention defines the communication transmission energy consumption of a second layer micro base station i as follows:
Figure BDA00031311912700000320
where InPath(i, u, j) ∈ {0, 1}: a value of 1 indicates that micro base station i lies on the communication link along which the partial task
Figure BDA00031311912700000321
is sent to target micro base station j, and a value of 0 indicates otherwise; δ_i is the transmit power of micro base station i for second-layer communication between base stations.
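A minimal sketch of the two energy terms, assuming that the computing energy is e_i times the CPU cycles received and that the relay energy is the transmit power δ_i multiplied by the time station i spends forwarding the partial tasks for which InPath equals 1. Both forms are assumptions based on the definitions above, and the function names are hypothetical.

```python
def compute_energy(e_i, received_cycles):
    """Energy used by server i to process the CPU cycles received in one period."""
    return e_i * received_cycles


def relay_energy(delta_i, alpha, parts):
    """Assumed relay energy of station i.

    parts is a list of (bits, in_path) pairs, where in_path is 1 if station i
    lies on the link carrying that partial task and 0 otherwise.
    """
    return delta_i * sum(in_path * bits / alpha for bits, in_path in parts)


def station_energy(e_i, received_cycles, delta_i, alpha, parts):
    """Total energy of micro base station i in one time period."""
    return compute_energy(e_i, received_cycles) + relay_energy(delta_i, alpha, parts)
```

Partial tasks whose path does not pass through station i carry an InPath flag of 0 and therefore add nothing to its relay energy.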
Further, the joint optimization problem model and objective function for resource allocation and multi-task cooperative offloading in step 6 are formulated as follows:
The invention jointly considers the mobile device association policy, the channel allocation policy and the multi-task cooperative offloading policy, which together form a multi-objective constrained optimization problem whose goal is the best quality of user experience within the cost that a network operator can bear;
Because the parts of a divided task are executed in parallel on different servers, the completion delay of a task is the maximum of the completion delays of its parts; the completion delay of the task generated by mobile device u in time period l is defined as:
Figure BDA00031311912700000322
Figure BDA0003131191270000041
based on the energy consumption model, the energy consumption of the micro base station i in the time period l is defined as follows:
Figure BDA0003131191270000042
the present invention aims at pursuing the best quality of user experience within the affordable cost of the network operator, so the optimization problem can be defined as:
Figure BDA0003131191270000043
Figure BDA0003131191270000044
Figure BDA0003131191270000045
Figure BDA0003131191270000046
Figure BDA0003131191270000047
Figure BDA0003131191270000048
Figure BDA0003131191270000049
ζ_l ≤ ζ_max (C7)
Constraint C1 specifies the data types (value domains) of the decision variables; constraint C2 states that each mobile device is associated with at most one micro base station in the same time period; constraint C3 states that each mobile device uses at most one channel in the same time period; constraint C4 states that the number of channels allocated by a micro base station cannot exceed the total number of channels it owns; constraint C5 states that tasks can be divided arbitrarily; constraint C6 states that the energy consumption of each micro base station cannot exceed its limit in any time period; constraint C7 states that the total transmission state among the second-layer micro base stations in each time period cannot exceed the allowed limit; constraints C6 and C7 together represent the cost that a network operator can bear.
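The constraint descriptions can be turned into a simple feasibility check, sketched below. The binary-matrix encoding of X and B, the exact form used for C4 and the function name are assumptions made for illustration; the formal constraints are given in the figures.

```python
def violated_constraints(x, b, split, task_bits, channels_owned,
                         energy, energy_max, zeta, zeta_max, tol=1e-6):
    """Return labels of violated constraints; an empty list means feasible.

    x[u][i] = 1 if device u is associated with station i (else 0);
    b[u][k] = 1 if device u uses channel k (else 0);
    split[u][i] = bits of u's task sent to server i.
    """
    bad = []
    # C1: association and channel variables are binary.
    if not all(v in (0, 1) for rows in (x, b) for row in rows.values() for v in row):
        bad.append("C1")
    # C2: at most one station per device.
    if any(sum(row) > 1 for row in x.values()):
        bad.append("C2")
    # C3: at most one channel per device.
    if any(sum(row) > 1 for row in b.values()):
        bad.append("C3")
    # C4 (assumed form): channels in use at station i must not exceed what it owns.
    for i, owned in enumerate(channels_owned):
        if sum(x[u][i] * sum(b[u]) for u in x) > owned:
            bad.append("C4")
            break
    # C5: parts are non-negative and add up to the whole task.
    if any(min(p) < 0 or abs(sum(p) - task_bits[u]) > tol for u, p in split.items()):
        bad.append("C5")
    # C6: per-station energy limit.
    if any(e > m for e, m in zip(energy, energy_max)):
        bad.append("C6")
    # C7: limit on the inter-station data carrying state.
    if zeta > zeta_max:
        bad.append("C7")
    return bad
```

A check of this kind could be used to reject or penalize candidate solutions produced by a search procedure.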
Further, in step 7, the joint resource allocation and multi-task cooperative offloading method is designed for the objective function by combining a genetic algorithm with a deep reinforcement learning algorithm, so as to pursue the best quality of service for users within the cost a network operator can bear and to address the server overload and long-term system performance problems, specifically as follows:
The invention designs a computation offloading scheme based on a genetic algorithm and the deep deterministic policy gradient, which can solve the optimization problem P1 of joint resource allocation and multi-task cooperative offloading;
Based on the proposed two-layer cooperative computation offloading framework and the composition of the task completion delay in equation (12), the optimization problem P1 is decomposed into two sub-problems using the decomposition idea of the primal-dual method;
in the first tier mobile device to edge server offload, the first sub-problem P2 that is broken down is defined as:
Figure BDA00031311912700000410
For sub-problem P2, the invention designs a genetic algorithm to obtain the mobile device association policy X and the channel allocation policy B for the first-layer offloading from mobile devices to edge servers;
in the cooperative offloading between the second-tier micro base stations, the second sub-problem P3 that is decomposed into is defined as:
Figure BDA00031311912700000411
For sub-problem P3, the multi-task cooperative offloading policy D in time period l depends on the state of the edge servers in time period l-1, so sub-problem P3 is described as a Markov decision process and solved with the deep deterministic policy gradient from deep reinforcement learning; because sub-problem P3 has the Markov property, the long-term performance of the system can be captured through the cumulative reward function of reinforcement learning;
The following describes in detail the genetic algorithm that solves sub-problem P2, the deep reinforcement learning procedure that solves sub-problem P3, and the overall computation offloading scheme for problem P1 based on the genetic algorithm and the deep deterministic policy gradient;
Step 7.1, designing a genetic-algorithm-based mobile device association and channel allocation algorithm: a genetic algorithm is a stochastic search algorithm that simulates biological evolution to solve complex problems, and it takes survival of the fittest as its evolutionary principle; a genetic algorithm only requires the problem to be computable and does not depend on other mathematical properties such as differentiability or continuity. A genetic algorithm starts from a set of initial solutions and improves them through genetic operations (selection, crossover and mutation) until an acceptable solution is found or convergence is reached; in particular, crossover and mutation maintain population diversity and enlarge the search region, so the search is less likely to be trapped at a local optimum; genetic algorithms are therefore strong at global search; the invention designs the following genetic operations to solve sub-problem P2:
step 7.1.1, designing chromosome and fitness function:
To encode the optimization goal of sub-problem P2, the invention sets the chromosome I of an individual as:
I = [X, B]^T (14)
where
Figure BDA0003131191270000051
is the mobile device association policy and
Figure BDA0003131191270000052
is the channel allocation policy; to satisfy constraints C1 and C2 in sub-problem P2, the invention sets the variable
Figure BDA0003131191270000053
to be the micro base station with which mobile device u is associated for communication, where
Figure BDA0003131191270000054
is the set of micro base stations that can communicate with mobile device u during time period l; to satisfy constraints C1, C2, and C3, the variable
Figure BDA0003131191270000055
is set to denote the channel used within time period l when mobile device u and micro base station
Figure BDA0003131191270000056
communicate in association;
To evaluate the quality of individuals in the population, the fitness function is set according to sub-problem P2 as follows:
Figure BDA0003131191270000057
step 7.1.2, designing population initialization and selection operators:
the invention sets the biological population initialization operation as follows:
Figure BDA0003131191270000058
Figure BDA0003131191270000059
where randin(Set) is a generating function that outputs a random element of Set;
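Population initialization can be sketched as follows, assuming that each gene of X is drawn from the set of micro base stations reachable by the device and each gene of B is drawn uniformly from the channel indices. The initialization formulas themselves are in the figures above; the dictionary encoding and function names here are hypothetical.

```python
import random


def init_individual(reachable, n_channels, rng):
    """One chromosome [X, B]: a station and a channel index per mobile device.

    reachable[u] is the set of stations that device u can communicate with.
    Storing a single station and a single channel per device means that the
    one-station and one-channel constraints hold by construction.
    """
    x = {u: rng.choice(sorted(stations)) for u, stations in reachable.items()}
    b = {u: rng.randrange(n_channels) for u in reachable}
    return x, b


def init_population(size, reachable, n_channels, rng):
    """Create the initial population of random chromosomes."""
    return [init_individual(reachable, n_channels, rng) for _ in range(size)]


# Hypothetical scenario: two devices, three stations, four channels.
REACHABLE = {"u1": {0, 1}, "u2": {2}}
population = init_population(6, REACHABLE, n_channels=4, rng=random.Random(0))
```

Sorting each reachable set before sampling makes the draw reproducible for a fixed seed.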
For the selection operator, the method uses tournament selection, which is well suited to minimization problems, to select K individuals from the population as the parent population; to improve the performance of the genetic algorithm, the best individual in the population is recorded during evolution, and if it is not chosen during selection, it replaces the worst individual in the population;
step 7.1.3, designing crossover and mutation operators:
Crossover and mutation are effective ways to increase the diversity of offspring and thereby obtain better solutions; for crossover, two individuals are randomly selected from the parent population and, with probability p_c, are crossed to produce two offspring, with genes exchanged by two-point crossover; for mutation, the invention sets the mutation probability of each individual to p_m, and the genes at two randomly generated mutation points on each chromosome are mutated;
the mutation principle of the mobile equipment association strategy X in the chromosome is set as follows:
Figure BDA00031311912700000510
the mutation principle of the channel allocation strategy B in the chromosome is set as follows:
Figure BDA00031311912700000511
where
Figure BDA00031311912700000512
is the maximum element value in channel allocation policy B,
Figure BDA00031311912700000513
is the minimum element value in channel allocation policy B, round(value) is defined as a function that outputs an integer no greater than value, and ψ_1 and ψ_2 are two random numbers following the uniform distribution U(0, 1);
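The genetic operators can be sketched as below for a chromosome flattened into a list of genes. Tournament selection and two-point crossover follow the description above. The mutation shown simply resamples the genes at two random positions from their allowed values, which is a simplification and not the mutation formulas given in the figures. All names are hypothetical.

```python
import random


def tournament_select(population, fitness, k, rng, size=2):
    """Pick k parents; each is the lowest-fitness member of a random tournament
    (lower is better, because the problem is a minimization)."""
    return [min(rng.sample(population, size), key=fitness) for _ in range(k)]


def two_point_crossover(a, b, rng):
    """Swap the gene segment between two random cut points of the parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(genes, choices, p_m, rng):
    """With probability p_m, resample the genes at two random positions.

    choices[pos] lists the values allowed at position pos.
    """
    genes = list(genes)
    if rng.random() < p_m:
        for pos in rng.sample(range(len(genes)), 2):
            genes[pos] = rng.choice(choices[pos])
    return genes
```

Elitism, as described for the selection operator, could be added by keeping a copy of the best individual and writing it over the worst one after each generation.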
Step 7.2, designing a multi-task cooperative offloading algorithm based on the deep deterministic policy gradient: time is discretized into a number of time periods, and it is assumed that in each time period a batch of task requests reaches the corresponding edge servers according to the mobile device association policy and the channel allocation policy; because each server is modeled as a queueing system, the current queue state of a server affects the time needed to complete arriving tasks, and the multi-task cooperative offloading policy in time period l depends on the current communication environment and on the server queue states in time period l-1; sub-problem P3 can therefore be expressed as a Markov decision process and solved with the deep deterministic policy gradient method, which also allows the long-term performance of the system to be taken into account;
Step 7.2.1, Markov decision process:
The invention defines the system state in time period l as
Figure BDA00031311912700000514
The action taken during the time period l is defined as
Figure BDA00031311912700000515
Figure BDA00031311912700000516
Action a_l is equivalent to the multi-task cooperative offloading policy D in sub-problem P3;
Sub-problem P3 aims to minimize the average task completion delay in the second-layer cooperation among micro base stations, so the invention defines the immediate reward obtained by taking action a_l in system state S_l as:
Figure BDA0003131191270000061
Figure BDA0003131191270000062
where
Figure BDA0003131191270000063
is the penalty term for constraint C6, and α_i is the penalty coefficient of edge server i; β·max(0, ζ_l − ζ_max) is the penalty term for constraint C7, and β is its penalty coefficient; ε is a balancing value between quantities with different units and depends on the maximum difference of L in the simulation experiments;
After the immediate reward is obtained, the system state changes from S_l to S_{l+1}; to analyze the effect of the action on the system state, the invention sets the computation load of edge server i in time period l
Figure BDA0003131191270000064
as:
Figure BDA0003131191270000065
where
Figure BDA0003131191270000066
denotes the size of the data received by edge server i during time period l, and
Figure BDA00031311912700000614
is the length of each time period; the system state transition from time period l to time period l+1 is therefore defined as:
Figure BDA0003131191270000067
To take the long-term performance of the system into account, the long-term cumulative reward of the multi-task cooperative offloading strategy $\mu: S_l \to a_l$ over $l_{\max}$ consecutive time periods is:
Figure BDA0003131191270000068
where $\gamma \in [0,1]$ is the discount coefficient; in this Markov decision process both the action space and the state space take continuous values, so the invention solves it with the deep deterministic policy gradient method from deep reinforcement learning;
Step 7.2.2, multi-task offloading algorithm based on the deep deterministic policy gradient:
The deep deterministic policy gradient is built on an Actor-Critic framework, in which the Actor generates actions and interacts with the environment, and the Critic evaluates the performance of the Actor and guides it toward better actions; the deep deterministic policy gradient algorithm consists of five parts: the main Actor network $\mu(S_l;\theta)$, the main Critic network $Q(S_l,a_l;\omega)$, the target Actor network $\mu'(S_l;\theta')$, the target Critic network $Q'(S_l,a_l;\omega')$ and the experience replay pool R; the experience replay pool stores the state transition experiences of the system, each of which consists of the state transition, the action and the immediate reward of one time period and is defined as $(S_l,a_l,r_l,S_{l+1})$; during learning, experience replay trains on random samples drawn from the pool, which breaks the correlation between experiences and improves learning performance;
(1) Design of the main Actor network $\mu(S_l;\theta)$:
The deterministic multi-task cooperative offloading strategy over consecutive time periods, $\mu: S_l \to a_l$, can be approximated with parameter $\theta$ as a continuous function $a_l = \mu(S_l;\theta)$; the Actor network iteratively updates its network parameters, selects the current action according to the current state, and interacts with the mobile edge computing environment to generate the next state and the immediate reward;
$\psi$ experiences are randomly selected from the experience replay pool as a sample set $\Psi = \{(S_i,a_i,U_i,S_{i+1});\ i \in \{0,1,\dots,\psi\}\}$, and the policy gradient formula with which the main Actor network updates its network parameters $\theta^{\mu}$ is:
Figure BDA0003131191270000069
To satisfy constraint C5, the invention normalizes the output values of the main Actor network; the MN network output values are defined as
Figure BDA00031311912700000610
and the value of the normalized action $a_l$,
Figure BDA00031311912700000611
is expressed as:
Figure BDA00031311912700000612
(2) Design of the main Critic network $Q(S_l,a_l;\omega)$:
The main Critic network uses an approximate action value function $Q(S_l,a_l;\omega)$ to evaluate the quality of the selected action and to guide the main Actor network; based on the Bellman equation, the action value function is expressed as:
Figure BDA00031311912700000613
$\psi$ experiences are randomly selected from the experience replay pool as a sample set $\Psi = \{(S_i,a_i,U_i,S_{i+1});\ i \in \{0,1,\dots,\psi\}\}$; the main Critic network updates its network parameters $\omega^{Q}$ by minimizing the loss function L, which is expressed as:
Figure BDA0003131191270000071
where $y_i = r_i(S_i,a_i) + \gamma Q'(S_{i+1}, \mu'(S_{i+1};\theta');\omega')$; computing $y_i$ requires both the target Actor network and the target Critic network;
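By way of non-limiting illustration, the target value $y_i$ above and a loss built from it can be sketched as follows. The loss formula itself is not reproduced in this text, so the mean-squared-error form is an assumption (it is the usual choice for deep deterministic policy gradient training); the function names `td_targets` and `critic_loss` and the callable arguments are hypothetical stand-ins for the networks.

```python
def td_targets(batch, gamma, target_actor, target_critic):
    """Compute y_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1})) for each sampled experience.

    batch holds tuples (S_i, a_i, r_i, S_{i+1}); target_actor and target_critic
    are plain callables standing in for the target networks (assumption).
    """
    return [r + gamma * target_critic(s_next, target_actor(s_next))
            for _, _, r, s_next in batch]


def critic_loss(batch, targets, critic):
    """Assumed loss: mean squared error between y_i and Q(S_i, a_i)."""
    errors = [(y - critic(s, a)) ** 2 for (s, a, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)
```

In a real implementation the callables would be neural networks and the loss would be minimized by gradient descent on the main Critic parameters; the sketch only shows how the two target networks enter the computation of $y_i$.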
(3) Design of the target Actor network $\mu'(S_l;\theta')$:
The target Actor network $\mu'(S_l;\theta')$ selects the best next action $a_{i+1} = \mu'(S_{i+1};\theta')$ for the next state $S_{i+1}$ of a sample drawn from the experience pool; its network parameters ${\theta'}^{\mu'}$ are soft-updated from the parameters $\theta^{\mu}$ of the main Actor network, expressed as:
${\theta'}^{\mu'} = \tau\theta^{\mu} + (1-\tau){\theta'}^{\mu'}$ (29)
where $\tau \in [0,1]$ is the soft update coefficient;
(4) Design of the target Critic network $Q'(S_l,a_l;\omega')$:
The role of the target Critic network $Q'(S_l,a_l;\omega')$ lies mainly in the calculation of the loss function L; its network parameters ${\omega'}^{Q'}$ are soft-updated from the parameters $\omega^{Q}$ of the main Critic network, expressed as:
${\omega'}^{Q'} = \tau\omega^{Q} + (1-\tau){\omega'}^{Q'}$ (30)
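The soft updates in (29) and (30) share one form, which can be illustrated by the minimal sketch below. Representing the network parameters as flat lists of floats and the function name `soft_update` are assumptions made only for illustration.

```python
def soft_update(target_params, main_params, tau):
    """Return tau * main + (1 - tau) * target, element by element, as in (29) and (30)."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("the soft update coefficient tau must lie in [0, 1]")
    if len(target_params) != len(main_params):
        raise ValueError("parameter vectors must have the same length")
    return [tau * m + (1.0 - tau) * t for t, m in zip(target_params, main_params)]
```

A small value of $\tau$ makes the target networks track the main networks slowly, which is the usual reason for preferring a soft update to a direct copy.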
Step 7.3, design a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient:
In a mobile edge computing system with multiple mobile devices and multiple edge servers, each mobile device generates one task request in each time period, and the edge server of any micro base station can act as the central controller that solves for the computation offloading strategy;
To achieve the best average user experience within the cost that the network operator can bear, the invention designs a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient, which yields the mobile device association strategy, the channel allocation strategy and the multi-task cooperative offloading strategy; the invention randomly selects one micro base station edge server as the central controller that executes the computation offloading scheme;
Compared with the prior art, the invention has the following notable advantages: (1) cooperation among the edge servers of multiple micro base stations effectively avoids server overload; (2) the computation offloading scheme based on deep reinforcement learning effectively safeguards the long-term performance of the system; (3) the method pursues the best user quality of experience within the cost that the network operator can bear, solves the joint resource optimization problem of mobile edge computing in the scenario of multiple mobile devices and multiple edge servers, namely mobile device association, channel allocation and multi-task cooperative offloading, and provides a technical basis for the effective operation of mobile edge computing projects in a 5G communication environment.
Drawings
FIG. 1 is a block diagram of a mobile edge computing system in a multi-mobile device and multi-micro base station edge service scenario.
FIG. 2 is a diagram of a two-tier computing offload framework of the present invention.
FIG. 3 is a chromosome structural diagram in the present invention.
FIG. 4 is a schematic diagram of the chromosome crossover operation in the present invention.
FIG. 5 is a schematic diagram of the chromosome mutation operation in the present invention.
FIG. 6 is a diagram of the deep deterministic policy gradient network architecture in the present invention.
FIG. 7 is a diagram illustrating normalization of network output values according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention relates to a joint resource allocation and multi-task offloading method based on server cooperation, which comprises the following steps:
Step 1, establish a mobile edge computing system model for the scenario of multiple mobile devices and multiple edge servers, specifically as follows:
With reference to the mobile edge computing system model for the multi-mobile-device, multi-micro-base-station edge service scenario in FIG. 1, the mobile edge computing system has M densely distributed micro base stations
Figure BDA0003131191270000072
The micro base stations can communicate with one another through 5G wireless microwave communication links, and each base station is equipped with an edge server that provides computing services for the mobile devices; the edge server
Figure BDA0003131191270000073
has a computing capability (clock frequency) of $f_i$; the mobile devices in the mobile edge computing system are denoted
Figure BDA0003131191270000074
and each mobile device can communicate wirelessly with the micro base station closest to it;
The mobile devices generate G types of computation-intensive, ultra-low-delay computing tasks, where a task of type $t \in G$ is expressed as $k_t = \{d_t, c_t, \beta_t\}$, in which $d_t$ is the data size of the computing task (in bits), $c_t$ is the number of CPU cycles required to execute each bit of task data, and $\beta_t$ (with $\sum_{t \in G} \beta_t = 1$) is the probability that a mobile device generates task $t$; to better match reality, time is discretized into a number of consecutive time periods, and each mobile device generates one computing task in each time period; for example, the task generated by mobile device $u$ in a time period is defined as
Figure BDA0003131191270000075
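By way of non-limiting illustration, the task model of step 1 can be sketched as follows: each mobile device draws one task type per time period with probability $\beta_t$. The names `TaskType`, `sample_tasks` and `TYPES`, and all numeric values, are assumptions introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskType:
    d: float     # data size of the task in bits (d_t)
    c: float     # CPU cycles required per bit of task data (c_t)
    beta: float  # probability that a device generates this type (beta_t)


def sample_tasks(task_types, num_devices, rng):
    """Draw one task type per mobile device for a single time period."""
    weights = [t.beta for t in task_types]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the probabilities beta_t must sum to 1")
    return rng.choices(task_types, weights=weights, k=num_devices)


# Hypothetical task types, chosen only for illustration.
TYPES = [TaskType(d=2e6, c=500, beta=0.3), TaskType(d=8e6, c=1000, beta=0.7)]
tasks = sample_tasks(TYPES, num_devices=5, rng=random.Random(0))
```

The total workload of a sampled task in CPU cycles is then `task.d * task.c`, which is the quantity the later computation model operates on.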
Step 2, design the multi-edge-server cooperation model, specifically as follows:
With reference to FIG. 2, the invention designs a two-layer cooperative computation offloading framework for the mobile edge computing system, in which the first layer is the offloading from the mobile devices to the edge servers and the second layer is the cooperative offloading between the edge servers;
In the first-layer offloading from the mobile devices to the edge servers, each mobile device offloads its computing task as a whole to the edge server of a micro base station with which it can communicate; let
Figure BDA0003131191270000081
indicate that mobile device $u$ is associated with micro base station $i$, and otherwise
Figure BDA0003131191270000082
0; the invention therefore defines the mobile device association strategy as
Figure BDA0003131191270000083
Since each mobile device communicates with at most one micro base station in each time period, the association strategy satisfies the constraint
Figure BDA0003131191270000084
In the cooperative offloading between the second-layer edge servers, each task can be arbitrarily divided into M parts, which are transmitted to the edge servers of the corresponding M micro base stations through microwave communication between the base stations; let
Figure BDA0003131191270000085
denote the data size of the part of the task generated by mobile device $u$ that is transmitted to target server $i$ within a time period, the transmission link being the shortest communication path from the initial base station associated with mobile device $u$ to target base station $i$; each edge server has a task queue, in which arriving tasks are stored and then scheduled and executed on a first-come-first-served basis; the invention defines the multi-task offloading strategy as
Figure BDA0003131191270000086
which satisfies the constraint
Figure BDA0003131191270000087
so that the task generated by mobile device $u$ is executed in full;
Under the multi-edge-server cooperation model, the main problem addressed by the invention is to achieve the best user quality of experience within the cost that the network operator can bear; the user quality of experience is defined as the average completion delay of all tasks in each time period, and the cost borne by the network operator is defined as the energy consumption of all edge servers together with the overall data carrying state among the second-layer micro base stations; the data carrying state among the second-layer micro base stations is the total size of the data transmitted in each time period, which depends on the sizes of the divided task parts and on the shortest communication paths, and is defined as:
Figure BDA0003131191270000088
where Hops(u, i) is the number of hops on the shortest communication path from the initial base station associated with mobile device $u$ to target base station $i$;
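By way of non-limiting illustration, the hop count and the data carrying state can be sketched as below. The defining formula is not reproduced in this text, so the form "sum of part size times hop count" is an assumption inferred from the description; the breadth-first search, the dictionary-based topology and the names `hops`, `carrying_state` and `ADJ` are likewise hypothetical.

```python
from collections import deque


def hops(adjacency, src, dst):
    """Number of hops on the shortest path between two micro base stations (BFS)."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("the base stations are not connected")


def carrying_state(adjacency, initial_bs, split):
    """Assumed form: sum over devices u and targets i of d[u][i] * Hops(u, i)."""
    total = 0.0
    for u, parts in split.items():
        for i, size in parts.items():
            total += size * hops(adjacency, initial_bs[u], i)
    return total


# Hypothetical line topology 0 - 1 - 2 with one device attached to station 0.
ADJ = {0: [1], 1: [0, 2], 2: [1]}
zeta = carrying_state(ADJ, {"u1": 0}, {"u1": {0: 4.0, 1: 2.0, 2: 1.0}})
```

Under this assumed form, a part that is executed at the initial base station contributes nothing to the carrying state, because its hop count is zero.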
To solve this main problem, the invention also needs to quantify several elements of the mobile edge computing system: the task upload delay from a mobile device to its initially associated micro base station, the task transmission delay between micro base stations, the task waiting delay on an edge server, the task computation delay on an edge server, the computation energy consumption of the edge servers, and the transmission energy consumption between micro base stations.
Step 3, design the communication model, specifically as follows:
In the communication model, the invention defines the task upload delay from a mobile device to its initially associated micro base station and the task transmission delay between micro base stations;
The invention uses orthogonal frequency division multiple access as the basis for communication between the mobile devices and the base stations, so mobile devices that communicate with the same base station are allocated orthogonal spectrum; the micro base stations transmit data to one another by microwave, and signal interference in the communication between the second-layer micro base stations is ignored; the invention therefore considers only inter-cell interference when the first-layer mobile devices communicate with the micro base stations;
In the communication model of the mobile edge computing system, the channel set is $\mathcal{C} = \{1, 2, \dots, C\}$ and the bandwidth of each channel is $w$; the channel allocation strategy is defined as
Figure BDA0003131191270000089
which indicates whether channel $k$ is allocated to mobile device $u$; in channel allocation each mobile device occupies only one channel, so strategy B satisfies the constraint
Figure BDA00031311912700000810
Meanwhile, for micro base station
Figure BDA00031311912700000811
Figure BDA00031311912700000812
the number of allocated channels cannot exceed the number of channels that the base station owns, so strategy B also satisfies the constraint
Figure BDA00031311912700000813
When mobile device $u$ occupies channel $k$ to communicate with micro base station $i$, the inter-cell interference it receives is defined as:
Figure BDA00031311912700000814
where $P_{u'}$ is the transmission power of mobile device $u'$, and
Figure BDA00031311912700000815
is the channel gain between mobile device $u'$ and micro base station $i$ on channel $k$; the upload rate from mobile device $u$ to micro base station $i$ is therefore:
Figure BDA00031311912700000816
where $N_0$ is the variance of the Gaussian white channel noise; in the first-layer offloading from the mobile devices to the micro base station edge servers, the upload delay for mobile device $u$ to offload its computing task to its initially associated micro base station $i$ is defined as:
Figure BDA00031311912700000817
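By way of non-limiting illustration, the interference, upload rate and upload delay can be sketched as follows. The rate formula is not reproduced in this text, so the Shannon-type expression $w \log_2(1 + P h / (N_0 + I))$ is an assumption consistent with the quantities named above; the function names and the tuple layout of `others` are hypothetical.

```python
import math


def interference(channel, bs, others):
    """Sum P_u' * h for devices on the same channel that are served by other stations.

    others is a list of (power, gain_to_bs, channel, associated_bs) tuples (assumed layout).
    """
    return sum(p * g for p, g, k, b in others if k == channel and b != bs)


def upload_rate(w, power, gain, noise_var, interf):
    """Assumed Shannon-type rate in bits/s: w * log2(1 + P*h / (N0 + I))."""
    return w * math.log2(1.0 + power * gain / (noise_var + interf))


def upload_delay(data_bits, rate):
    """Upload delay of a whole task: data size divided by the upload rate."""
    return data_bits / rate
```

For example, with a bandwidth of 1 MHz, no interference and a signal-to-noise ratio of 3, the assumed rate is 2 Mbit/s, so a 4 Mbit task needs 2 seconds to upload.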
In the cooperative offloading between the second-layer micro base stations, the microwave transmission rates between micro base stations are assumed equal and the waiting delay during data transmission between micro base stations is ignored, so the transmission delay between micro base stations depends only on the task size and on the number of micro base station hops on the shortest path; with the data transmission rate between micro base stations set to $\alpha$, the transmission delay of the partial task of mobile device $u$,
Figure BDA0003131191270000091
from its associated initial base station to target base station $i$ is:
Figure BDA0003131191270000092
Step 4, design the computation model, specifically as follows:
In the computation model, the invention defines the computation delay and the waiting delay of a task on an edge server;
When the partial task
Figure BDA0003131191270000093
reaches edge server $i$ at the target base station, the computation delay required to execute it is defined as:
Figure BDA0003131191270000094
The invention defines the task queue state of edge server $i$ at the beginning of time period $l$ as
Figure BDA0003131191270000095
which is the number of CPU cycles required to compute the tasks waiting in the task queue; in the mobile edge computing system, the partial task
Figure BDA0003131191270000096
needs a total delay
Figure BDA0003131191270000097
to reach target server $i$, which consists of two parts: the upload delay from the mobile device to the initially associated base station,
Figure BDA0003131191270000098
and the transmission delay of the partial task from the initial base station to the target base station,
Figure BDA0003131191270000099
and is defined as:
Figure BDA00031311912700000910
Based on the total delay
Figure BDA00031311912700000911
the invention defines a function Sort(u, i, l) which, for the partial task of mobile device $u$ in time period $l$,
Figure BDA00031311912700000912
returns the set of mobile devices whose partial tasks arrive at target server $i$ earlier; the waiting delay of the partial task
Figure BDA00031311912700000913
at target server $i$ can therefore be defined as:
Figure BDA00031311912700000914
where
Figure BDA00031311912700000915
denotes the size of the data that arrives at target server $i$ within time period $l$ earlier than the partial task
Figure BDA00031311912700000916
does;
Based on the computation model, the task queue state of edge server $i$ at the beginning of time period $l+1$ is:
Figure BDA00031311912700000917
where
Figure BDA00031311912700000918
is the size of the data received by edge server $i$ during time period $l$, and
Figure BDA00031311912700000923
is the length of each time period.
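By way of non-limiting illustration, the first-come-first-served waiting delay and the queue update of step 4 can be sketched as follows. The corresponding formulas are not reproduced in this text, so both functions are assumed forms: the backlog is measured in CPU cycles, the server drains at most $f_i$ cycles per second over one time period, and the backlog never becomes negative. The function names are hypothetical.

```python
def waiting_delay(q_cycles, earlier_cycles, f_hz):
    """Assumed FCFS waiting time: backlog plus earlier arrivals, divided by clock frequency."""
    return (q_cycles + earlier_cycles) / f_hz


def next_queue_state(q_cycles, received_cycles, f_hz, period_s):
    """Assumed backlog (in CPU cycles) at the start of the next period; never negative."""
    return max(0.0, q_cycles + received_cycles - f_hz * period_s)
```

Because the backlog carries over from one time period to the next, an offloading decision made in period $l$ changes the waiting delay seen in period $l+1$, which is why the later steps treat the problem as a sequential decision process.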
Step 5, design the energy consumption model, specifically as follows:
In the energy consumption model, the invention defines the computation energy consumption of the edge servers and the transmission energy consumption between micro base stations;
With the energy consumed by edge server $i$ per CPU cycle set to $e_i$, the energy consumed when edge server $i$ finishes processing the tasks received in time period $l$ is defined as:
Figure BDA00031311912700000919
The communication transmission energy consumption of second-layer micro base station $i$ is defined as:
Figure BDA00031311912700000920
where InPath(i, u, j) takes the value 0 or 1; the value 1 indicates that micro base station $i$ lies on the communication link over which the partial task
Figure BDA00031311912700000921
is sent to target micro base station $j$, and the value is 0 otherwise; $\delta_i$ is the transmission power of micro base station $i$ when it communicates with other second-layer base stations.
Step 6, formulate the joint optimization problem of resource allocation and multi-task cooperative offloading and its objective function, specifically as follows:
The invention jointly considers the mobile device association strategy, the channel allocation strategy and the multi-task cooperative offloading strategy, which together form a multi-objective constrained optimization problem whose goal is the best user quality of experience within the cost that the network operator can bear;
Because the parts of a divided task are processed concurrently, the completion delay of a task is the maximum of the completion delays of its partial tasks, and the completion delay of the task generated by mobile device $u$ in time period $l$ is defined as:
Figure BDA00031311912700000922
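By way of non-limiting illustration, the "maximum over partial tasks" rule can be sketched as below. Splitting each partial task's delay into arrival, waiting and computing components follows the delays defined in steps 3 and 4; the tuple layout and the function names `completion_delay` and `average_delay` are assumptions made for the example.

```python
def completion_delay(parts):
    """Completion delay of one task: the slowest of its concurrently processed parts.

    parts is an iterable of (arrival, waiting, computing) delays, one per partial task.
    """
    return max(a + w + c for a, w, c in parts)


def average_delay(tasks):
    """User quality of experience in one time period: mean completion delay over all tasks."""
    delays = [completion_delay(parts) for parts in tasks]
    return sum(delays) / len(delays)
```

The maximum makes the objective sensitive to the most heavily loaded target server, so a good division of a task spreads its parts so that they finish at about the same time.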
Based on the energy consumption model, the energy consumption of micro base station $i$ in time period $l$ is defined as:
Figure BDA0003131191270000101
The invention pursues the best user quality of experience within the cost that the network operator can bear, so the optimization problem can be defined as:
Figure BDA0003131191270000102
Figure BDA0003131191270000103
Figure BDA0003131191270000104
Figure BDA0003131191270000105
Figure BDA0003131191270000106
Figure BDA0003131191270000107
Figure BDA0003131191270000108
$\zeta_l \le \zeta_{\max}$ C7
Constraint C1 specifies the data types of the optimization variables; constraint C2 states that each mobile device is associated with at most one micro base station in a time period; constraint C3 states that each mobile device uses at most one channel in a time period; constraint C4 states that the number of channels allocated by a micro base station cannot exceed the total number of channels it owns; constraint C5 states that a task can be arbitrarily divided into multiple parts; constraint C6 states that the energy consumption of a micro base station in each time period cannot exceed its limit; constraint C7 states that the overall transmission state between the second-layer micro base stations in each time period cannot exceed the allowed limit; constraints C6 and C7 together represent the cost that the network operator can bear.
Step 7, for this objective function, design a joint resource allocation and multi-task cooperative offloading method that combines a genetic algorithm with a deep reinforcement learning algorithm, so as to pursue the best quality of service for users within the cost that the network operator can bear while addressing server overload and long-term system performance, specifically as follows:
The invention designs a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient, which solves the optimization problem P1 of joint resource allocation and multi-task cooperative offloading;
Based on the proposed two-layer cooperative computation offloading framework and the composition of the task completion delay in (12), optimization problem P1 is decomposed into two sub-problems using the decomposition idea of the primal-dual method;
For the first-layer offloading from the mobile devices to the edge servers, the first sub-problem P2 obtained by the decomposition is defined as:
Figure BDA0003131191270000109
For sub-problem P2, the invention designs a genetic algorithm that yields the mobile device association strategy X and the channel allocation strategy B for the first-layer offloading from the mobile devices to the edge servers;
For the cooperative offloading between the second-layer micro base stations, the second sub-problem P3 obtained by the decomposition is defined as:
Figure BDA00031311912700001010
For sub-problem P3, the multi-task cooperative offloading strategy D in time period $l$ depends on the state of the edge servers in time period $l-1$, so sub-problem P3 is described as a Markov decision process and solved with the deep deterministic policy gradient from deep reinforcement learning; since sub-problem P3 has the Markov property, the long-term performance of the system can be captured by the cumulative reward function of reinforcement learning;
The following describes in detail the genetic algorithm for sub-problem P2, the deep reinforcement learning solution for sub-problem P3, and the computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient for the overall problem P1;
Step 7.1, design a genetic mobile device association and channel allocation algorithm. A genetic algorithm is a random search algorithm that solves complex problems by simulating biological evolution, and its evolutionary principle is survival of the fittest; a genetic algorithm only requires that the problem to be solved be computable and places no other mathematical requirements on it, such as differentiability or continuity. A genetic algorithm starts from a set of initial solutions and optimizes them through genetic operations (selection, crossover and mutation) until an acceptable solution is found or the search converges; in particular, the crossover and mutation operations maintain population diversity and enlarge the search area, so the search is less likely to become trapped in a local optimum; genetic algorithms are therefore effective at global search; the invention designs the following genetic operations to solve sub-problem P2:
Step 7.1.1, design the chromosome and the fitness function:
With reference to FIG. 3, to make the optimization goal of sub-problem P2 explicit, the invention defines the chromosome I of an individual as:
$I = [X, B]^{T}$ (14)
where
Figure BDA0003131191270000111
is the mobile device association strategy and
Figure BDA0003131191270000112
is the channel allocation strategy; to satisfy constraints C1 and C2 in sub-problem P2, the invention sets the variable
Figure BDA0003131191270000113
to be the micro base station with which mobile device $u$ is associated for communication, where
Figure BDA0003131191270000114
is the set of micro base stations that can communicate with mobile device $u$ during time period $l$; to satisfy constraints C1, C2 and C3, the invention sets the variable
Figure BDA0003131191270000115
as the channel used within time period $l$ by mobile device $u$ and micro base station
Figure BDA0003131191270000116
in their associated communication;
To evaluate the quality of the individuals in the population, the fitness function is defined on the basis of sub-problem P2 as:
Figure BDA0003131191270000117
Step 7.1.2, design the population initialization and the selection operator:
The invention defines the population initialization operation as:
Figure BDA0003131191270000118
Figure BDA0003131191270000119
where Randin(Set) is a generating function that outputs a random element of the set Set;
For the selection operator, the invention uses tournament selection, which is well suited to minimization problems, to choose K individuals from the population as the parent population; to improve the performance of the genetic algorithm, the best individual in the population is recorded during evolution, and if the best individual is not chosen in the selection process, it replaces the worst individual in the population;
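By way of non-limiting illustration, tournament selection for a minimization problem with preservation of the best individual can be sketched as follows. The tournament size of 2, the choice to place the recorded best individual into the parent population, and the names `tournament_select` and `keep_elite` are assumptions made for this example.

```python
import random


def tournament_select(population, fitness, k, rng, size=2):
    """Pick k parents; each is the lowest-fitness individual of one random tournament."""
    parents = []
    for _ in range(k):
        contenders = rng.sample(population, size)
        parents.append(min(contenders, key=fitness))
    return parents


def keep_elite(parents, best, fitness):
    """If the recorded best individual was not selected, it replaces the worst parent."""
    if best in parents:
        return parents
    worst = max(range(len(parents)), key=lambda j: fitness(parents[j]))
    return parents[:worst] + [best] + parents[worst + 1:]
```

Because lower fitness is better here, `min` picks the tournament winner and `max` identifies the individual to be replaced.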
Step 7.1.3, design the crossover and mutation operators:
Crossover and mutation are effective ways to increase the diversity of the offspring and thus to reach better solutions; for the crossover operation, with reference to the chromosome crossover operation in FIG. 4, two individuals are randomly selected from the parent population and, with crossover probability $p_c$, produce two offspring individuals by exchanging genes with the two-point crossover method; for the mutation operation, with reference to the chromosome mutation operation in FIG. 5, the invention sets the mutation probability of each individual to $p_m$, and each chromosome mutates the genes at two randomly generated mutation points;
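By way of non-limiting illustration, two-point crossover can be sketched as below. For brevity the chromosome is treated as a single flat list of genes, whereas the chromosome of the invention has the two rows X and B; that simplification and the function name `two_point_crossover` are assumptions of this example.

```python
import random


def two_point_crossover(parent_a, parent_b, p_c, rng):
    """Swap the gene segment between two random cut points with probability p_c."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same chromosome length")
    if rng.random() >= p_c:
        # No crossover this time: the offspring are copies of the parents.
        return list(parent_a), list(parent_b)
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return list(child_a), list(child_b)
```

The two cut points are distinct, so whenever crossover happens at least one gene position is exchanged between the two offspring.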
The mutation rule for the mobile device association strategy X in the chromosome is:
Figure BDA00031311912700001110
The mutation rule for the channel allocation strategy B in the chromosome is:
Figure BDA00031311912700001111
where
Figure BDA00031311912700001112
is the maximum element value in channel allocation strategy B,
Figure BDA00031311912700001113
is the minimum element value in channel allocation strategy B, Round(value) is a function that outputs an integer no greater than value, and $\psi_1$ and $\psi_2$ are two random numbers that follow the uniform distribution U(0, 1);
The pseudocode of the genetic mobile device association and channel allocation algorithm is as follows:
Figure BDA00031311912700001114
Figure BDA0003131191270000121
Step 7.2, design a multi-task cooperative offloading algorithm based on the deep deterministic policy gradient. Time is discretized into a number of time periods, and in each time period a batch of task requests is assumed to reach the corresponding edge servers according to the mobile device association strategy and the channel allocation strategy; because each server is modeled as a queueing system, its current queue state affects the time cost of completing the arriving tasks, so the multi-task cooperative offloading strategy in time period $l$ depends on the current communication environment and on the server queue state in time period $l-1$; sub-problem P3 can therefore be expressed as a Markov decision process and solved with the deep deterministic policy gradient method, which also allows the long-term performance of the system to be taken into account;
Step 7.2.1, Markov decision process:
The invention defines the system state in time period $l$ as
Figure BDA0003131191270000122
The action taken during the time period l is defined as
Figure BDA0003131191270000123
Figure BDA0003131191270000124
Action $a_l$ is equivalent to the multi-task cooperative offloading strategy D in sub-problem P3;
Sub-problem P3 aims to minimize the average task completion delay in the cooperation among the second-layer micro base stations, so the invention defines the immediate reward obtained when the system takes action $a_l$ in state $S_l$ as:
Figure BDA0003131191270000125
Figure BDA0003131191270000126
where
Figure BDA0003131191270000127
is the penalty term for constraint C6, in which $\alpha_i$ is the penalty coefficient of edge server $i$; $\beta \max(0, \zeta_l - \zeta_{\max})$ is the penalty term for constraint C7, in which $\beta$ is a penalty coefficient; $\epsilon$ is a balancing value between attributes with different units, and it depends on the maximum difference of L in the simulation experiment;
After the immediate reward is obtained, the system state changes from $S_l$ to $S_{l+1}$; to analyze the influence of the action on the system state, the invention defines the computation amount of edge server $i$ in time period $l$,
Figure BDA0003131191270000128
as:
Figure BDA0003131191270000129
where
Figure BDA00031311912700001210
is the size of the data received by edge server $i$ during time period $l$, and
Figure BDA00031311912700001213
is the length of each time period; the system state transition from time period $l$ to time period $l+1$ is therefore defined as:
Figure BDA00031311912700001211
To take the long-term performance of the system into account, the long-term cumulative reward of the multi-task cooperative offloading strategy $\mu: S_l \to a_l$ over $l_{\max}$ consecutive time periods is:
Figure BDA00031311912700001212
where $\gamma \in [0,1]$ is the discount coefficient; in this Markov decision process both the action space and the state space take continuous values, so the invention solves it with the deep deterministic policy gradient method from deep reinforcement learning;
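By way of non-limiting illustration, a discounted cumulative reward over a finite number of time periods can be computed as below. The formula itself is not reproduced in this text, so the standard form $\sum_{l} \gamma^{l} r_l$ is an assumption; the function name `cumulative_reward` is hypothetical.

```python
def cumulative_reward(rewards, gamma):
    """Assumed form: sum of gamma**l * r_l, evaluated backwards for numerical simplicity."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With $\gamma$ close to 1 the rewards of later time periods count almost as much as the current one, which is how the long-term performance of the system enters the objective.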
Step 7.2.2, multi-task offloading algorithm based on the deep deterministic policy gradient:
The deep deterministic policy gradient is built on an Actor-Critic framework, in which the Actor generates actions and interacts with the environment, and the Critic evaluates the performance of the Actor and guides it toward better actions; with reference to FIG. 6, the deep deterministic policy gradient algorithm consists of five parts: the main Actor network $\mu(S_l;\theta)$, the main Critic network $Q(S_l,a_l;\omega)$, the target Actor network $\mu'(S_l;\theta')$, the target Critic network $Q'(S_l,a_l;\omega')$ and the experience replay pool R; the experience replay pool stores the state transition experiences of the system, each of which consists of the state transition, the action and the immediate reward of one time period and is defined as $(S_l,a_l,r_l,S_{l+1})$; during learning, experience replay trains on random samples drawn from the pool, which breaks the correlation between experiences and improves learning performance;
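By way of non-limiting illustration, an experience replay pool that stores tuples $(S_l, a_l, r_l, S_{l+1})$ and returns random samples can be sketched as follows. The fixed capacity with oldest-first eviction and the class name `ReplayPool` are assumptions made for this example.

```python
import random
from collections import deque


class ReplayPool:
    """Bounded store of (state, action, reward, next_state) experiences."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest experience once the pool is full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniform random sample without replacement, which decorrelates experiences."""
        size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)
```

Sampling uniformly at random, rather than replaying experiences in the order they were collected, is what breaks the correlation between consecutive time periods.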
(1) Designing the main Actor network $\mu(S_l;\theta)$:
The deterministic multi-task cooperative offloading strategy $\mu: S_l \to a_l$ over consecutive time periods can be approximated by a continuous function $a_l=\mu(S_l;\theta)$ with parameters θ; the Actor network iteratively updates its parameters, selects the current action according to the current state, and interacts with the mobile edge computing environment to generate the next state and the instant reward;
ψ experiences are randomly selected from the experience replay pool as a sample set $\Psi=\{(S_i,a_i,U_i,S_{i+1});\ i\in\{0,1,\ldots,\psi\}\}$, and the policy gradient with which the main Actor network updates its parameters $\theta^{\mu}$ is:
Figure BDA0003131191270000131
to satisfy constraint C5, the invention normalizes the output values of the main Actor network; with reference to fig. 7, the MN output values of the network are defined as
Figure BDA0003131191270000132
and the value of the normalized action $a_l$,
Figure BDA0003131191270000133
is expressed as:
Figure BDA0003131191270000134
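The normalization formula is only available as an image, so the following is a hedged sketch of one plausible form: each device's raw outputs are scaled so that the parts sent to the M servers sum to the size of that device's task, which is what full execution of the task requires. The function name `normalize_action`, the non-negative raw outputs, and the even-split fallback are all assumptions.

```python
def normalize_action(raw, task_sizes, eps=1e-12):
    """Scale raw[u][i] >= 0 so that sum_i d[u][i] == task_sizes[u].

    raw[u][i] is the (assumed non-negative) Actor output for device u and
    server i; the result d[u][i] is the data size of the part of device u's
    task that is sent to server i.
    """
    action = []
    for row, size in zip(raw, task_sizes):
        total = sum(row)
        if total <= eps:
            # Degenerate output: fall back to an even split (assumption).
            shares = [1.0 / len(row)] * len(row)
        else:
            shares = [value / total for value in row]
        action.append([share * size for share in shares])
    return action
```

For example, raw outputs `[1, 3]` for a task of 8 bits give parts of 2 and 6 bits.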
(2) Designing the main Critic network $Q(S_l,a_l;\omega)$:
The main Critic network uses the approximate action-value function $Q(S_l,a_l;\omega)$ to evaluate the quality of the selected action and to guide the main Actor network; based on the Bellman equation, the action-value function is expressed as:
Figure BDA0003131191270000135
ψ experiences are randomly selected from the experience replay pool as a sample set $\Psi=\{(S_i,a_i,U_i,S_{i+1});\ i\in\{0,1,\ldots,\psi\}\}$, and the main Critic network updates its parameters $\omega^{Q}$ by minimizing the loss function L, expressed as:
Figure BDA0003131191270000136
where $y_i=r_i(S_i,a_i)+\gamma Q'(S_{i+1},\mu'(S_{i+1};\theta');\omega')$; computing $y_i$ requires both the target Actor network and the target Critic network;
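The target value $y_i$ can be sketched numerically as follows. The loss formula itself is only available as an image; the mean squared error used in `critic_loss` is the usual choice for this kind of Critic and is an assumption here, as are both function names. The inputs `next_q_values` stand for $Q'(S_{i+1},\mu'(S_{i+1};\theta');\omega')$ already evaluated by the target networks.

```python
def td_targets(rewards, next_q_values, gamma):
    """y_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1})) for each sample i."""
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]


def critic_loss(targets, q_values):
    """Mean squared error between targets y_i and Q(S_i, a_i) (assumed form)."""
    if not targets:
        raise ValueError("empty batch")
    squared = [(y - q) ** 2 for y, q in zip(targets, q_values)]
    return sum(squared) / len(squared)
```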
(3) Designing the target Actor network $\mu'(S_l;\theta')$:
The target Actor network $\mu'(S_l;\theta')$ selects the best next action $a_{i+1}=\mu'(S_{i+1};\theta')$ for the next state $S_{i+1}$ sampled from the experience replay pool; its parameters $\theta'^{\mu'}$ are soft-updated from the parameters $\theta^{\mu}$ of the main Actor network:
$\theta'^{\mu'}=\tau\theta^{\mu}+(1-\tau)\theta'^{\mu'}$ (29)
where $\tau\in[0,1]$ is the soft update coefficient;
(4) Designing the target Critic network $Q'(S_l,a_l;\omega')$:
The target Critic network $Q'(S_l,a_l;\omega')$ is used mainly in computing the loss function L; its parameters $\omega'^{Q'}$ are soft-updated from the parameters $\omega^{Q}$ of the main Critic network:
$\omega'^{Q'}=\tau\omega^{Q}+(1-\tau)\omega'^{Q'}$ (30)
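Equations (29) and (30) apply the same rule to the Actor and Critic parameters. A minimal sketch over flat parameter lists follows; the name `soft_update` and the list representation of the network weights are assumptions made for illustration.

```python
def soft_update(target_params, main_params, tau):
    """Return tau * main + (1 - tau) * target, element by element."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target_params) != len(main_params):
        raise ValueError("parameter lists must have the same length")
    return [tau * m + (1.0 - tau) * t
            for t, m in zip(target_params, main_params)]
```

A small `tau` makes the target networks track the main networks slowly, which keeps the targets $y_i$ stable between updates.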
based on the deep reinforcement learning technique above, the invention defines the pseudo code of the multi-task cooperative offloading algorithm based on the deep deterministic policy gradient as follows:
Figure BDA0003131191270000137
Figure BDA0003131191270000141
Step 7.2.3, designing a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient:
in a mobile edge computing system with multiple mobile devices and multiple edge servers, the invention assumes that each mobile device generates one task request in each time period, and that any micro base station edge server can act as the central controller that solves the computation offloading strategy;
to achieve the best average user experience within the cost affordable to the network operator, the invention designs a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient to obtain the mobile device association strategy, the channel allocation strategy, and the multi-task cooperative offloading strategy; the invention randomly selects one micro base station edge server as the central controller to execute the computation offloading scheme, which is defined as:
Figure BDA0003131191270000142

Claims (8)

1. A joint resource allocation and multi-task offloading method based on server cooperation, characterized by comprising the following steps:
step 1, establishing a mobile edge computing system model under the scene of multiple mobile devices and multiple edge servers;
step 2, designing a multi-edge server cooperation model;
step 3, designing a communication model;
step 4, designing a calculation model;
step 5, designing an energy consumption model;
step 6, providing a joint optimization problem model of resource allocation and multi-task cooperative unloading and an objective function;
step 7, for the objective function, designing a joint resource allocation and multi-task cooperative offloading method that combines a genetic algorithm and a deep reinforcement learning algorithm, so as to pursue the best quality of service for users within the cost affordable to the network operator and to address server overload and long-term system performance.
2. The method for joint resource allocation and multi-task offloading in a mobile edge computing system based on multi-edge server cooperation as claimed in claim 1, wherein the model of the mobile edge computing system in the multi-mobile device and multi-edge server scenario in step 1 is as follows:
setting a mobile edge computing system to be composed of M densely distributed micro base stations
Figure RE-FDA0003214556800000011
the micro base stations can communicate with each other through 5G wireless microwave communication links, and each base station is equipped with an edge server that provides computing service to the mobile devices; the edge server
Figure RE-FDA0003214556800000012
has computing capacity (clock frequency) $f_i$; the set of mobile devices in the mobile edge computing system is
Figure RE-FDA0003214556800000013
Each mobile device can wirelessly communicate with the micro base station closest to the mobile device;
the mobile devices are set to generate G types of computation-intensive, ultra-low-latency computing tasks, where a task t ∈ G is expressed as $k_t=\{d_t,c_t,\beta_t\}$, in which $d_t$ is the data size of the computing task (bits), $c_t$ is the number of CPU cycles required to process each bit of task data, and $\beta_t$ ($\sum_{t\in G}\beta_t=1$) is the probability that a mobile device generates task t; to better reflect reality, time is discretized into consecutive time periods, and each mobile device generates one computing task in each time period; for example, the task generated by mobile device u in each time period is defined as
Figure RE-FDA0003214556800000014
3. The method for joint resource allocation and multi-task offloading in a mobile edge computing system based on multi-edge server cooperation of claim 2, wherein the designing of the multi-edge server cooperation model in step 2 is specifically as follows:
in a mobile edge computing system, the invention designs a two-layer cooperative computing unloading framework, wherein the first layer is the unloading from mobile equipment to an edge server, and the second layer is the cooperative unloading between the edge servers;
in the unloading process from the first layer of mobile equipment to the edge server, each mobile equipment is set to completely unload the calculation task to the edge server of the micro base station end which can communicate with the mobile equipment as a whole; setting up
Figure RE-FDA0003214556800000015
Indicating that the mobile device u is associated with the micro base station i, otherwise
Figure RE-FDA0003214556800000016
Figure RE-FDA0003214556800000017
The present invention therefore defines a mobile device association policy as
Figure RE-FDA0003214556800000018
Since each mobile device communicates with at most one micro base station per time period, the association policy satisfies the constraint
Figure RE-FDA0003214556800000019
in the second-layer cooperative offloading between edge servers, the invention sets that each task can be arbitrarily divided into M parts, which are transmitted through the microwave links between base stations to the edge servers at the corresponding M micro base stations; let
Figure RE-FDA00032145568000000110
denote the data size of the part of the task generated by mobile device u that is transmitted to target server i within a time period, the transmission link being the shortest communication path from the initial base station associated with mobile device u to target base station i; the invention sets that each edge server has a task queue in which arriving tasks are stored and then scheduled and executed under a first-come, first-served mechanism; the invention defines the multi-task offloading strategy as
Figure RE-FDA00032145568000000111
Which satisfy the constraints
Figure RE-FDA00032145568000000112
To ensure that the tasks generated by the mobile device u are fully executed;
under the multi-edge-server cooperation model, the main problem the invention solves is achieving the best quality of user experience within the cost affordable to the network operator; the quality of user experience is defined as the average completion delay of all tasks in each time period, and the cost borne by the network operator is defined as the energy consumption of all edge servers together with the overall data carrying state between the second-layer micro base stations; the data carrying state between the second-layer micro base stations is the total size of the data transmitted in each time period, which depends on the sizes of the divided task parts and on the shortest communication paths, and is defined as:
Figure RE-FDA00032145568000000113
where Hops(u, i) is the number of hops on the shortest communication path from the initial base station associated with mobile device u to target base station i;
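The data carrying state can be sketched as follows. Its formula survives only as an image, so the form used here (each transmitted part weighted by its hop count and summed) is an assumption consistent with the description; the names `hops` and `carrying_state` and the adjacency-dictionary representation of the base station topology are hypothetical.

```python
from collections import deque


def hops(adjacency, source, target):
    """Number of hops on a shortest path, found by breadth-first search."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError("target base station is unreachable")


def carrying_state(adjacency, initial_station, split):
    """Sum of part size times hop count over all devices and target stations.

    split[u][i] is the data size of device u's part sent to station i;
    initial_station[u] is the station device u is associated with.
    """
    total = 0.0
    for device, parts in split.items():
        for station, size in parts.items():
            total += size * hops(adjacency, initial_station[device], station)
    return total
```

A part that stays on the initially associated base station contributes nothing, because its hop count is zero.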
in order to solve the main problems, the invention also needs to quantify some elements in the mobile edge computing system, including task uploading communication time delay from the mobile equipment to the initial associated micro base station, task transmission time delay between the micro base stations, task waiting time delay on the edge server, task computing time delay on the edge server, computing energy consumption of the edge server and transmission energy consumption between the micro base stations.
4. The method of claim 3, wherein the communication model of step 3 is designed as follows:
in the design of a communication model, the invention defines the task uploading communication time delay from the mobile equipment to the initially associated micro base station and the task transmission time delay between the micro base stations;
the invention uses the orthogonal frequency division multiple access technology as the communication basis between the mobile equipment and the base station, the mobile equipment which is communicated with the same base station is set to be allocated with an orthogonal frequency spectrum, the micro base stations are set to transmit data through microwaves, and signal interference factors of communication between the second layer of micro base stations are ignored, so that the invention only considers the inter-cell interference when the first layer of mobile equipment is communicated with the micro base stations;
in the communication model of the mobile edge computing system, the invention sets the channel set as C = {1, 2, …, C}, and the bandwidth of each channel is w; the channel allocation strategy is set as
Figure RE-FDA0003214556800000021
Indicating whether channel k is allocated for use by mobile device u; in channel allocation, each mobile device is set to occupy only one channel, so policy B satisfies the constraint
Figure RE-FDA0003214556800000022
meanwhile, for each micro base station
Figure RE-FDA0003214556800000023
Figure RE-FDA0003214556800000024
the number of allocated channels cannot exceed the number of channels the micro base station owns, so policy B also satisfies the constraint
Figure RE-FDA0003214556800000025
When the mobile device u occupies the channel k to communicate with the micro base station i, the received intercell signal interference is defined as:
Figure RE-FDA0003214556800000026
where $P_{u'}$ is the transmission power of mobile device u′, and
Figure RE-FDA0003214556800000027
is the channel gain between mobile device u′ and micro base station i on channel k; therefore, the upload rate from mobile device u to micro base station i is:
Figure RE-FDA0003214556800000028
where $N_0$ is the variance of the Gaussian white noise on the channel; in the first-layer offloading from mobile devices to micro base station edge servers, the upload delay for mobile device u to offload its computing task to the initially associated micro base station i is defined as:
Figure RE-FDA0003214556800000029
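The rate and delay formulas above are only available as images. The sketch below assumes the usual Shannon-capacity form, with rate $w\log_2(1+P_u h/(N_0+I))$ and upload delay equal to task size divided by rate; the function names and argument names are hypothetical.

```python
import math


def upload_rate(bandwidth, tx_power, gain, noise, interference):
    """Assumed Shannon-form rate: w * log2(1 + P * h / (N0 + I))."""
    sinr = tx_power * gain / (noise + interference)
    return bandwidth * math.log2(1.0 + sinr)


def upload_delay(data_bits, rate):
    """Time to upload a whole task of data_bits at the given rate."""
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    return data_bits / rate
```

Here `interference` stands for the inter-cell interference received on the same channel, which is why the channel allocation strategy affects the upload delay.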
in the second-layer cooperative offloading between micro base stations, the microwave transmission rates between micro base stations are set equal and the waiting delay during data transmission between micro base stations is ignored, so the transmission delay between micro base stations depends only on the task size and on the number of hops on the shortest path; the invention sets the data transmission rate between micro base stations to α, so the transmission delay of the partial task
Figure RE-FDA00032145568000000210
of mobile device u from its associated initial base station to target base station i is:
Figure RE-FDA00032145568000000211
5. The method for joint resource allocation and multi-task offloading in a mobile edge computing system based on multi-edge server cooperation of claim 4, wherein the design computation model of step 4 is specifically as follows:
in the design of a calculation model, the invention defines the calculation time delay and the waiting time delay of a task on an edge server;
when the partial task
Figure RE-FDA00032145568000000212
that has arrived at edge server i of the target base station is executed, the required computation delay is defined as:
Figure RE-FDA00032145568000000213
the invention sets the state of the task queue of the edge server i at the beginning of the time period l as
Figure RE-FDA00032145568000000214
which is the number of CPU cycles required to process the tasks waiting in the task queue; in the mobile edge computing system, the total delay required for the partial task
Figure RE-FDA00032145568000000215
to reach target server i,
Figure RE-FDA00032145568000000216
consists of two parts: the upload delay from the mobile device to the initially associated base station,
Figure RE-FDA00032145568000000217
and the transmission delay of the partial task from the initial base station to the target base station,
Figure RE-FDA00032145568000000218
defined in detail as:
Figure RE-FDA00032145568000000219
based on the total delay
Figure RE-FDA00032145568000000220
the invention defines a function Sort(u, i, l) that returns the set of mobile devices whose partial tasks arrive at target server i within time period l earlier than the partial task
Figure RE-FDA00032145568000000221
of mobile device u; the waiting delay of the partial task
Figure RE-FDA00032145568000000222
at target server i can therefore be defined as:
Figure RE-FDA00032145568000000223
where
Figure RE-FDA00032145568000000224
denotes the data size of a partial task that reaches target server i within time period l earlier than the partial task
Figure RE-FDA00032145568000000225
does;
based on the calculation model, the state of the task queue of the edge server i at the beginning of the time period l +1 is:
Figure RE-FDA0003214556800000031
in the formula
Figure RE-FDA0003214556800000032
Represents the size of data received by the edge server i during the time period l, and θ is the length of each time period.
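The queue transition formula is only available as an image. A hedged sketch of one natural form follows: the backlog grows by the newly received workload, shrinks by what the server can process in one period, and never goes negative. The function name and the assumption that the received workload is already expressed in CPU cycles are illustrative, not taken from the source.

```python
def next_queue_state(queue_cycles, received_cycles, cpu_freq, period_len):
    """Backlog (CPU cycles) at the start of period l+1 (assumed form).

    queue_cycles:    backlog at the start of period l
    received_cycles: workload that arrives during period l, in CPU cycles
    cpu_freq:        cycles per second the edge server can process (f_i)
    period_len:      length of one time period in seconds (theta)
    """
    processed = cpu_freq * period_len
    return max(queue_cycles + received_cycles - processed, 0.0)
```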
6. The method for joint resource allocation and multi-task offloading in a mobile edge computing system based on multi-edge server cooperation of claim 5, wherein the design energy consumption model in step 5 is as follows:
in the design of the energy consumption model, the invention defines the calculation energy consumption of the edge server and the transmission energy consumption between the micro base stations;
the invention sets the energy consumption of the edge server i for processing one CPU cycle as e i Then, the energy consumption generated when the edge server i finishes processing the task received in the time period l is defined as:
Figure RE-FDA0003214556800000033
the invention defines the communication transmission energy consumption of a second layer micro base station i as follows:
Figure RE-FDA0003214556800000034
where inPath(i, u, j) takes the value 0 or 1; a value of 1 indicates that micro base station i lies on the communication link along which the partial task
Figure RE-FDA0003214556800000035
of mobile device u is transmitted to the target micro base station j, and the value is 0 otherwise; $\delta_i$ is the transmission power of micro base station i for communication between second-layer base stations.
7. The method for joint resource allocation and multi-task cooperative offloading in a mobile edge computing system based on multi-edge server cooperation of claim 6, wherein the joint optimization problem model and objective function for resource allocation and multi-task cooperative offloading proposed in step 6 are specifically as follows:
the invention comprehensively considers the mobile equipment association strategy, the channel allocation strategy and the multi-task cooperation unloading strategy to form a multi-objective constraint optimization problem, and the problem aims to pursue the optimal user experience quality within the bearable cost of a network operator;
because the parts into which a task is divided are processed concurrently, the completion delay of a task is the maximum of the completion delays of its partial tasks; the completion delay of the task generated by mobile device u in time period l is defined as:
Figure RE-FDA0003214556800000036
Figure RE-FDA0003214556800000037
based on the energy consumption model, the energy consumption of the micro base station i in the time period l is defined as:
Figure RE-FDA0003214556800000038
the present invention aims at pursuing the best quality of user experience within the affordable cost of the network operator, so the optimization problem can be defined as:
Figure RE-FDA0003214556800000039
Figure RE-FDA00032145568000000310
Figure RE-FDA00032145568000000311
Figure RE-FDA00032145568000000312
Figure RE-FDA00032145568000000313
Figure RE-FDA00032145568000000314
Figure RE-FDA00032145568000000315
ζ l ≤ζ max C7
constraint C1 specifies the data types of the optimization variables; constraint C2 states that each mobile device is associated with at most one micro base station in the same time period; constraint C3 states that each mobile device uses at most one channel in the same time period; constraint C4 states that the number of channels allocated by a micro base station cannot exceed the total number of channels it owns; constraint C5 states that tasks can be arbitrarily divided; constraint C6 states that the energy consumption of a micro base station cannot exceed its limit in each time period; constraint C7 states that the total transmission state between second-layer micro base stations in each time period cannot exceed the allowed limit; constraints C6 and C7 together represent the cost affordable to the network operator.
8. The method according to claim 7, wherein the method for jointly allocating resources and cooperatively offloading multitasks in a mobile edge computing system based on multi-edge server cooperation is designed by combining a genetic algorithm and a deep reinforcement learning algorithm according to the objective function in step 7, so as to pursue the best quality of service for users within a tolerable cost range of network operators and solve the problem of server overload and the problem of long-term system performance, and the method is as follows:
the invention designs a computation offloading scheme based on a genetic algorithm and the deep deterministic policy gradient, which solves the optimization problem P1 of joint resource allocation and multi-task cooperative offloading;
based on the proposed two-layer cooperative computation offloading framework and the composition of the task completion delay in (12), the optimization problem P1 is decomposed into two sub-problems using the decomposition idea of the primal-dual method;
in the first tier mobile device to edge server offload, the first sub-problem P2 that is broken down is defined as:
Figure RE-FDA0003214556800000041
s.t. C1, C2, C3, C4
for the subproblem P2, the invention designs a genetic algorithm to acquire a mobile equipment association strategy X and a channel allocation strategy B from the first layer of mobile equipment to the edge server for unloading;
in the cooperative offloading between the second-tier micro base stations, the second sub-problem P3 that is decomposed into is defined as:
Figure RE-FDA0003214556800000042
s.t. C5, C6, C7
for sub-problem P3, the multi-task cooperative offloading strategy D in time period l depends on the state of the edge servers in time period l-1, so sub-problem P3 is described as a Markov decision process and solved with the deep deterministic policy gradient from deep reinforcement learning; because sub-problem P3 has the Markov property, the long-term performance of the system can be captured by the cumulative reward function of reinforcement learning;
the following describes in detail the genetic algorithm that solves sub-problem P2, the deep reinforcement learning technique that solves sub-problem P3, and the computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient for the overall problem P1;
step 7.1, designing a genetic-algorithm-based mobile device association and channel allocation algorithm; a genetic algorithm is a random search algorithm that solves complex problems by simulating biological evolution, with survival of the fittest as its evolutionary principle; it only requires that the problem to be solved be computable and imposes no other mathematical properties such as differentiability or continuity; it starts from a set of initial solutions and improves them through genetic operations (selection, crossover, and mutation) until an acceptable solution is found or the search converges; in particular, crossover and mutation maintain population diversity and enlarge the search region, so the search is less likely to be trapped in a local optimum; genetic algorithms are therefore effective at global search; the invention designs the following genetic operations to solve sub-problem P2:
step 7.1.1, designing chromosome and fitness function:
to define the optimization goal of sub-problem P2, the invention sets the chromosome I of an individual as:
$I=[X,B]^T$ (14)
where
Figure RE-FDA0003214556800000043
is the mobile device association strategy and
Figure RE-FDA0003214556800000044
is the channel allocation strategy; to satisfy constraints C1 and C2 in sub-problem P2, the invention sets the variable
Figure RE-FDA0003214556800000045
as the micro base station with which mobile device u is associated for communication, where
Figure RE-FDA0003214556800000046
is the set of micro base stations that can communicate with mobile device u during time period l; to satisfy constraints C1, C2, and C3, the variable
Figure RE-FDA0003214556800000047
is set to denote the channel used within time period l when mobile device u communicates with its associated micro base station
Figure RE-FDA0003214556800000048
;
in order to evaluate the quality of biological individuals in the population, the fitness function is set by combining the sub-problem P2 as follows:
Figure RE-FDA0003214556800000049
step 7.1.2, designing population initialization and selection operators:
the present invention sets the biological population initialization operation as follows:
Figure RE-FDA00032145568000000410
Figure RE-FDA0003214556800000051
in the formula, randin (Set) is a generating function and represents that a random element is output from the Set;
for the selection operator, the invention uses tournament selection, which is well suited to minimization problems, to select K individuals from the population as the parent population; to improve the performance of the genetic algorithm, the best individual in the population is recorded during evolution, and if it is not chosen during selection, it replaces the worst individual in the population;
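The selection step can be sketched as follows, assuming a minimization problem in which lower fitness is better. The function name `tournament_select`, the tournament size of 2, and the choice to let the best individual replace the worst of the selected parents are assumptions made for illustration.

```python
import random


def tournament_select(population, fitness, k, tournament_size=2, rng=None):
    """Pick k parents by tournament, then force the best individual in."""
    if k < 1 or tournament_size > len(population):
        raise ValueError("invalid k or tournament size")
    rng = rng or random.Random()
    scores = [fitness(individual) for individual in population]
    indices = range(len(population))
    best = min(indices, key=scores.__getitem__)
    chosen = []
    for _ in range(k):
        # The contender with the lowest fitness wins each tournament.
        contenders = rng.sample(indices, tournament_size)
        chosen.append(min(contenders, key=scores.__getitem__))
    if best not in chosen:
        # Elitism: the best individual replaces the worst selected parent.
        worst_pos = max(range(k), key=lambda pos: scores[chosen[pos]])
        chosen[worst_pos] = best
    return [population[index] for index in chosen]
```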
step 7.1.3, designing crossover and mutation operators:
crossover and mutation are effective ways to increase the diversity of offspring and thus to reach better solutions; for the crossover operation, two individuals are randomly selected from the parent population and, with probability $p_c$, crossed to produce two offspring, with genes exchanged by the two-point crossover method; for the mutation operation, the invention sets the mutation probability of each individual to $p_m$, and each chromosome mutates its genes at two randomly generated mutation points;
the mutation principle of the mobile equipment association strategy X in the chromosome is set as follows:
Figure RE-FDA0003214556800000052
the mutation principle of the channel allocation strategy B in the chromosome is set as follows:
Figure RE-FDA0003214556800000053
in the formula
Figure RE-FDA0003214556800000054
is the maximum element value in channel allocation strategy B,
Figure RE-FDA0003214556800000055
is the minimum element value in channel allocation strategy B, round(value) is a function that outputs an integer no greater than value, and $\psi_1$, $\psi_2$ are two random numbers drawn from the uniform distribution U(0, 1);
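The two-point crossover described above can be sketched on a flat list of genes. Representing a chromosome as a flat Python list and the name `two_point_crossover` are assumptions; the mutation rules, which survive only as images, are not reproduced here.

```python
import random


def two_point_crossover(parent_a, parent_b, p_c, rng=None):
    """With probability p_c, swap the genes between two random cut points."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    rng = rng or random.Random()
    child_a, child_b = list(parent_a), list(parent_b)
    if len(parent_a) >= 2 and rng.random() < p_c:
        # Two distinct cut points, so at least one gene is exchanged.
        low, high = sorted(rng.sample(range(len(parent_a) + 1), 2))
        child_a[low:high] = parent_b[low:high]
        child_b[low:high] = parent_a[low:high]
    return child_a, child_b
```

Because each gene position only exchanges values between the two parents, an offspring never contains a base station or channel index that neither parent had at that position.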
the present invention defines a genetic-based mobile device association and channel allocation algorithm pseudocode as follows:
Figure RE-FDA0003214556800000056
Step 7.2, designing a multi-task cooperative offloading algorithm based on the deep deterministic policy gradient; time is discretized into a number of time periods, and the batch of task requests in each time period is assumed to reach the corresponding edge servers according to the mobile device association strategy and the channel allocation strategy; because each server is modeled as a queueing system, the current queue state of a server affects the time needed to complete arriving tasks, so the multi-task cooperative offloading strategy in time period l depends on the current communication environment and on the queue state of the servers in time period l-1; sub-problem P3 can therefore be expressed as a Markov decision process and solved with the deep deterministic policy gradient method, which also accounts for the long-term performance of the system;
Step 7.2.1, Markov decision process:
the invention defines the system state of the time period l as
Figure RE-FDA0003214556800000057
The action taken during the time period l is defined as
Figure RE-FDA0003214556800000058
Figure RE-FDA0003214556800000059
action $a_l$ is equivalent to the multi-task cooperative offloading strategy D in sub-problem P3;
sub-problem P3 aims to minimize the average task completion delay in the cooperation between second-layer micro base stations, so the instant reward obtained when the system takes action $a_l$ in state $S_l$ is defined as:
Figure RE-FDA00032145568000000510
Figure RE-FDA00032145568000000511
where
Figure RE-FDA00032145568000000512
is the penalty term for constraint C6, with $\alpha_i$ the penalty coefficient of edge server i; $\beta\max(0,\zeta_l-\zeta_{max})$ is the penalty term for constraint C7, with β a penalty coefficient; ε is a balancing value between quantities with different units and depends on the maximum difference of L in the simulation experiments;
after the instant reward is obtained, the system state changes from $S_l$ to $S_{l+1}$; to analyze the influence of the action on the system state, the invention sets the computation load of edge server i in time period l,
Figure RE-FDA0003214556800000061
as:
Figure RE-FDA0003214556800000062
where
Figure RE-FDA0003214556800000063
is the size of the data received by edge server i in time period l, and θ is the length of each time period; therefore, the system state transition from period l to period l+1 is defined as:
Figure RE-FDA0003214556800000064
to account for the long-term performance of the system, the long-term cumulative reward of the multi-task cooperative offloading strategy $\mu: S_l \to a_l$ over $l_{max}$ consecutive time periods is defined as:
Figure RE-FDA0003214556800000065
where $\gamma \in [0,1]$ is the discount factor; in the Markov decision process, the action space and the state space both take continuous values, so the invention solves the problem with the deep deterministic policy gradient method from deep reinforcement learning;
Step 7.2.2, multi-task offloading algorithm based on the deep deterministic policy gradient:
the deep deterministic policy gradient is built on an Actor-Critic framework, in which the Actor generates actions and interacts with the environment, and the Critic evaluates the Actor's performance and guides it toward better actions; the deep deterministic policy gradient algorithm consists of five parts: the main Actor network $\mu(S_l;\theta)$, the main Critic network $Q(S_l,a_l;\omega)$, the target Actor network $\mu'(S_l;\theta')$, the target Critic network $Q'(S_l,a_l;\omega')$, and the experience replay pool R; the experience replay pool stores the system state transition experiences, each consisting of the state transition, the action, and the instant reward of one time period, defined as $(S_l,a_l,r_l,S_{l+1})$; during learning, the experience replay technique trains on random samples from the pool, which breaks the correlation between experiences and improves learning performance;
(1) Designing the main Actor network $\mu(S_l;\theta)$:
The deterministic multi-task cooperative offloading strategy $\mu: S_l \to a_l$ over consecutive time periods can be approximated by a continuous function $a_l=\mu(S_l;\theta)$ with parameters θ; the Actor network iteratively updates its parameters, selects the current action according to the current state, and interacts with the mobile edge computing environment to generate the next state and the instant reward;
ψ experiences are randomly selected from the experience replay pool as a sample set $\Psi=\{(S_i,a_i,U_i,S_{i+1});\ i\in\{0,1,\ldots,\psi\}\}$, and the policy gradient with which the main Actor network updates its parameters $\theta^{\mu}$ is:
Figure RE-FDA0003214556800000066
to satisfy constraint C5, the output values of the main Actor network are normalized; the MN output values of the network are defined as
Figure RE-FDA0003214556800000067
and the value of the normalized action $a_l$,
Figure RE-FDA0003214556800000068
is expressed as:
Figure RE-FDA0003214556800000069
(2) Designing the main Critic network Q(S_l, a_l; ω):
The main Critic network uses the approximate action-value function Q(S_l, a_l; ω) to evaluate the quality of the selected action and to guide the main Actor network; the action-value function based on the Bellman equation is expressed as:
Figure RE-FDA00032145568000000610
ψ experiences are randomly selected from the experience replay pool as a sample set Ψ = {(S_i, a_i, U_i, S_{i+1}); i ∈ {0, 1, …, ψ}}, and the main Critic network updates its network parameter ω by minimizing the loss function L_Q, expressed as:
Figure RE-FDA00032145568000000611
where y_i = r_i(S_i, a_i) + γ Q′(S_{i+1}, μ′(S_{i+1}; θ′); ω′); computing y_i requires both the target Actor network and the target Critic network;
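The target value y_i can be sketched as below. The target formula follows the text; the loss formula appears only as a figure, so the mean squared error used in `critic_loss` is an assumption (it is the usual DDPG choice). The function names and the toy linear stand-ins for the target networks are hypothetical.

```python
import numpy as np


def td_targets(rewards, next_states, target_actor, target_critic, gamma):
    """y_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1}; theta'); omega')."""
    # The target Actor proposes the next action for each sampled next state.
    next_actions = target_actor(next_states)
    # The target Critic scores that (next state, next action) pair.
    return rewards + gamma * target_critic(next_states, next_actions)


def critic_loss(q_values, targets):
    # Assumed form of L_Q: mean squared error over the sampled batch.
    return float(np.mean((targets - q_values) ** 2))


# Toy stand-ins for the target networks (illustration only).
target_actor = lambda s: 0.5 * s
target_critic = lambda s, a: (s + a).sum(axis=1)

rewards = np.array([1.0, 2.0])
next_states = np.array([[1.0], [2.0]])
y = td_targets(rewards, next_states, target_actor, target_critic, gamma=0.9)
loss = critic_loss(np.array([2.35, 3.7]), y)
```

Because y_i is computed from the slowly changing target networks rather than the main networks, the regression target for the main Critic does not shift with every gradient step.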
(3) Designing the target Actor network μ′(S_l; θ′):
The target Actor network μ′(S_l; θ′) selects the best next action a_{i+1} = μ′(S_{i+1}; θ′) for the next state S_{i+1} sampled from the experience replay pool; its network parameter θ′_{μ′} is soft-updated from the parameter θ_μ of the main Actor network, expressed as:
θ′_{μ′} = τ θ_μ + (1 − τ) θ′_{μ′} (29)
where τ ∈ [0, 1] is the soft update coefficient;
(4) Designing the target Critic network Q′(S_l, a_l; ω′):
The target Critic network Q′(S_l, a_l; ω′) is used mainly in calculating the loss function L_Q; its network parameter ω′_{Q′} is soft-updated from the parameter ω_Q of the main Critic network, expressed as:
ω′_{Q′} = τ ω_Q + (1 − τ) ω′_{Q′} (30)
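The soft updates in equations (29) and (30) share one form, so a single helper can serve both the target Actor and the target Critic. The sketch below assumes each network's parameters are held as a list of NumPy arrays; the function name `soft_update` is hypothetical.

```python
import numpy as np


def soft_update(target_params, main_params, tau):
    """Return tau * main + (1 - tau) * target for every parameter array."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [tau * m + (1.0 - tau) * t
            for t, m in zip(target_params, main_params)]


main_params = [np.ones(2), np.full((2, 2), 4.0)]
target_params = [np.zeros(2), np.zeros((2, 2))]
updated = soft_update(target_params, main_params, tau=0.1)
```

With τ close to 0 the target parameters trail the main parameters slowly, and with τ = 1 the update reduces to a direct copy of the main parameters.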
based on deep reinforcement learning, the invention defines the pseudocode of the multi-task cooperative offloading algorithm based on the deep deterministic policy gradient as follows:
Figure RE-FDA0003214556800000071
Step 7.2.3: designing a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient;
in a mobile edge computing system with multiple mobile devices and multiple edge servers, the invention assumes that each mobile device generates a task request in each time period, and that any micro base station server can act as the central controller that solves for the computation offloading policy;
to achieve the best average user experience at a cost acceptable to the network operator, the invention designs a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient, so as to obtain the mobile device association policy, the channel allocation policy, and the multi-task cooperative offloading policy; the invention randomly selects one micro base station edge server as the central controller to execute the computation offloading scheme, which is defined as:
Figure RE-FDA0003214556800000081
CN202110705792.2A 2021-06-24 2021-06-24 Joint resource allocation and multi-task unloading method based on server cooperation Pending CN115529604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110705792.2A CN115529604A (en) 2021-06-24 2021-06-24 Joint resource allocation and multi-task unloading method based on server cooperation

Publications (1)

Publication Number Publication Date
CN115529604A true CN115529604A (en) 2022-12-27

Family

ID=84693761

Country Status (1)

Country Link
CN (1) CN115529604A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858131A (en) * 2023-02-22 2023-03-28 山东海量信息技术研究院 Task execution method, system, device and readable storage medium
CN115858131B (en) * 2023-02-22 2023-05-16 山东海量信息技术研究院 Task execution method, system, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination