CN115529604A

CN115529604A - Joint resource allocation and multi-task unloading method based on server cooperation

Info

Publication number: CN115529604A
Application number: CN202110705792.2A
Authority: CN
Inventors: 张红霞; 杨勇进; 王登岳; 肖军弼; 王琪
Original assignee: China University of Petroleum East China
Current assignee: China University of Petroleum East China
Priority date: 2021-06-24
Filing date: 2021-06-24
Publication date: 2022-12-27

Abstract

The invention discloses a joint resource allocation and multi-task offloading method based on server cooperation. The method comprises the following steps: first, a mobile edge computing system model is established for a scenario with multiple mobile devices and multiple edge servers; next, a multi-edge-server cooperation model, a communication model, a computation model and an energy consumption model are designed; then, a joint optimization problem model for resource allocation and multi-task cooperative offloading is formulated together with its objective function; finally, for that objective function, a joint resource allocation and multi-task cooperative offloading method is designed by combining a genetic algorithm with a deep reinforcement learning algorithm, so that the best user quality of service is pursued within the cost a network operator can bear, and the problems of server overload and long-term system performance are addressed.

Description

Joint resource allocation and multi-task offloading method based on server cooperation

Technical Field

The invention belongs to the technical field of mobile edge computing, and particularly relates to a joint resource allocation and multi-task offloading method based on server cooperation.

Background

In recent years, computation-intensive and delay-sensitive mobile applications such as online video, real-time gaming and augmented reality have become widely used. However, the limited battery life and computing power of smart mobile devices make it difficult to execute these applications locally. Mobile edge computing over densely deployed cellular networks is considered a promising solution. Its most critical technology is computation offloading, which migrates computing tasks to an edge server for execution and, compared with cloud computing, effectively reduces backbone-network congestion and communication delay.

The development of 5G technology has paved the way for the rise of mobile edge computing. Small base stations equipped with edge servers, such as home base stations and enterprise-class small base stations, have cloud-like computing and storage capabilities and are considered key drivers of mobile edge computing. However, small base stations have limited computing and communication resources compared with cloud centers. To implement mobile edge computing in resource-constrained service networks, a growing body of research has focused on the joint optimization of task offloading and resource allocation.

To achieve more reasonable computation offloading in resource-constrained mobile edge computing scenarios, making full use of the edge servers has become an urgent problem. Because the computing tasks that reach edge servers can be highly dynamic and heterogeneous, some overloaded edge servers struggle to provide consistently satisfactory computing services. Cooperation among multiple edge servers is therefore an effective way to address inefficient server use. For example, an edge server cluster may migrate computing tasks from an overloaded edge server to several lightly loaded peer edge servers, providing better service to mobile users.

Since users' task requests are random in time and space, a mobile edge computing system cannot pursue immediate performance alone while ignoring long-term performance. However, time-varying communication networks make it challenging to guarantee the long-term performance of the system.

To achieve the best user quality of service within the commercial cost that a network operator can bear, a joint resource allocation and multi-task offloading method based on server cooperation is needed.

Disclosure of Invention

The invention aims to provide a joint resource allocation and multi-task offloading method based on server cooperation, which can solve the problem of server overload and achieve the best user quality of service while ensuring the long-term performance of the system.

The technical solution for achieving this purpose is as follows: a joint resource allocation and multi-task offloading method based on server cooperation comprises the following steps:

step 1, establishing a mobile edge computing system model for the scenario of multiple mobile devices and multiple edge servers;

step 2, designing a multi-edge-server cooperation model;

step 3, designing a communication model;

step 4, designing a computation model;

step 5, designing an energy consumption model;

step 6, formulating a joint optimization problem model for resource allocation and multi-task cooperative offloading, together with its objective function;

step 7, for the objective function, designing a joint resource allocation and multi-task cooperative offloading method that combines a genetic algorithm with a deep reinforcement learning algorithm, so as to pursue the best user quality of service within the cost a network operator can bear and to address the problems of server overload and long-term system performance.

Further, the establishing of the mobile edge computing system model for the scenario of multiple mobile devices and multiple edge servers in step 1 is specifically as follows:

the mobile edge computing system is set to have a set of M densely distributed micro base stations. The micro base stations can communicate with each other through 5G wireless microwave communication links, and each base station is equipped with an edge server that provides computing services for mobile devices; the edge server of micro base station i has a computing capability (clock frequency) of $f_i$. A set of mobile devices is defined under the mobile edge computing system, and each mobile device can communicate wirelessly with the micro base station closest to it;

the mobile devices are set to generate G types of computation-intensive, ultra-low-delay computing tasks, where a task of type $t \in G$ is expressed as $k_t = \{d_t, c_t, \beta_t\}$, in which $d_t$ is the data size (bits) of the computing task, $c_t$ is the number of CPU cycles required to process each bit of task data, and $\beta_t$ (with $\sum_{t \in G} \beta_t = 1$) is the probability that a mobile device generates task t; to better match reality, time is discretized into a number of consecutive time periods, and each mobile device generates one computing task in each time period, so that the task generated by mobile device u in time period l is one of these G types.

Further, the designing of the multi-edge-server cooperation model in step 2 is specifically as follows:

in the mobile edge computing system, the invention designs a two-layer cooperative computation offloading framework, in which the first layer is offloading from mobile devices to edge servers and the second layer is cooperative offloading among the edge servers;

in the first-layer offloading from mobile devices to edge servers, each mobile device is set to offload its computing task as a whole to the edge server of a micro base station with which it can communicate. A binary association variable indicates whether mobile device u is associated with micro base station i (value 1) or not (value 0), and these variables together form the mobile device association policy X. Since each mobile device communicates with at most one micro base station per time period, the association variables of each mobile device u, summed over all micro base stations, must not exceed 1;

in the second-layer cooperative offloading among edge servers, the invention sets that each task can be divided arbitrarily into M parts, which are transmitted over the inter-base-station microwave links to the edge servers of the corresponding M micro base stations. For each mobile device u and target server i, a variable gives the data size of the part of the task generated by mobile device u that is transmitted to target server i within a time period; the transmission link is the shortest communication path from the initial base station associated with mobile device u to the target base station i. Each edge server is set to have a task queue: arriving tasks are stored in the queue and are then scheduled and executed on a first-come, first-served basis. These variables together form the multi-task offloading policy D; to ensure that the task generated by mobile device u is fully executed, the sizes of its parts must add up to the whole task;

under the multi-edge-server cooperation model, the main problem the invention addresses is achieving the best user quality of experience within the cost that a network operator can bear. The invention defines the user quality of experience as the average completion delay of all tasks in each time period, and defines the operator's cost as the energy consumption of all edge servers together with the overall data-carrying state among the second-layer micro base stations. The data-carrying state $\zeta^l$ is the total amount of data transmitted in each time period; it depends on the sizes of the divided task parts and on the shortest communication paths, where Hops(u, i) denotes the number of hops in the shortest communication path from the initial base station associated with mobile device u to the target base station i;

to solve this main problem, the invention also needs to quantify several elements of the mobile edge computing system: the task upload delay from a mobile device to its initially associated micro base station, the task transmission delay between micro base stations, the task waiting delay on an edge server, the task computation delay on an edge server, the computing energy consumption of the edge servers, and the transmission energy consumption between micro base stations.
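As a minimal sketch of the data-carrying state described in this step, the following Python code counts hops with a breadth-first search and weights each task part by the hop count of its route. The product form (part size times Hops(u, i), summed over all parts) is an assumption consistent with the description, not a formula taken from the patent, and the names `hops`, `data_carrying_state`, `adjacency`, `initial_station` and `split` are hypothetical.

```python
from collections import deque


def hops(adjacency, source, target):
    """Number of hops on a shortest path between two micro base stations (BFS)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("target base station is unreachable")


def data_carrying_state(adjacency, initial_station, split):
    """Assumed form: sum over devices u and targets i of (bits sent to i) * Hops(u, i)."""
    total = 0.0
    for device, parts in split.items():
        for target, bits in parts.items():
            total += bits * hops(adjacency, initial_station[device], target)
    return total


# Hypothetical line topology 0 - 1 - 2 with one device attached to station 0.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
initial_station = {"u1": 0}
split = {"u1": {0: 4.0, 1: 2.0, 2: 1.0}}  # bits of the task sent to each target server
zeta = data_carrying_state(adjacency, initial_station, split)
```

In this toy case the part kept at station 0 contributes nothing, while the parts sent one and two hops away each contribute their size multiplied by the hop count.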

Further, the designing of the communication model in step 3 is specifically as follows:

in the communication model, the invention defines the task upload delay from a mobile device to its initially associated micro base station and the task transmission delay between micro base stations;

the invention uses orthogonal frequency division multiple access (OFDMA) as the basis for communication between mobile devices and base stations. Mobile devices communicating with the same base station are allocated orthogonal spectrum, the micro base stations transmit data to one another by microwave, and signal interference in the second-layer communication between micro base stations is ignored; the invention therefore considers only the inter-cell interference that arises when first-layer mobile devices communicate with micro base stations;

in the communication model of the mobile edge computing system, the channel set is $\mathcal{C} = \{1, 2, \ldots, C\}$ and the bandwidth of each channel is w. The channel allocation policy B consists of binary variables indicating whether channel k is allocated to mobile device u. In channel allocation each mobile device occupies only one channel, which is the first constraint on policy B; in addition, the number of channels allocated by a micro base station cannot exceed the number of channels it owns, which is the second constraint on policy B;

when mobile device u occupies channel k and is associated with micro base station i, the inter-cell interference it receives is defined in terms of $P_{u'}$, the transmission power of an interfering mobile device u′, and the channel gain between mobile device u′ on channel k and micro base station i. The upload rate from mobile device u to micro base station i is then defined from the channel bandwidth, the received signal, this interference and $N_0$, the variance of the white Gaussian channel noise. In the first-layer offloading from a mobile device to the edge server of a micro base station, the upload delay is the time mobile device u needs to offload its computing task to its initially associated micro base station i;

in the second-layer cooperative offloading among micro base stations, the microwave transmission rates between micro base stations are set to be equal and the waiting delay during data transmission between micro base stations is ignored, so the transmission delay between micro base stations depends only on the size of the task part and the number of hops on the shortest path. The invention sets the data transmission rate between micro base stations to α, which determines the transmission delay of a part of the task of mobile device u from its associated initial base station to the target base station i.
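The quantities of this communication model can be illustrated with a short Python sketch. It assumes the standard Shannon-capacity form for the upload rate and a delay of part size times hop count divided by α for inter-station transfer; both are common modelling choices that fit the description but are not confirmed by the patent text, and all function and parameter names are hypothetical.

```python
import math


def inter_cell_interference(interferers):
    """Assumed form: sum of P_u' * gain(u', k, i) over co-channel devices in other cells."""
    return sum(power * gain for power, gain in interferers)


def upload_rate(bandwidth_hz, tx_power, gain, interference, noise_var):
    """Assumed Shannon-style rate in bit/s: w * log2(1 + P * g / (N0 + I))."""
    sinr = tx_power * gain / (noise_var + interference)
    return bandwidth_hz * math.log2(1.0 + sinr)


def upload_delay(task_bits, rate_bps):
    """Time to upload a whole task to the initially associated micro base station."""
    return task_bits / rate_bps


def inter_station_delay(part_bits, hop_count, alpha_bps):
    """Assumed form: every hop forwards the part at rate alpha; waiting between stations is ignored."""
    return part_bits * hop_count / alpha_bps


# Hypothetical numbers: two interferers given as (power in W, channel gain).
interference = inter_cell_interference([(0.1, 1e-7), (0.2, 5e-8)])
rate = upload_rate(1e6, 0.1, 1e-5, interference, 1e-9)
delay = upload_delay(2e6, rate) + inter_station_delay(1e6, 2, 1e9)
```

The last line adds the upload delay and the inter-station transfer delay, which is how the description composes the total time a task part needs to reach its target server.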

Further, the designing of the computation model in step 4 is specifically as follows:

in the computation model, the invention defines the computation delay and the waiting delay of a task on an edge server;

the computation delay of a part of a task that reaches edge server i at the target base station is the time that edge server i needs to execute that part;

the invention describes the state of the task queue of edge server i at the beginning of time period l mainly by the number of CPU cycles required to process the tasks waiting in the queue. In the mobile edge computing system, the total delay required for a part of a task to reach target server i consists of two components: the upload delay from the mobile device to its initially associated base station, and the transmission delay of the part from the initial base station to the target base station;

based on this total delay, the invention defines a function Sort(u, i, l) that returns, for the part of the task of mobile device u destined for target server i in time period l, the set of mobile devices whose parts arrive at target server i earlier. The waiting delay of the part at target server i is then defined from the queue state and from the data that reach target server i earlier than this part within time period l;

based on the computation model, the state of the task queue of edge server i at the beginning of time period l + 1 is defined from its state at the beginning of time period l, the size of the data received by edge server i during time period l, and the length of each time period.
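A minimal Python sketch of the queue bookkeeping described in this step follows. It assumes that the backlog is measured in CPU cycles, that a server works off `clock_hz * period_s` cycles per time period, and that the backlog cannot become negative; these are assumptions consistent with the description rather than the patent's own equations, and the function names are hypothetical.

```python
def computation_delay(bits, cycles_per_bit, clock_hz):
    """Time for a server with the given clock frequency to execute one task part."""
    return bits * cycles_per_bit / clock_hz


def received_cycles(parts):
    """CPU cycles brought in by the parts received in one period: sum of bits * cycles per bit."""
    return sum(bits * cycles_per_bit for bits, cycles_per_bit in parts)


def next_queue_state(queue_cycles, new_cycles, clock_hz, period_s):
    """Assumed update: backlog plus new work minus the work done in one period, floored at zero."""
    return max(queue_cycles + new_cycles - clock_hz * period_s, 0.0)


# Hypothetical server: 3 GHz clock, 1 s periods, 2e9 cycles already waiting.
arrivals = [(1e6, 1000), (5e5, 2000)]  # (bits, CPU cycles per bit)
backlog = next_queue_state(2e9, received_cycles(arrivals), 3e9, 1.0)
```

Here the server starts with 2e9 cycles of backlog, receives another 2e9 cycles of work and completes 3e9 cycles in the period, so 1e9 cycles carry over.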

Further, the designing of the energy consumption model in step 5 is specifically as follows:

in the energy consumption model, the invention defines the computing energy consumption of the edge servers and the transmission energy consumption between micro base stations;

the invention sets the energy that edge server i consumes to process one CPU cycle as $e_i$, and from it defines the energy consumed when edge server i finishes processing the tasks received in time period l;

the invention also defines the communication transmission energy consumption of a second-layer micro base station i. In that definition, InPath(i, u, j) takes the value 0 or 1: the value 1 indicates that micro base station i lies on the communication link over which a part of the task of mobile device u is sent to the target micro base station j, and the value 0 indicates that it does not; $\delta_i$ is the transmission power of micro base station i when it communicates with other second-layer base stations.
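The two energy terms can be sketched in Python as follows. The sketch assumes that computing energy is $e_i$ times the CPU cycles of the received parts, and that a station on a route spends its transmission power $\delta_i$ for the time needed to forward the part at rate α; it further assumes that only forwarding stations (every station on the path except the last) count as being in the path. None of these choices is confirmed by the patent text, and the names are hypothetical.

```python
import math


def computing_energy(energy_per_cycle, parts):
    """Assumed form: e_i * sum of (bits * CPU cycles per bit) over received parts."""
    return energy_per_cycle * sum(bits * cpb for bits, cpb in parts)


def transmission_energy(station, delta, alpha_bps, flows):
    """Assumed form: sum over flows of InPath * delta_i * (part bits / alpha).

    flows is a list of (part_bits, path), where path lists the stations on the
    shortest route from the initial station to the target station.
    """
    energy = 0.0
    for bits, path in flows:
        in_path = 1 if station in path[:-1] else 0  # the target itself does not forward
        energy += in_path * delta * bits / alpha_bps
    return energy


# Hypothetical example: station 1 forwards the first flow and is the target of the second.
flows = [(1e6, [0, 1, 2]), (5e5, [0, 1])]
e_tx = transmission_energy(1, 2.0, 1e6, flows)
e_cpu = computing_energy(1e-9, [(1e6, 1000)])
```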

Further, the joint optimization problem model and objective function for resource allocation and multi-task cooperative offloading provided in step 6 are specifically as follows:

the invention jointly considers the mobile device association policy, the channel allocation policy and the multi-task cooperative offloading policy to form a multi-objective constrained optimization problem, whose goal is the best user quality of experience within the cost that a network operator can bear;

because the parts of a divided task are processed concurrently, the completion delay of a task is the maximum of the completion delays of its parts, and the completion delay of the task generated by mobile device u in time period l is defined accordingly;

based on the energy consumption model, the energy consumption of micro base station i in time period l is defined;

since the invention pursues the best user quality of experience within the affordable cost of the network operator, the optimization problem P1 minimizes the average task completion delay subject to constraints C1 to C7, the last of which is:

$\zeta^l \le \zeta^{\max}$ (C7)

constraint C1 specifies the value types of the decision variables; constraint C2 states that each mobile device is associated with at most one micro base station in the same time period; constraint C3 states that each mobile device uses at most one channel in the same time period; constraint C4 states that the number of channels allocated by a micro base station cannot exceed the total number of channels it owns; constraint C5 states that tasks can be divided arbitrarily; constraint C6 states that the energy consumption of each micro base station cannot exceed its limit in each time period; constraint C7 states that the total data-carrying state between the second-layer micro base stations cannot exceed the allowed limit in each time period. Constraints C6 and C7 together represent the cost a network operator can bear.
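The objective and the two operator-cost constraints can be illustrated with a small Python sketch. It takes the completion delay of a task as the maximum over its parts, averages over tasks, and checks C6 and C7 as simple upper bounds; the function names and the dictionary layout are hypothetical.

```python
def task_completion_delay(part_delays):
    """A task split into concurrent parts finishes when its slowest part finishes."""
    return max(part_delays)


def average_completion_delay(per_task_part_delays):
    """Average completion delay over all tasks of one time period."""
    delays = [task_completion_delay(parts) for parts in per_task_part_delays]
    return sum(delays) / len(delays)


def within_operator_cost(station_energy, energy_limit, zeta, zeta_max):
    """C6: each station's energy is within its limit; C7: data-carrying state is within its limit."""
    c6 = all(energy <= energy_limit[i] for i, energy in station_energy.items())
    c7 = zeta <= zeta_max
    return c6 and c7


# Hypothetical period with two tasks, each split into parts with the given delays (seconds).
avg = average_completion_delay([[0.2, 0.5, 0.3], [0.1, 0.1]])
ok = within_operator_cost({0: 3.0, 1: 4.5}, {0: 5.0, 1: 5.0}, zeta=4.0, zeta_max=10.0)
```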

Further, step 7 designs, for the objective function, a joint resource allocation and multi-task cooperative offloading method that combines a genetic algorithm with a deep reinforcement learning algorithm, so as to pursue the best user quality of service within the cost a network operator can bear and to address the server overload problem and the long-term system performance problem, specifically as follows:

the invention designs a computation offloading scheme based on a genetic algorithm and the deep deterministic policy gradient (DDPG), which can solve the optimization problem P1 of joint resource allocation and multi-task cooperative offloading;

according to the proposed two-layer cooperative computation offloading framework and the composition of the task completion delay in equation (12), the optimization problem P1 is decomposed into two sub-problems using the decomposition idea of the primal-dual method;

in the first-layer offloading from mobile devices to edge servers, the first sub-problem obtained by the decomposition is denoted P2; for sub-problem P2, the invention designs a genetic algorithm to obtain the mobile device association policy X and the channel allocation policy B for the first-layer offloading from mobile devices to edge servers;

in the second-layer cooperative offloading among micro base stations, the second sub-problem obtained by the decomposition is denoted P3; for sub-problem P3, the multi-task cooperative offloading policy D in time period l depends on the state of the edge servers in time period l - 1, so sub-problem P3 is described as a Markov decision process and solved with the deep deterministic policy gradient method from deep reinforcement learning; because sub-problem P3 has the Markov property, the long-term performance of the system can be captured by the cumulative reward function in reinforcement learning;

the following describes in detail the genetic-algorithm solution of sub-problem P2, the deep-reinforcement-learning solution of sub-problem P3, and the overall computation offloading scheme, based on the genetic algorithm and the deep deterministic policy gradient, for problem P1;

step 7.1, designing a genetic-algorithm-based mobile device association and channel allocation algorithm. A genetic algorithm is a random search algorithm that simulates biological evolution to solve complex problems, with survival of the fittest as its evolutionary principle; it only requires that the problem to be solved be computable and does not depend on other mathematical properties such as differentiability or continuity. A genetic algorithm starts from a set of initial solutions and improves them through genetic operations (selection, crossover and mutation) until an acceptable solution is found or the search converges; in particular, the crossover and mutation operations maintain population diversity and enlarge the search region, so the search is less likely to become trapped at a local optimum, which makes genetic algorithms strong at global search. The invention designs the following genetic operations to solve sub-problem P2:

step 7.1.1, designing the chromosome and fitness function:

to define the optimization goal of sub-problem P2, the invention sets the chromosome I of an individual as:

$I = [X, B]^{T}$ (14)

where X is the mobile device association policy and B is the channel allocation policy. To satisfy constraints C1 and C2 in sub-problem P2, each gene in X is set to the micro base station with which mobile device u is associated, chosen from the set of micro base stations that can communicate with mobile device u during time period l; to satisfy constraints C1, C2 and C3, each gene in B is set to the channel that mobile device u uses when communicating with its associated micro base station within time period l;

to evaluate the quality of the individuals in the population, the fitness function is set according to sub-problem P2.

step 7.1.2, designing the population initialization and selection operators:

the invention initializes the population by assigning each gene a random admissible value, where randin(Set) is a generating function that outputs a random element of the set Set;

for the selection operator, the invention selects K individuals from the population to form the parent population using tournament selection, which is well suited to minimization problems; to improve the performance of the genetic algorithm, the best individual in the population is recorded during the genetic process, and if it is not chosen during selection, it replaces the worst individual in the population;
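Tournament selection with the elitism rule described above can be sketched in Python as follows. The tournament size of two and the choice to apply the replacement to the selected parent population are assumptions, lower fitness is treated as better because the problem is a minimization, and the names `tournament_select` and `apply_elitism` are hypothetical.

```python
import random


def tournament_select(population, fitness, k, tournament_size=2, rng=random):
    """Pick k parents; each is the lowest-fitness individual of a random tournament."""
    parents = []
    for _ in range(k):
        contenders = rng.sample(population, tournament_size)
        parents.append(min(contenders, key=fitness))
    return parents


def apply_elitism(parents, best, fitness):
    """If the recorded best individual was not selected, it replaces the worst parent."""
    if best in parents:
        return parents
    worst = max(range(len(parents)), key=lambda j: fitness(parents[j]))
    return parents[:worst] + [best] + parents[worst + 1:]


# Hypothetical population of one-gene chromosomes; fitness is the gene value itself.
population = [[3], [1], [2], [5]]
fitness = lambda chromosome: chromosome[0]
parents = tournament_select(population, fitness, k=4, rng=random.Random(0))
parents = apply_elitism(parents, best=[1], fitness=fitness)
```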

step 7.1.3, designing the crossover and mutation operators:

crossover and mutation are effective ways to increase the diversity of offspring and so to reach better solutions; for crossover, two individuals are selected at random from the parent population and, with probability $p_c$, are crossed to produce two offspring, with genes exchanged by two-point crossover; for mutation, the invention sets the mutation probability of each individual to $p_m$, generates two mutation points at random on each chromosome, and mutates the genes at those points;

separate mutation rules are set for the mobile device association policy X and for the channel allocation policy B in the chromosome; the rule for B uses the maximum element value and the minimum element value in channel allocation policy B, the function round(value), defined as outputting an integer no greater than value, and two random numbers $\psi_1$ and $\psi_2$ that follow the uniform distribution U(0, 1);
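Two-point crossover and a two-point mutation can be sketched in Python as follows. The crossover follows the description directly; the mutation simply resamples the two chosen genes from an allowed set, which is a stand-in for the patent's own mutation rules for X and B and not a reproduction of them. All names are hypothetical.

```python
import random


def two_point_crossover(parent_a, parent_b, p_c, rng=random):
    """With probability p_c, swap the gene segment between two random cut points."""
    if rng.random() >= p_c or len(parent_a) < 2:
        return parent_a[:], parent_b[:]
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def mutate(chromosome, allowed, p_m, rng=random):
    """With probability p_m, resample the genes at two random positions (stand-in rule)."""
    child = chromosome[:]
    if rng.random() < p_m and len(child) >= 2:
        for pos in rng.sample(range(len(child)), 2):
            child[pos] = rng.choice(allowed[pos])
    return child


# Hypothetical six-gene chromosomes; allowed[pos] lists the admissible values of each gene.
a, b = [0] * 6, [1] * 6
child_a, child_b = two_point_crossover(a, b, p_c=1.0, rng=random.Random(1))
allowed = [[0, 1, 2]] * 6
mutant = mutate(a, allowed, p_m=1.0, rng=random.Random(2))
```

Drawing each mutated gene from `allowed[pos]` keeps the offspring feasible with respect to the admissible base stations and channels, which is the role the description assigns to the chromosome encoding.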

step 7.2, designing a multi-task cooperative offloading algorithm based on the deep deterministic policy gradient. Time is discretized into a number of time periods, and it is assumed that in each time period a batch of task requests reaches the corresponding edge servers according to the mobile device association policy and the channel allocation policy; because each server is modelled as a queueing system, its current queue state affects the time needed to complete arriving tasks, and the multi-task cooperative offloading policy in time period l depends on the current communication environment and on the server queue states in time period l - 1. Sub-problem P3 can therefore be expressed as a Markov decision process and solved with the deep deterministic policy gradient method, which also allows the long-term performance of the system to be taken into account;

step 7.2.1, Markov decision process:

the invention defines the system state in time period l as $S^l$ and the action taken in time period l as $a^l$; the action $a^l$ is equivalent to the multi-task cooperative offloading policy D in sub-problem P3;

sub-problem P3 aims to minimize the average task completion delay in the second-layer cooperation among micro base stations, so the instant reward obtained when the system takes action $a^l$ in state $S^l$ is defined from that delay together with two penalty terms: a penalty term for constraint C6, in which $\alpha_i$ is the penalty coefficient of edge server i, and the penalty term $\beta \max(0, \zeta^l - \zeta^{\max})$ for constraint C7, in which $\beta$ is a penalty coefficient. The value $\epsilon$ balances quantities with different units and depends on the maximum difference value of L in the simulation experiment;

after the instant reward is obtained, the system state changes from $S^l$ to $S^{l+1}$. To analyse the effect of the action on the system state, the invention defines the computation load of edge server i in time period l in terms of the size of the data received by edge server i during time period l and the length of each time period; the system state transition from time period l to time period l + 1 is then defined on that basis;

to account for the long-term performance of the system, the invention defines the long-term cumulative reward of the multi-task cooperative offloading policy $\mu: S^l \to a^l$ over $l_{\max}$ consecutive time periods, in which $\gamma \in [0, 1]$ is the discount coefficient. In this Markov decision process both the action space and the state space take continuous values, so the invention uses the deep deterministic policy gradient method from deep reinforcement learning to solve it;
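The C7 penalty term and a discounted cumulative reward can be sketched in Python as follows. The penalty expression comes from the text; the discounted sum $\sum_l \gamma^l r^l$ is the usual reinforcement-learning form and is assumed here rather than taken from the patent's equation. The function names are hypothetical.

```python
def c7_penalty(beta, zeta, zeta_max):
    """Penalty term beta * max(0, zeta - zeta_max) for exceeding the data-carrying limit."""
    return beta * max(0.0, zeta - zeta_max)


def discounted_return(rewards, gamma):
    """Assumed form: sum over periods of gamma**l * r_l, accumulated from the last period back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical instant rewards for three consecutive time periods.
ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```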

step 7.2.2, multi-task offloading algorithm based on the deep deterministic policy gradient:

the deep deterministic policy gradient is built on the Actor-Critic framework, in which the Actor generates actions and interacts with the environment, and the Critic evaluates the Actor's performance and guides it towards better actions; the deep deterministic policy gradient algorithm consists of five parts: the main Actor network $\mu(S^l; \theta)$, the main Critic network $Q(S^l, a^l; \omega)$, the target Actor network $\mu'(S^l; \theta')$, the target Critic network $Q'(S^l, a^l; \omega')$ and the experience replay pool R; the experience replay pool stores the system's state-transition experiences, each of which consists of the state transition, the action and the instant reward of one time period and is defined as $(S^l, a^l, r^l, S^{l+1})$; during learning, experience replay draws random samples from the pool for training, which breaks the correlation between experiences and improves learning performance;
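An experience replay pool of the kind described above can be sketched in a few lines of Python. The fixed capacity with oldest-first eviction is an assumption, and the class name `ReplayPool` and its methods are hypothetical.

```python
import random
from collections import deque


class ReplayPool:
    """Stores (state, action, reward, next_state) tuples and samples them uniformly."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest experiences are evicted first
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniform random batch without replacement, capped at the current pool size."""
        size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)


# Hypothetical usage: store five transitions in a pool of capacity three.
pool = ReplayPool(capacity=3, seed=0)
for step in range(5):
    pool.store(step, 0.0, -1.0, step + 1)
batch = pool.sample(2)
```

Sampling uniformly from the stored transitions, rather than using them in arrival order, is what breaks the correlation between consecutive experiences.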

(1) Designing the main Actor network $\mu(S^l; \theta)$:

the deterministic multi-task cooperative offloading policy $\mu: S^l \to a^l$ over consecutive time periods can be approximated by a continuous function $a^l = \mu(S^l; \theta)$ with parameters $\theta$; the Actor network iteratively updates its parameters, selects the current action according to the current state, and interacts with the mobile edge computing environment to produce the next state and the instant reward;

$\psi$ experiences are selected at random from the experience replay pool as a sample set $\Psi = \{(S^i, a^i, r^i, S^{i+1});\ i \in \{0, 1, \ldots, \psi\}\}$, and the parameters $\theta_\mu$ of the main Actor network are updated along the policy gradient computed on this sample set;

to satisfy constraint C5, the invention normalizes the output values of the main Actor network: the M × N raw output values of the network are normalized to give the values of the action $a^l$.
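One way to normalize the Actor outputs so that every task is fully distributed is sketched below in Python. It assumes that each mobile device owns M non-negative raw outputs, that these are rescaled to shares summing to one, and that the shares are multiplied by the task size; the patent's own normalization formula may differ, and the function name is hypothetical.

```python
def normalize_actions(raw_outputs, task_bits):
    """Rescale each device's M raw outputs into part sizes that sum to its task size.

    raw_outputs[u] holds M non-negative network outputs for device u;
    task_bits[u] is the data size of the task generated by device u.
    """
    actions = []
    for outputs, bits in zip(raw_outputs, task_bits):
        total = sum(outputs)
        if total == 0:
            shares = [1.0 / len(outputs)] * len(outputs)  # fall back to an even split
        else:
            shares = [value / total for value in outputs]
        actions.append([share * bits for share in shares])
    return actions


# Hypothetical example: one device, three target servers, a task of 8 bits.
parts = normalize_actions([[1.0, 1.0, 2.0]], [8.0])
```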

(2) Designing the main Critic network $Q(S^l, a^l; \omega)$:

the main Critic network uses an approximate action-value function $Q(S^l, a^l; \omega)$, based on the Bellman equation, to evaluate the quality of the selected action and to guide the main Actor network;

$\psi$ experiences are selected at random from the experience replay pool as a sample set $\Psi = \{(S^i, a^i, r^i, S^{i+1});\ i \in \{0, 1, \ldots, \psi\}\}$, and the main Critic network updates its parameters $\omega$ by minimizing the loss function $L_Q$ over this sample set, with targets

$y^i = r^i(S^i, a^i) + \gamma Q'(S^{i+1}, \mu'(S^{i+1}; \theta'); \omega')$

computing $y^i$ requires both the target Actor network and the target Critic network;
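The target values $y^i$ and a Critic loss can be sketched in Python with plain callables standing in for the networks. The target follows the expression given in the text; the mean squared error is the usual DDPG choice for $L_Q$ and is assumed here, since the patent's loss formula is not reproduced. All names are hypothetical.

```python
def td_targets(batch, gamma, target_actor, target_critic):
    """y = r + gamma * Q'(s_next, mu'(s_next)) for every sampled experience."""
    return [
        reward + gamma * target_critic(next_state, target_actor(next_state))
        for (_state, _action, reward, next_state) in batch
    ]


def critic_loss(batch, targets, critic):
    """Assumed loss: mean squared error between the targets and Q(s, a)."""
    errors = [
        (y - critic(state, action)) ** 2
        for (state, action, _reward, _next_state), y in zip(batch, targets)
    ]
    return sum(errors) / len(errors)


# Hypothetical stand-ins for the networks: constant functions.
target_actor = lambda state: 0.0
target_critic = lambda state, action: 1.0
critic = lambda state, action: 0.0
batch = [(0, 0.0, 1.0, 1), (1, 0.0, 2.0, 2)]
targets = td_targets(batch, 0.5, target_actor, target_critic)
loss = critic_loss(batch, targets, critic)
```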

(3) Designing the target Actor network $\mu'(S^l; \theta')$:

the target Actor network $\mu'(S^l; \theta')$ selects the best next action $a^{i+1} = \mu'(S^{i+1}; \theta')$ for the next state $S^{i+1}$ of each sample drawn from the experience pool; its parameters $\theta'_{\mu'}$ are soft-updated from the parameters $\theta_\mu$ of the main Actor network:

$\theta'_{\mu'} = \tau \theta_\mu + (1 - \tau)\theta'_{\mu'}$ (29)

where $\tau \in [0, 1]$ is the soft update coefficient;

(4) Designing the target Critic network $Q'(S^l, a^l; \omega')$:

the role of the target Critic network $Q'(S^l, a^l; \omega')$ lies mainly in the calculation of the loss function $L_Q$; its parameters $\omega'_{Q'}$ are soft-updated from the parameters $\omega_Q$ of the main Critic network:

$\omega'_{Q'} = \tau \omega_Q + (1 - \tau)\omega'_{Q'}$ (30)
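The soft updates in equations (29) and (30) have the same form and can be sketched with one Python function. Parameters are represented as flat lists of floats purely for illustration; the function name is hypothetical.

```python
def soft_update(target_params, main_params, tau):
    """Return tau * main + (1 - tau) * target for every pair of parameters."""
    return [
        tau * main + (1.0 - tau) * target
        for main, target in zip(main_params, target_params)
    ]


# Hypothetical parameter vectors of a target network and its main network.
new_target = soft_update([0.0, 2.0], [1.0, 4.0], tau=0.5)
```

A small τ makes the target networks follow the main networks slowly, which keeps the targets $y^i$ stable while the main networks are being trained.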

step 7.3, designing the computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient:

in a mobile edge computing system with multiple mobile devices and multiple edge servers, the invention sets that each mobile device generates one task request in each time period, and that the server of any micro base station can act as the central controller that solves for the computation offloading policy;

to achieve the best average user experience within the cost a network operator can bear, the invention designs a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient to obtain the mobile device association policy, the channel allocation policy and the multi-task cooperative offloading policy; the invention selects one micro base station edge server at random as the central controller to execute the computation offloading scheme;

compared with the prior art, the invention has the following notable advantages: (1) cooperation among the edge servers of multiple micro base stations effectively avoids server overload; (2) the computation offloading scheme based on deep reinforcement learning effectively safeguards the long-term performance of the system; (3) the method pursues the best user quality of experience within the cost a network operator can bear, solves the joint resource optimization problem in mobile edge computing for the scenario of multiple mobile devices and multiple edge servers, namely mobile device association, channel allocation and multi-task cooperative offloading, and provides a technical basis for operating mobile edge computing effectively in a 5G communication environment.

Drawings

FIG. 1 is a block diagram of a mobile edge computing system in a multi-mobile device and multi-micro base station edge service scenario.

FIG. 2 is a diagram of a two-tier computing offload framework of the present invention.

FIG. 3 is a chromosome structural diagram in the present invention.

FIG. 4 is a schematic diagram of chromosome crossing operation in the present invention.

FIG. 5 is a schematic diagram of the operation of chromosomal mutation in the present invention.

FIG. 6 is a diagram of a deep deterministic policy gradient network architecture in accordance with the present invention.

FIG. 7 is a diagram illustrating normalization of network output values according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

The invention relates to a joint resource allocation and multi-task offloading method based on server cooperation, which comprises the following steps:

step 1, establishing a mobile edge computing system model under the scene of multiple mobile devices and multiple edge servers, which comprises the following steps:

with reference to the mobile edge computing system model for the multi-mobile-device, multi-micro-base-station edge service scenario in FIG. 1, the mobile edge computing system is assumed to contain M densely distributed micro base stations

The micro base stations can communicate with one another over 5G wireless microwave links, and each base station is equipped with an edge server that provides computing services to the mobile devices; the edge server

has computing capability (clock frequency) $f_i$; the set of mobile devices in the mobile edge computing system is

each mobile device is assumed to generate G types of compute-intensive, ultra-low-latency computing tasks, where a task $t \in G$ is expressed as $k_t = \{d_t, c_t, \beta_t\}$: $d_t$ is the data size (bits) of the computing task, $c_t$ is the number of CPU cycles required to process each bit of task data, and $\beta_t$ (with $\sum_{t \in G} \beta_t = 1$) is the probability that a mobile device generates task t; to better match reality, time is discretized into consecutive time periods, and each mobile device generates one computing task in each time period; for example, the task generated by mobile device u in each time period is defined as

Step 2, designing a multi-edge server cooperation model, which comprises the following specific steps:

with reference to fig. 2, in the mobile edge computing system, the present invention designs a two-layer cooperative computing offload framework, where the first layer is offload from the mobile device to the edge server, and the second layer is cooperative offload between the edge servers;

0, the present invention therefore defines a mobile device association policy as

In the cooperative offloading between the second-layer edge servers, the invention assumes that each task can be arbitrarily divided into M parts, which are transmitted over the microwave links between base stations to the edge servers at the corresponding M micro base stations; let

denote the data size of the task generated by mobile device u that is transmitted to target server i within a time period, the transmission link being the shortest communication path from the initial base station associated with mobile device u to the target base station i; the invention assumes that each edge server has a task queue in which arriving tasks are first stored and then scheduled and executed under a first-come, first-served mechanism; the invention defines the multi-task offloading policy as

Which satisfy the constraints

To ensure that the tasks generated by the mobile device u are fully executed;

under the multi-edge-server cooperation model, the main problem the invention addresses is achieving the best quality of user experience within the cost that the network operator can bear; the invention takes the quality of user experience to be the average completion delay of all tasks in each time period, and takes the operator's cost to be the energy consumption of all edge servers together with the overall data-carrying state between the second-layer micro base stations; the data-carrying state between the second-layer micro base stations is defined as the total size of the data transmitted in each time period, which depends on the sizes of the divided task parts and on the shortest communication paths, and the invention defines it as follows:

where Hops(u, i) represents the number of hops on the shortest communication path from the initial base station associated with mobile device u to target base station i;
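The defining formula itself did not survive extraction, so the following is only a minimal sketch of one plausible reading: the data-carrying state is taken to be the sum, over mobile devices u and target stations i, of the hop count on the shortest path multiplied by the bits sent to i. The topology, the dictionary layout and the names `hop_counts` and `data_carrying_state` are illustrative assumptions, not part of the patent text.

```python
from collections import deque

def hop_counts(adjacency, source):
    """Breadth-first search: hops on the shortest path from source to each reachable station."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops

def data_carrying_state(adjacency, initial_station, split_sizes):
    """Assumed form: sum over devices u and targets i of Hops(u, i) * bits sent to i."""
    total = 0
    for u, sizes in split_sizes.items():
        hops = hop_counts(adjacency, initial_station[u])
        for i, bits in sizes.items():
            total += hops[i] * bits
    return total

# Hypothetical line topology 0 - 1 - 2 with one device attached to station 0.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
initial_station = {"u1": 0}
split_sizes = {"u1": {0: 100, 1: 200, 2: 300}}  # bits sent to each target station
zeta = data_carrying_state(adjacency, initial_station, split_sizes)
```

Under this reading, the part kept at the initial station contributes nothing because its hop count is zero, which matches the statement that the state depends only on the divided task sizes and the shortest paths.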

Step 3, designing a communication model, specifically as follows:

in the communication model of the mobile edge computing system, the invention sets the channel set as $C = \{1, 2, \dots, C\}$, and the bandwidth of each channel is w; the channel allocation policy is defined as

Meanwhile, for each micro base station

the number of channels it allocates cannot exceed the number of channels it owns, so the policy B must also satisfy the constraint

When the mobile device u occupies the channel k to communicate with the micro base station i, the received intercell signal interference is defined as:

in the formula, $N_0$ is the variance of the channel's additive white Gaussian noise; in the first-layer offloading from the mobile device to the micro base station edge server, the upload communication delay incurred when mobile device u offloads its computing task to its initially associated micro base station i is defined as:

in the cooperative offloading between the second-layer micro base stations, the microwave transmission rates between micro base stations are assumed to be equal, and the waiting delay between micro base stations during data transmission is ignored, so the communication transmission delay between micro base stations depends only on the task size and on the number of micro base station hops on the shortest path; the invention sets the data transmission rate between micro base stations to $\alpha$, so that the partial task of mobile device u

step 4, designing a calculation model, specifically as follows:

in the design of a calculation model, the invention defines the calculation delay and the waiting delay of a task on an edge server;

for the partial task that reaches edge server i at the target base station

the computation delay required for its execution is defined as:

mainly refers to the number of CPU cycles required by the tasks waiting to be processed in the task queue; in the mobile edge computing system, for the partial task

the total delay required to reach target server i

is defined in detail as follows:

based on the total delay

the invention introduces a function Sort(u, i, l) to determine, for the partial task of mobile device u within time period l

the waiting delay at target server i, which may be defined as:

in the formula

denotes the data size, within time period l, of the partial tasks that arrive earlier than

at the target server i;

in the formula

denotes the size of the data received by edge server i during time period l, and

θ is the length of each time period.
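As a hedged illustration of the first-come, first-served waiting delay described above, the sketch below orders the task parts by arrival time and makes each part wait until the server has finished everything that arrived earlier, including any backlog carried over from the previous time period. The function name, the `(arrival_time, cpu_cycles)` input layout and the `backlog_cycles` parameter are assumptions made for illustration; the patent's own formula was lost in extraction.

```python
def fcfs_waiting_delays(arrivals, f_i, backlog_cycles=0.0):
    """Waiting delay of each task part on one edge server under FCFS.

    arrivals: list of (arrival_time_seconds, cpu_cycles) per task part.
    f_i: server clock frequency in cycles per second.
    backlog_cycles: cycles still queued from the previous period (assumed input).
    Returns the waiting delays in the same order as `arrivals`.
    """
    order = sorted(range(len(arrivals)), key=lambda k: arrivals[k][0])
    free_at = backlog_cycles / f_i          # time at which the server becomes idle
    waits = [0.0] * len(arrivals)
    for k in order:
        arrival, cycles = arrivals[k]
        start = max(arrival, free_at)       # wait only if the server is still busy
        waits[k] = start - arrival
        free_at = start + cycles / f_i      # computation delay = cycles / frequency
    return waits

# Three task parts arriving at a 1 GHz server.
waits = fcfs_waiting_delays([(0.0, 2e9), (0.5, 1e9), (3.5, 1e9)], f_i=1e9)
```

The second part arrives while the first is still executing and therefore waits, while the third arrives after the queue has drained and starts immediately.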

Step 5, designing an energy consumption model, which comprises the following specific steps:

is on the communication link to the target micro base station j, and the value is 0 otherwise; $\delta_i$ is the transmission power of micro base station i when it communicates between second-layer base stations.

Step 6, providing a joint optimization problem model of resource allocation and multi-task cooperative unloading and an objective function, wherein the joint optimization problem model comprises the following specific steps:

based on the energy consumption model, the energy consumption of the micro base station i in the time period l is defined as:

$\zeta^l \le \zeta^{\max}$ (C7)

Constraint C1 specifies the data types of the multiple optimization targets; constraint C2 states that each mobile device is associated with at most one micro base station in a given time period; constraint C3 states that each mobile device uses at most one channel in a given time period; constraint C4 states that the number of channels allocated by a micro base station cannot exceed the total number of channels it owns; constraint C5 states that tasks can be divided arbitrarily; constraint C6 states that the energy consumption of each micro base station cannot exceed its limit in each time period; constraint C7 states that the total transmission state between the second-layer micro base stations in each time period cannot exceed the allowed limit; constraints C6 and C7 together represent the cost that the network operator can bear.

Step 7, for the objective function, designing a joint resource allocation and multi-task cooperative offloading method that combines a genetic algorithm with a deep reinforcement learning algorithm, so as to pursue the best quality of service for users within the cost range the network operator can bear and to address server overload and long-term system performance, specifically as follows:

for subproblem P3, determining the multi-task cooperative offloading policy B in time period l depends on the state of the edge servers in time period l-1, so subproblem P3 is described as a Markov decision process and solved with the deep deterministic policy gradient method from deep reinforcement learning; because subproblem P3 has the Markov property, the long-term performance of the system can be captured by the cumulative reward function in reinforcement learning;

the following describes in detail the genetic algorithm solution of subproblem P2, the deep reinforcement learning solution of subproblem P3, and the computation offloading scheme for the overall problem P1 based on the genetic algorithm and the deep deterministic policy gradient;

step 7.1, designing a genetic-algorithm-based mobile device association and channel allocation algorithm; a genetic algorithm is a stochastic search algorithm that solves complex problems by simulating biological evolution, with survival of the fittest as its evolutionary principle; a genetic algorithm only requires that the problem be computable and places no other mathematical requirements on it, such as differentiability or continuity. A genetic algorithm starts from a set of initial solutions and improves them through genetic operations (selection, crossover and mutation) until an acceptable solution is found or the search converges; in particular, the crossover and mutation operations maintain population diversity and enlarge the search region, so the search is less likely to become trapped in a local optimum; genetic algorithms are therefore strong at global search; the invention designs the following genetic operations to solve subproblem P2:

step 7.1.1, designing chromosome and fitness function:

referring to fig. 3, in order to clarify the optimization goal of the subproblem P2, the present invention sets the chromosome I of the organism individual as:

$$I = [X, B]^{T} \tag{14}$$

in the formula

is the mobile device association policy, and

is the channel allocation policy. To satisfy constraints C1 and C2 in subproblem P2, the invention sets the variable

to be the micro base station with which mobile device u is associated for communication, and

to be the channel used in the associated communication;
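A minimal sketch of this encoding, assuming one gene per mobile device in each row of the chromosome: row X stores the index of the associated micro base station and row B stores the index of the channel used. The helper names and the 0-based indices are illustrative assumptions (the text numbers channels from 1). Decoding X into a 0/1 association matrix shows why the encoding gives every device exactly one base station by construction.

```python
import random

def random_chromosome(num_devices, num_stations, num_channels, rng):
    """Chromosome I = [X, B]^T: row X holds a station index per device, row B a channel index."""
    X = [rng.randrange(num_stations) for _ in range(num_devices)]
    B = [rng.randrange(num_channels) for _ in range(num_devices)]
    return [X, B]

def association_matrix(X, num_stations):
    """Decode row X into a 0/1 matrix: entry [u][i] is 1 iff device u uses station i."""
    return [[1 if X[u] == i else 0 for i in range(num_stations)]
            for u in range(len(X))]

rng = random.Random(0)
individual = random_chromosome(num_devices=6, num_stations=3, num_channels=4, rng=rng)
matrix = association_matrix(individual[0], num_stations=3)
```

Because each gene holds a single index, any chromosome produced by crossover or mutation on these rows still associates a device with only one station and one channel, so those two constraints never need to be repaired.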

in order to evaluate the quality of biological individuals in the population, the fitness function is set by combining a subproblem P2:

step 7.1.2, designing population initialization and selecting operators:

for the selection operator, the invention uses tournament selection, which is well suited to minimization problems, to select K individuals from the population to form the parent population; to improve the performance of the genetic algorithm, the best individual in the population is recorded during evolution, and if it is not chosen in the selection step, it replaces the worst individual in the population;
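The selection step above can be sketched as follows, assuming lower fitness is better (a minimization problem) and a tournament size of 2. The tournament size, the function name and the choice to replace the worst of the K selected parents are assumptions for illustration; the text does not fix them.

```python
import random

def select_with_elitism(population, fitness, k, rng, tournament_size=2):
    """Pick k parents by tournament (lower fitness wins) and keep the best individual."""
    chosen = []
    for _ in range(k):
        contenders = rng.sample(range(len(population)), tournament_size)
        chosen.append(min(contenders, key=lambda idx: fitness[idx]))
    best = min(range(len(population)), key=lambda idx: fitness[idx])
    if best not in chosen:
        # Elitism: the recorded best individual replaces the worst selected parent.
        worst_pos = max(range(k), key=lambda pos: fitness[chosen[pos]])
        chosen[worst_pos] = best
    return [population[idx] for idx in chosen]

population = ["a", "b", "c", "d"]
fitness = [3.0, 1.0, 4.0, 2.0]      # "b" has the lowest (best) fitness
parents = select_with_elitism(population, fitness, k=3, rng=random.Random(1))
```

Whatever the random draws are, the best individual is guaranteed to be among the returned parents, which is the property the elitism rule is meant to provide.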

step 7.1.3, designing crossover and mutation operators:

crossover and mutation operations are effective ways to increase the diversity of offspring and thereby reach better solutions; for the crossover operation, with reference to the chromosome crossover in FIG. 4, two individuals are randomly selected from the parent population and, with crossover probability $p_c$, are crossed to produce two offspring individuals, with genes exchanged by the two-point crossover method; for the mutation operation, with reference to the chromosome mutation in FIG. 5, the invention sets the mutation probability of each individual to $p_m$, and each chromosome mutates the genes at two randomly generated mutation points;

in the formula

is the maximum element value in the channel allocation policy B,

is the minimum element value in the channel allocation policy B, round(value) is defined as a function that outputs an integer no greater than value, and $\psi_1$, $\psi_2$ are two random numbers following a normal distribution U(0, 1);
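The two-point crossover mentioned above can be sketched as below for a single gene row. This is an illustration under assumptions: the cut points are drawn uniformly, the function name is hypothetical, and the mutation formula is deliberately not reproduced because its equation did not survive extraction.

```python
import random

def two_point_crossover(parent_a, parent_b, p_c, rng):
    """With probability p_c, swap the gene segment between two random cut points."""
    child_a, child_b = list(parent_a), list(parent_b)
    if rng.random() < p_c:
        # Two distinct cut positions lo < hi; genes in [lo, hi) are exchanged.
        lo, hi = sorted(rng.sample(range(len(parent_a) + 1), 2))
        child_a[lo:hi], child_b[lo:hi] = parent_b[lo:hi], parent_a[lo:hi]
    return child_a, child_b

rng = random.Random(42)
child_a, child_b = two_point_crossover([0] * 6, [1] * 6, p_c=1.0, rng=rng)
```

Applied row by row to a chromosome, the same cut points would be reused for both rows if a device's station gene and channel gene are meant to travel together; that coupling is a design choice the text leaves open.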

the present invention defines a genetic-based mobile device association and channel allocation algorithm pseudocode as follows:

step 7.2.1, Markov decision process:

the invention defines the system state of the time period l as

The action taken during the time period l is defined as

the goal of subproblem P3 is to minimize the average task completion delay in the second-layer micro base station cooperation, so the immediate reward obtained when the system takes action $a^l$ in state $S^l$ is defined as:

in the formula

is the penalty term for constraint C6, where $\alpha_i$ is the penalty factor of edge server i; $\beta \max(0, \zeta^l - \zeta^{\max})$ is the penalty term for constraint C7, where $\beta$ is a penalty factor; $\epsilon$ is a balancing value between quantities with different units, which depends on the maximum difference value of L in the simulation experiment;

after the immediate reward is obtained, the system state transitions from $S^l$ to $S^{l+1}$; to analyze the influence of the action on the system state, the invention defines the computation load of edge server i in time period l

as:
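As a hedged sketch of how the two penalty terms could be evaluated: the C7 term follows the expression given in the text, while the C6 term is assumed, by analogy, to be a per-server hinge on energy consumption weighted by the server's penalty factor. The function name and argument layout are illustrative, and how the penalties are combined with the delay term and the balancing value is not reconstructed here.

```python
def constraint_penalty(energy, energy_max, alpha, zeta, zeta_max, beta):
    """Sum of penalty terms for constraints C6 and C7.

    C6 (assumed form): sum_i alpha_i * max(0, energy_i - energy_max_i)
    C7 (from the text): beta * max(0, zeta - zeta_max)
    """
    c6 = sum(a * max(0.0, e - e_max)
             for a, e, e_max in zip(alpha, energy, energy_max))
    c7 = beta * max(0.0, zeta - zeta_max)
    return c6 + c7

# Two edge servers; only the second exceeds its energy limit.
penalty = constraint_penalty(energy=[5.0, 8.0], energy_max=[6.0, 6.0],
                             alpha=[1.0, 2.0], zeta=10.0, zeta_max=12.0, beta=3.0)
```

A hinge of this kind is zero whenever the constraint holds, so it only lowers the reward for actions that violate the operator's cost limits.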

in the formula

Indicating the size of the data received by edge server i during time period l,

to take the long-term performance of the system into account, the long-term cumulative reward of the multi-task cooperative offloading policy $\mu: S^l \to a^l$ over $l_{\max}$ consecutive time periods is:

where $\gamma \in [0, 1]$ is the discount factor; in this Markov decision process both the action space and the state space take continuous values, so the invention adopts the deep deterministic policy gradient method from deep reinforcement learning to solve it;
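A short sketch of a discounted cumulative reward, assuming the usual form in which the reward of the first period is undiscounted and each later period is weighted by one more factor of the discount coefficient. The exact indexing in the patent's formula is not recoverable from the text, so this convention is an assumption.

```python
def discounted_return(rewards, gamma):
    """Cumulative reward sum_l gamma**l * r_l, evaluated backwards for stability."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Three time periods with reward 1.0 each and gamma = 0.5: 1 + 0.5 + 0.25.
ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With gamma equal to 0 only the immediate reward counts, and with gamma equal to 1 all periods are weighted equally, which is why the coefficient controls how strongly long-term performance is valued.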

the deep deterministic policy gradient is built on the Actor-Critic framework, in which the Actor generates actions and interacts with the environment, while the Critic evaluates the Actor's performance and guides it toward better actions; with reference to FIG. 6, the deep deterministic policy gradient algorithm consists of five parts: the primary Actor network $\mu(S^l; \theta)$, the primary Critic network $Q(S^l, a^l; \omega)$, the target Actor network $\mu'(S^l; \theta')$, the target Critic network $Q'(S^l, a^l; \omega')$ and the experience replay pool R; the experience replay pool stores the system's state transition experiences, each consisting of the state transition, the action and the immediate reward of a time period and defined as $(S^l, a^l, r^l, S^{l+1})$; during learning, experience replay samples randomly from the pool for training, which breaks the correlation between experiences and improves learning performance;
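The experience replay pool described above can be sketched as a bounded buffer with uniform random sampling. The class name, the fixed capacity and the eviction of the oldest experience when the pool is full are assumptions for illustration; the text only specifies what is stored and that sampling is random.

```python
import random
from collections import deque

class ReplayPool:
    """Bounded pool of (state, action, reward, next_state) experiences."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)   # oldest experience is dropped when full
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniform random mini-batch, which breaks correlation between consecutive periods."""
        size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)

pool = ReplayPool(capacity=3, seed=0)
for period in range(5):
    pool.store(period, 0.0, -1.0, period + 1)
batch = pool.sample(2)
```

In a training loop, one experience would be stored after every time period and a mini-batch drawn before each update of the primary networks.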

(1) Designing the primary Actor network $\mu(S^l; \theta)$:

The deterministic multi-task cooperative offloading policy $\mu: S^l \to a^l$ over consecutive time periods can be approximated, through the parameter $\theta$, by a continuous function $a^l = \mu(S^l; \theta)$; the Actor network iteratively updates its network parameters, selects the current action according to the current state, and interacts with the mobile edge computing environment to produce the next state and the immediate reward;

$\psi$ experiences are randomly selected from the experience replay pool as the sample set $\Psi = \{(S^i, a^i, U^i, S^{i+1});\ i \in \{0, 1, \dots, \psi\}\}$, and the policy gradient formula by which the primary Actor network updates its network parameter $\theta_\mu$ is:

to satisfy constraint C5, the invention normalizes the output values of the primary Actor network; with reference to FIG. 7, the MN network output values are defined as

and the normalized value of the action $a^l$

is expressed as:
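The normalization formula itself is missing from the extracted text, so the sketch below shows only one plausible scheme: the MN raw outputs are grouped into one row of M values per mobile device, each row is divided by its sum so that the shares add up to 1, and the shares are scaled by the task size so that the parts add up to the whole task. Non-negative raw outputs (for example from a sigmoid output layer), N as the number of devices and the function name are all assumptions.

```python
def normalize_action(raw_outputs, num_devices, num_servers, task_bits):
    """Turn M*N non-negative network outputs into per-device task splits.

    Each device's M outputs are divided by their sum, so the parts sent to the
    M edge servers add up to the device's full task size.
    """
    action = []
    for u in range(num_devices):
        row = raw_outputs[u * num_servers:(u + 1) * num_servers]
        total = sum(row)
        if total > 0:
            shares = [v / total for v in row]
        else:
            shares = [1.0 / num_servers] * num_servers  # fallback: equal split
        action.append([s * task_bits[u] for s in shares])
    return action

# Two devices, three edge servers.
split = normalize_action([1.0, 1.0, 2.0, 0.0, 0.0, 0.0],
                         num_devices=2, num_servers=3, task_bits=[400.0, 90.0])
```

Normalizing inside the action mapping means every action the Actor proposes already divides each task completely, so the learning algorithm never has to be penalized for incomplete splits.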

(2) Designing the primary Critic network $Q(S^l, a^l; \omega)$:

$\psi$ experiences are randomly selected from the experience replay pool as the sample set $\Psi = \{(S^i, a^i, U^i, S^{i+1});\ i \in \{0, 1, \dots, \psi\}\}$, and the primary Critic network updates its network parameter $\omega_Q$ by minimizing the loss function $L$, expressed as:

(3) Designing the target Actor network $\mu'(S^l; \theta')$:

The target Actor network $\mu'(S^l; \theta')$ is responsible for choosing the best next action $a^{i+1} = \mu'(S^{i+1}; \theta')$ for the next state $S^{i+1}$ sampled from the experience pool; its network parameter $\theta'_{\mu'}$ is soft-updated from the parameter $\theta_\mu$ of the primary Actor network, expressed as:

$$\theta'_{\mu'} = \tau\,\theta_\mu + (1-\tau)\,\theta'_{\mu'} \tag{29}$$

where $\tau \in [0, 1]$ is the soft update coefficient;

(4) Designing the target Critic network $Q'(S^l, a^l; \omega')$:

The target Critic network $Q'(S^l, a^l; \omega')$ is mainly used in computing the loss function $L$; its network parameter $\omega'_{Q'}$ is soft-updated from the parameter $\omega_Q$ of the primary Critic network, expressed as:

$$\omega'_{Q'} = \tau\,\omega_Q + (1-\tau)\,\omega'_{Q'} \tag{30}$$
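Equations (29) and (30) share the same form, so a single helper can illustrate both. The sketch below treats the network parameters as flat lists of floats, which is a simplification chosen for illustration; in practice the same rule would be applied to each weight tensor.

```python
def soft_update(target_params, primary_params, tau):
    """Soft update: new_target = tau * primary + (1 - tau) * target, element-wise."""
    return [tau * p + (1.0 - tau) * t
            for p, t in zip(primary_params, target_params)]

# Target parameters move a fraction tau = 0.1 of the way toward the primary ones.
new_target = soft_update(target_params=[0.0, 10.0],
                         primary_params=[1.0, 20.0], tau=0.1)
```

A small soft update coefficient makes the target networks change slowly, which keeps the learning targets stable while the primary networks are being trained.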

based on the deep reinforcement learning technique, the invention defines the pseudocode of the multi-task cooperative offloading algorithm based on the deep deterministic policy gradient as follows:

step 7.2.3, designing a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient:

in order to achieve the best average user experience within the cost that the network operator can bear, the invention designs a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient, so as to obtain the mobile device association policy, the channel allocation policy and the multi-task cooperative offloading policy; the invention randomly selects one micro base station edge server as the central controller to execute the computation offloading scheme, which is defined as:

Claims

1. A joint resource allocation and multi-task offloading method based on server cooperation, characterized by comprising the following steps:

step 2, designing a multi-edge server cooperation model;

step 3, designing a communication model;

step 4, designing a calculation model;

step 5, designing an energy consumption model;

2. The method for joint resource allocation and multi-task offloading in a mobile edge computing system based on multi-edge server cooperation as claimed in claim 1, wherein the model of the mobile edge computing system in the multi-mobile device and multi-edge server scenario in step 1 is as follows:

setting a mobile edge computing system to be composed of M densely distributed micro base stations

each mobile device is assumed to generate G types of compute-intensive, ultra-low-latency computing tasks, where a task $t \in G$ is expressed as $k_t = \{d_t, c_t, \beta_t\}$: $d_t$ is the data size (bits) of the computing task, $c_t$ is the number of CPU cycles required to process each bit of task data, and $\beta_t$ (with $\sum_{t \in G} \beta_t = 1$) is the probability that a mobile device generates task t; to better match reality, time is discretized into consecutive time periods, and each mobile device generates one computing task in each time period; for example, the task generated by mobile device u in each time period is defined as

3. The method for joint resource allocation and multi-task offloading in a mobile edge computing system based on multi-edge server cooperation of claim 2, wherein the designing of the multi-edge server cooperation model in step 2 is specifically as follows:

The present invention therefore defines a mobile device association policy as

In the cooperative offloading between the second-layer edge servers, the invention assumes that each task can be arbitrarily divided into M parts, which are transmitted over the microwave links between base stations to the edge servers at the corresponding M micro base stations; let

denote the data size of the task generated by mobile device u that is transmitted to target server i within a time period, the transmission link being the shortest communication path from the initial base station associated with mobile device u to the target base station i; the invention assumes that each edge server has a task queue in which arriving tasks are first stored and then scheduled and executed under a first-come, first-served mechanism; the invention defines the multi-task offloading policy as

Which satisfy the constraints

To ensure that the tasks generated by the mobile device u are fully executed;

to solve this main problem, the invention also needs to quantify several elements of the mobile edge computing system, including the task upload communication delay from a mobile device to its initially associated micro base station, the task transmission delay between micro base stations, the task waiting delay on an edge server, the task computation delay on an edge server, the computation energy consumption of the edge servers and the transmission energy consumption between micro base stations.

4. The method of claim 3, wherein the communication model of step 3 is designed as follows:

in the communication model of the mobile edge computing system, the invention sets the channel set as $C = \{1, 2, \dots, C\}$, and the bandwidth of each channel is w; the channel allocation policy is defined as

Simultaneous micro base station

in the formula, $P_{u'}$ represents the transmission power of mobile device u′, and

represents the channel gain between mobile device u′ and micro base station i on channel k; therefore, the upload communication rate from mobile device u to micro base station i is as follows:

in the cooperative offloading between the second-layer micro base stations, the microwave transmission rates between micro base stations are assumed to be equal, and the waiting delay between micro base stations during data transmission is ignored, so the communication transmission delay between micro base stations depends only on the task size and on the number of micro base station hops on the shortest path; the invention sets the data transmission rate between micro base stations to $\alpha$, so that the partial task of mobile device u

5. The method for joint resource allocation and multi-task offloading in a mobile edge computing system based on multi-edge server cooperation of claim 4, wherein the computation model designed in step 4 is specifically as follows:

for the partial task that reaches edge server i at the target base station

the computation delay required for its execution is defined as:

mainly refers to the number of CPU cycles required by the tasks waiting to be processed in the task queue; in the mobile edge computing system, for the partial task

the total delay required to reach target server i

is defined in detail as follows:

based on the total delay

the waiting delay at target server i may be defined as:

in the formula

denotes the data size, within time period l, of the partial tasks that arrive earlier than

at the target server i;

in the formula

Represents the size of data received by the edge server i during the time period l, and θ is the length of each time period.

6. The method for joint resource allocation and multi-task offloading in a mobile edge computing system based on multi-edge server cooperation of claim 5, wherein the design energy consumption model in step 5 is as follows:

in the formula, inPath(i, u, j) takes the value 0 or 1; a value of 1 indicates that micro base station i lies on the communication link over which the partial task of mobile device u

is sent to the target micro base station j, and the value is 0 otherwise; $\delta_i$ is the transmission power of micro base station i when it communicates between second-layer base stations.

7. The method for joint resource allocation and multi-task cooperative offloading in a mobile edge computing system based on multi-edge server cooperation of claim 6, wherein the joint optimization problem model and objective function for resource allocation and multi-task cooperative offloading proposed in step 6 are specifically as follows:

the invention comprehensively considers the mobile equipment association strategy, the channel allocation strategy and the multi-task cooperation unloading strategy to form a multi-objective constraint optimization problem, and the problem aims to pursue the optimal user experience quality within the bearable cost of a network operator;

$\zeta^l \le \zeta^{\max}$ (C7)

8. The method according to claim 7, wherein the method for jointly allocating resources and cooperatively offloading multitasks in a mobile edge computing system based on multi-edge server cooperation is designed by combining a genetic algorithm and a deep reinforcement learning algorithm according to the objective function in step 7, so as to pursue the best quality of service for users within a tolerable cost range of network operators and solve the problem of server overload and the problem of long-term system performance, and the method is as follows:

s.t. C1, C2, C3, C4

for the subproblem P2, the invention designs a genetic algorithm to acquire a mobile equipment association strategy X and a channel allocation strategy B from the first layer of mobile equipment to the edge server for unloading;

s.t. C5, C6, C7

step 7.1.1, designing chromosome and fitness function:

$$I = [X, B]^{T} \tag{14}$$

in the formula

is the mobile device association policy, and

is the channel allocation policy; to satisfy constraints C1 and C2 in subproblem P2, the invention sets the variable

to be the micro base station with which mobile device u is associated for communication, and

to be the channel used in the associated communication;

step 7.1.2, designing population initialization and selection operators:

the present invention sets the biological population initialization operation as follows:

step 7.1.3, designing crossover and mutation operators:

crossover and mutation operations are effective ways to increase the diversity of offspring and thereby reach better solutions; for the crossover operation, two individuals are randomly selected from the parent population and, with crossover probability $p_c$, are crossed to produce two offspring individuals, with genes exchanged by the two-point crossover method; for the mutation operation, the invention sets the mutation probability of each individual to $p_m$, and each chromosome mutates the genes at two randomly generated mutation points;

in the formula

is the maximum element value in the channel allocation policy B,

step 7.2, designing a multi-task cooperative offloading algorithm based on the deep deterministic policy gradient: time is discretized into a number of time periods, and the batch of task requests in each time period is assumed to reach the corresponding edge servers according to the mobile device association policy and the channel allocation policy; because each server is modeled as a queueing system, the current queue state of a server affects the time needed to complete arriving tasks, so the multi-task cooperative offloading policy in time period l depends on the current communication environment and on the server queue state in time period l-1; subproblem P3 can therefore be expressed as a Markov decision process and solved with the deep deterministic policy gradient method, which also allows the long-term performance of the system to be taken into account;

step 7.2.1, Markov decision process:

the invention defines the system state of the time period l as

The action taken during the time period l is defined as

in the formula

is the penalty term for constraint C6, where $\alpha_i$ is the penalty factor of edge server i; $\beta \max(0, \zeta^l - \zeta^{\max})$ is the penalty term for constraint C7, where $\beta$ is a penalty factor; $\epsilon$ is a balancing value between quantities with different units, which depends on the maximum difference value of L in the simulation experiment;

after the immediate reward is obtained, the system state transitions from $S^l$ to $S^{l+1}$; to analyze the influence of the action on the system state, the invention defines the computation load of edge server i in time period l

as:

in the formula

represents the size of the data received by edge server i in time period l, and $\theta$ is the length of each time period; therefore, the system state transition from time period l to time period l+1 is defined as:

to take the long-term performance of the system into account, the long-term cumulative reward of the multi-task cooperative offloading policy $\mu: S^l \to a^l$ over $l_{\max}$ consecutive time periods is:

the deep deterministic policy gradient is built on the Actor-Critic framework, in which the Actor generates actions and interacts with the environment, while the Critic evaluates the Actor's performance and guides it toward better actions; the deep deterministic policy gradient algorithm consists of five parts: the primary Actor network $\mu(S^l; \theta)$, the primary Critic network $Q(S^l, a^l; \omega)$, the target Actor network $\mu'(S^l; \theta')$, the target Critic network $Q'(S^l, a^l; \omega')$ and the experience replay pool R; the experience replay pool stores the system's state transition experiences, each consisting of the state transition, the action and the immediate reward of a time period and defined as $(S^l, a^l, r^l, S^{l+1})$; during learning, experience replay samples randomly from the pool for training, which breaks the correlation between experiences and improves learning performance;

(1) Designing the Main Actor network μ (S) ^l ；θ)：

Deterministic multi-task cooperative unloading strategy mu S under continuous time period ^l →a ^l Can be approximated by a parameter theta as a continuous function a ^l ＝μ(S ^l (ii) a θ); iterating and updating network parameters by the Actor network, selecting a current action according to a current state, and interacting with a mobile edge computing environment to generate a next state and an instant reward;

$\psi$ experiences are randomly selected from the experience replay pool as a sample set $\Psi = \{(S^i, a^i, U^i, S^{i+1});\ i \in \{0, 1, \ldots, \psi\}\}$; the policy gradient formula by which the main Actor network updates its network parameter $\theta_\mu$ is:

To satisfy constraint C5, the output values of the main Actor network are normalized. The MN network output values are defined as

and the corresponding value in the normalized action $a^l$

is expressed as:
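One plausible form of such a normalization is sketched below. It rests on assumptions that are not confirmed by the text here: that the raw Actor outputs are non-negative, that they are arranged as one row per task, and that constraint C5 requires the entries of each row to sum to 1. The function name `normalize_action` and the guard value `eps` are hypothetical.

```python
def normalize_action(raw, eps=1e-8):
    """Scale each row of non-negative raw outputs so that the row sums to 1.

    Assumption: C5 is a per-row sum-to-one constraint. A row whose sum is
    zero is left as all zeros (division by eps instead of by zero).
    """
    normalized = []
    for row in raw:
        total = max(sum(row), eps)  # guard against division by zero
        normalized.append([x / total for x in row])
    return normalized
```

If C5 takes a different form, only the divisor changes; the point of the sketch is that normalization is applied to the network output before the action is executed.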

(2) Design the main Critic network $Q(S^l, a^l; \omega)$:

The main Critic network uses an approximate action-value function $Q(S^l, a^l; \omega)$ to evaluate the quality of the selected action and to guide the main Actor network; based on the Bellman equation, the action-value function is expressed as:

$\psi$ experiences are randomly selected from the experience replay pool as a sample set $\Psi = \{(S^i, a^i, U^i, S^{i+1});\ i \in \{0, 1, \ldots, \psi\}\}$; the main Critic network updates its network parameter $\omega$ by minimizing the loss function $L_Q$, expressed as:

where $y^i = r^i(S^i, a^i) + \gamma Q'(S^{i+1}, \mu'(S^{i+1}; \theta'); \omega')$; computing $y^i$ requires both the target Actor network and the target Critic network;
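The target value $y^i$ and the Critic loss can be illustrated as follows. The target computation follows the expression for $y^i$ given above. The loss is assumed here to be the mean squared error between $y^i$ and $Q(S^i, a^i; \omega)$, which is the usual choice in DDPG but is an assumption with respect to this document. The networks are represented by plain Python callables, and all function names are hypothetical.

```python
def td_targets(batch, target_actor, target_critic, gamma):
    """Compute y^i = r^i + gamma * Q'(S^{i+1}, mu'(S^{i+1})) for each sample.

    batch: list of (state, action, reward, next_state) tuples.
    target_actor, target_critic: callables standing in for mu' and Q'.
    """
    return [
        reward + gamma * target_critic(next_state, target_actor(next_state))
        for (_state, _action, reward, next_state) in batch
    ]


def critic_loss(batch, critic, targets):
    """Assumed loss: mean squared error between y^i and Q(S^i, a^i)."""
    errors = [
        (y - critic(state, action)) ** 2
        for (state, action, _reward, _next_state), y in zip(batch, targets)
    ]
    return sum(errors) / len(errors)
```

Only the target networks appear in `td_targets`, which is why they are updated slowly: the regression target stays nearly fixed while the main Critic is trained against it.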

(3) Design the target Actor network $\mu'(S^l; \theta')$:

$$\theta'_{\mu'} = \tau\,\theta_{\mu} + (1-\tau)\,\theta'_{\mu'} \tag{29}$$

where $\tau \in [0, 1]$ is the soft update coefficient;

(4) Design the target Critic network $Q'(S^l, a^l; \omega')$:

The target Critic network $Q'(S^l, a^l; \omega')$ is used mainly in computing the loss function $L_Q$; its network parameter $\omega'_{Q'}$ is soft-updated from the parameter $\omega_Q$ of the main Critic network, expressed as:

$$\omega'_{Q'} = \tau\,\omega_{Q} + (1-\tau)\,\omega'_{Q'} \tag{30}$$
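The soft updates of equations (29) and (30) share one form and can be sketched with a single helper. The function name `soft_update` and the representation of network parameters as flat lists of floats are assumptions made for illustration only.

```python
def soft_update(target_params, main_params, tau):
    """Return tau * main + (1 - tau) * target, element by element.

    Applies to both (29) (Actor parameters) and (30) (Critic parameters).
    Parameters are assumed to be flat lists of floats of equal length.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [
        tau * w_main + (1.0 - tau) * w_target
        for w_main, w_target in zip(main_params, target_params)
    ]
```

With a small $\tau$ the target parameters track the main parameters slowly; $\tau = 1$ copies the main network outright and $\tau = 0$ leaves the target network unchanged.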

Based on the deep reinforcement learning technique, the invention defines the pseudocode of the multi-task cooperative offloading algorithm based on the deep deterministic policy gradient as follows:

Step 7.2.3: design a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient.

To achieve the best average user experience within the cost that network operators can bear, the invention designs a computation offloading scheme based on the genetic algorithm and the deep deterministic policy gradient, so as to obtain the mobile device association strategy, the channel allocation strategy and the multi-task cooperative offloading strategy; the invention randomly selects one micro base station edge server as the central controller to execute the computation offloading scheme, which is defined as: