CN114828018A - Multi-user mobile edge computing offloading method based on deep deterministic policy gradient - Google Patents

Multi-user mobile edge computing offloading method based on deep deterministic policy gradient

Info

Publication number
CN114828018A
CN114828018A (application CN202210325855.6A)
Authority
CN
China
Prior art keywords
user
task
mec server
network
delay
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210325855.6A
Other languages
Chinese (zh)
Inventor
王睿 (Wang Rui)
史敏燕 (Shi Minyan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202210325855.6A priority Critical patent/CN114828018A/en
Publication of CN114828018A publication Critical patent/CN114828018A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10Dynamic resource partitioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/53Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/535Allocation or scheduling criteria for wireless resources based on resource usage policies

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention belongs to the field of wireless communication and provides a mobile edge server task offloading method based on the deep deterministic policy gradient. With multiple users present in the system, the invention minimizes the task processing delay of the system by allocating system resources appropriately. A deep deterministic policy gradient algorithm is used to solve the power allocation problem for processing tasks at the local end and at the mobile edge server, thereby reducing the task processing delay in the system. The method models the computation offloading problem as minimizing the maximum computation delay of the whole system under time-varying channel conditions, takes the energy constraints of the edge servers and the users into account, constructs an optimization objective that fits the reinforcement learning model, and solves for the computation offloading policy with the deep deterministic policy gradient algorithm.

Description

Multi-user mobile edge computing offloading method based on deep deterministic policy gradient
Technical Field
The invention belongs to the field of wireless communication and relates to a task offloading method for mobile edge servers, in particular a task offloading method based on the deep deterministic policy gradient.
Background
Mobile edge computing reduces the delay and energy consumption caused by data transmission by deploying base stations closer to the terminal devices, so that large volumes of mobile user data can be processed in a short time while preserving user experience and quality of service. Computation offloading in a mobile edge computing network must follow an offloading policy. The main task offloading optimization algorithms currently in use include game-theoretic algorithms, convex optimization and dynamic programming. However, a mobile edge computing system is highly real-time: parameters such as channel states and task volumes change with the environment, and the traditional optimization methods often require many iterations and cannot adapt to such an environment. Moreover, as offloading scenarios diversify, the offloading policy also involves server selection, task allocation and other steps, which makes finding the optimal policy harder. Deep reinforcement learning is a machine learning approach that learns from environmental feedback and adjusts its policy dynamically; with a neural network fitting the reward function to speed up convergence, it can solve computation offloading in complex scenarios more effectively than traditional optimization methods.
Disclosure of Invention
The invention aims to solve the task and resource allocation problem between users and MEC servers (mobile edge computing servers) under a multi-user computation offloading model.
With partial offloading, a user can compute part of a task locally while transmitting the rest to the mobile edge server for computation. Since there are multiple users in the system, the connections between users and MEC servers vary. The connection scenarios are therefore subdivided into a one-to-one mode (scenario 1) and a one-to-many mode (scenario 2); the two modes are modeled separately and a resource allocation scheme is proposed for each.
The method models the computation offloading problem as minimizing the maximum computation delay of the whole system under time-varying channel conditions, takes the energy constraints of the edge servers and the users into account, constructs an optimization objective that fits the reinforcement learning model, and solves for the optimal computation offloading scheme with the deep deterministic policy gradient.
A multi-user mobile edge computing offloading method based on the deep deterministic policy gradient comprises the following steps:
Step 1: initialize the current value network Q, the current policy network μ, the target value network Q' and the target policy network μ'. The parameters of the current networks are θ^Q and θ^μ respectively, and the target network parameters are initialized as θ^Q' ← θ^Q and θ^μ' ← θ^μ;
Step 2, initializing an experience pool B;
Step 3: at time t, take the user's current pending task amount C_k(t) and the set of channel gains between user k and the mobile edge servers at the previous time, h_k(t-1) = {h_k1(t-1), h_k2(t-1), ..., h_kM(t-1)}, as the current system state s_k(t) = [h_k(t-1), C_k(t)];
Step 4: at time t, the user selects an action a(t) according to the observed system state, choosing the action with the largest Q value with probability 1-ε and a random action otherwise, where the greedy factor ε is a constant in [0, 1]. The action set comprises task allocation and power allocation:

a(t) = { α_k(t), p_k^l(t), p_k^tr(t), p_km^ser(t) }   (1)

where α_k(t) is the local task fraction of user k: α_k C_k is the amount of task the user processes locally and (1-α_k)C_k is the amount offloaded to the MEC server; p_k^l(t) is the local processing power of user k; p_k^tr(t) is the data transmission power of user k; and p_km^ser(t) is the computing power that MEC server m allocates to the task offloaded by user k.
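The action selection above can be sketched as follows; the numeric bounds, dictionary keys and helper functions are illustrative assumptions, not taken from the patent:

```python
import random

# Hypothetical bounds for the action components of equation (1); the patent
# does not give numeric ranges, so these are illustrative assumptions.
P_LOCAL_MAX = 1.0    # max local processing power p_k^l
P_TX_MAX = 0.5       # max transmission power p_k^tr
P_SERVER_MAX = 5.0   # max server computing power p_km^ser

def random_action(rng=random):
    """Draw a uniformly random action a(t) = {alpha, p_l, p_tr, p_ser}."""
    return {
        "alpha": rng.uniform(0.0, 1.0),        # local task fraction alpha_k(t)
        "p_local": rng.uniform(0.0, P_LOCAL_MAX),
        "p_tx": rng.uniform(0.0, P_TX_MAX),
        "p_server": rng.uniform(0.0, P_SERVER_MAX),
    }

def select_action(policy_action, epsilon, rng=random):
    """Epsilon-greedy exploration: keep the policy network's action with
    probability 1 - epsilon, otherwise explore with a random action."""
    if rng.random() < epsilon:
        return random_action(rng)
    return policy_action
```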
Step 5, judging the connection scene of the action according to the action selected by the system, and respectively calculating different connectionsAnd receiving the maximum task processing time delay in the scene. The task processed locally is a k C k The local processing delay of user k can be calculated as:
Figure BDA0003573453820000035
the calculation of the unloading delay comprises two parts: and the transmission delay of the task during unloading and the processing delay of the task at the MEC server are completed only after the task is processed on the MEC server, and the whole task unloading process is finished. According to the amount of unloaded tasks (1-a) k )C k Then the transmission delay for user k to offload the task to MEC server m is:
Figure BDA0003573453820000036
the processing delay of the task unloaded by the user k is calculated by the MEC server m as follows:
Figure BDA0003573453820000037
local computation and task offloading are performed simultaneously, so that the current task computation delay of user k is the maximum of the local processing delay and the computation offloading delay:
Figure BDA0003573453820000041
in scenario 1, in one timeslot, one MEC server can be connected only by one user, and the connections between different users and the MEC server are independent and do not affect each other. In scenario 2, different users may connect to the same MEC server at the same time, which means that limited resources of one MEC server are provided for multiple users, and there is a competitive relationship between users. The reward function of the system is:
Figure BDA0003573453820000042
after the action is executed, calculating the current reward value according to the formula (5), and observing the next state;
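A minimal sketch of the per-user delay and reward computation described above; the function names and the negative-max reward shape are assumptions based on the stated objective of minimizing the maximum system delay:

```python
def task_delay(alpha, C, r_local, r_tx, r_server):
    """Per-user delay: local processing (eq. (2)) runs in parallel with
    the offload pipeline of transmission (eq. (3)) plus server processing
    (eq. (4)); the task finishes when the slower branch finishes."""
    t_local = alpha * C / r_local            # eq. (2)
    t_tx = (1.0 - alpha) * C / r_tx          # eq. (3)
    t_server = (1.0 - alpha) * C / r_server  # eq. (4)
    return max(t_local, t_tx + t_server)

def reward(delays):
    """Negative of the worst per-user delay (the reconstruction of eq. (5)
    used here): maximizing reward minimizes the maximum system delay."""
    return -max(delays)
```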
Step 6: put the experience (s_t, a_t, r_t, s_t+1) at time t into the experience pool B;
Step 7: randomly sample from the experience pool B to train the current value network and the current policy network;
Step 8: compute the target Q value according to formula (6):

y_i = r_i + γ Q'( s_i+1, μ'(s_i+1 | θ^μ') | θ^Q' )   (6)
Step 9: minimize the loss function of the current value network:

L = (1/N) Σ_i ( y_i - Q(s_i, a_i | θ^Q) )^2   (7)
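Steps 8 and 9 can be sketched numerically as follows; the function names are hypothetical and the target-network Q values are taken as given inputs rather than produced by real networks:

```python
import numpy as np

def td_targets(rewards, next_target_q, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))  (eq. (6)).
    next_target_q holds the target value network's output for each sample."""
    return rewards + gamma * next_target_q

def critic_loss(targets, current_q):
    """Mean squared Bellman error over the minibatch (eq. (7))."""
    return float(np.mean((targets - current_q) ** 2))
```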
Step 10: update the parameters θ^μ of the current policy network by stochastic gradient descent:

∇_θ^μ J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q)|_{s=s_i, a=μ(s_i)} ∇_θ^μ μ(s | θ^μ)|_{s=s_i}   (8)

where J is the policy objective function; ∇_θ^μ J is the policy gradient; ∇_a Q(s, a | θ^Q) is the gradient of the Q function; and ∇_θ^μ μ(s | θ^μ) is the gradient of the action function μ(s | θ^μ).
Step 11: update the parameters of the target value network and the target policy network:

θ^Q' ← τ θ^Q + (1-τ) θ^Q'   (9)
θ^μ' ← τ θ^μ + (1-τ) θ^μ'   (10)

where θ^μ' is the target policy network parameter, θ^Q' is the target value network parameter, and τ is the update weight factor;
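The soft update of step 11 can be sketched as follows, with lists of arrays standing in for real network weights:

```python
import numpy as np

def soft_update(target_params, current_params, tau=0.001):
    """theta' <- tau*theta + (1-tau)*theta'  (eqs. (9)-(10)), applied
    elementwise to each parameter array of the target network."""
    return [tau * c + (1.0 - tau) * t
            for t, c in zip(target_params, current_params)]
```

With a small τ the target network trails the current network slowly, which stabilizes the bootstrapped targets of equation (6).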
Step 12: as the environment changes, the user repeats steps 3-11 until the policy converges, at which point the user has learned the optimal computation offloading scheme.
Further, in step 5 the user's local computation delay is derived from the user's local computation rate. The local computation rate of user k is

r_k^l(t) = f_k^l(t) / D_k,   with f_k^l(t) = ( p_k^l(t) / k_l )^(1/3)

where D_k is the number of CPU cycles user k needs to process one bit of task data, p_k^l(t) is the computing power user k allocates locally, f_k^l(t) is the CPU frequency (cycles per second) of user k at time t, and k_l is the effective switched capacitance, which depends on the chip architecture of the user equipment.
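A sketch of the local computation rate under the common cubic power model p = κ·f³, which is an assumption here since the patent's formula image is not reproduced in the text:

```python
def local_rate(p_local, kappa_l, D_k):
    """Local computation rate of user k: under the assumed model
    p = kappa * f**3, the CPU frequency is f = (p/kappa)**(1/3),
    and the rate is f / D_k bits per second (D_k cycles per bit)."""
    f = (p_local / kappa_l) ** (1.0 / 3.0)
    return f / D_k
```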
Further, in step 5 the task processing delay of the MEC server is derived from the computing rate of the MEC server and the task transmission rate. The computing rate of MEC server m for the task offloaded by user k is

r_km^ser(t) = f_km^ser(t) / D_m,   with f_km^ser(t) = ( p_km^ser(t) / k_m )^(1/3)

where D_m is the number of CPU cycles MEC server m needs to process one bit of task data, k_m is the effective switched capacitance depending on the chip architecture of the MEC server, and p_km^ser(t) is the computing power that MEC server m allocates at time t to the task offloaded by user k.

The communication rate between user k and MEC server m is

r_km(t) = W log2( 1 + p_k^tr(t) h_km(t) / N_0 )

where p_k^tr(t) is the data transmission power of user k when offloading at time t, W is the system bandwidth, and N_0 is the variance of the Gaussian white noise.
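The communication rate is the standard Shannon capacity; a sketch (the unit choices in the test values are assumptions):

```python
import math

def transmission_rate(p_tx, h, W, N0):
    """Shannon rate between user k and MEC server m:
    r = W * log2(1 + p_tx * h / N0), in bits per second when W is in Hz."""
    return W * math.log2(1.0 + p_tx * h / N0)
```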
Advantageous effects
The invention minimizes the task processing delay of the system by allocating system resources appropriately. The power allocation problem for processing tasks at the local end and at the mobile edge server is solved with a deep deterministic policy gradient algorithm, which reduces the task processing delay in the system. The proposed mobile edge computation offloading model accounts for the mutual influence of multiple users, subdivides the connection scenarios into a one-to-one mode and a one-to-many mode, and models the two modes separately with a resource allocation scheme for each. The computation offloading problem is modeled as minimizing the maximum computation delay of the whole system under time-varying channel conditions, the energy constraints of the edge servers and the users are taken into account, an optimization objective that fits the reinforcement learning model is constructed, and the computation offloading policy is solved with the deep deterministic policy gradient algorithm.
Drawings
FIG. 1 is a block diagram of the model framework of the multi-user mobile edge computing offloading system
FIG. 2 is a flowchart of the mobile edge computation offloading optimization algorithm based on deep reinforcement learning
Detailed Description
The technical solutions provided in the present application are further described below with reference to specific embodiments and the accompanying drawings. The advantages and features of the present application will become more apparent from the following description.
The method models the computation offloading problem as minimizing the maximum computation delay of the whole system under time-varying channel conditions, takes the energy constraints of the edge servers and the users into account, constructs an optimization objective that fits the reinforcement learning model, and solves for the optimal computation offloading scheme with the deep deterministic policy gradient.
The method comprises the following steps:
1) Initialize the current value network Q, the current policy network μ, the target value network Q' and the target policy network μ', with current network parameters θ^Q and θ^μ and target network parameters θ^Q' ← θ^Q, θ^μ' ← θ^μ. The four networks share the same structure of five layers: an input layer, three hidden layers with 200, 150 and 50 neurons respectively, and an output layer. The learning rates of the value network and the policy network are 0.0001 and 0.001 respectively;
2) Initialize an experience replay buffer B with a capacity of 2 × 10^5;
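The network structure of step 1) can be sketched with plain NumPy; ReLU hidden activations and a linear output are assumptions, as the embodiment only fixes the layer widths:

```python
import numpy as np

# Layer widths stated in the embodiment: input, three hidden layers
# (200, 150, 50 neurons) and an output layer.
def init_mlp(in_dim, out_dim, hidden=(200, 150, 50), seed=0):
    """Build (weight, bias) pairs for a fully connected network."""
    rng = np.random.default_rng(seed)
    sizes = (in_dim, *hidden, out_dim)
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass: ReLU hidden layers, linear output (assumed)."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b
```

For the embodiment with M = 2 servers, the state s_k(t) = [h_k1, h_k2, C_k] has dimension 3 and the action of equation (1) has dimension 4.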
3) The system has 2 users and 2 MEC servers. At time t, take the user's current pending task amount C_k(t) and the set of channel gains between user k and the mobile edge servers at the previous time, h_k(t-1) = {h_k1(t-1), h_k2(t-1), ..., h_kM(t-1)}, as the current system state s_k(t) = [h_k(t-1), C_k(t)];
4) At time t, the user selects an action a(t) according to the observed system state, choosing the action with the largest Q value with probability 0.9 and a random action with probability 0.1. The action set comprises task allocation and power allocation:

a(t) = { α_k(t), p_k^l(t), p_k^tr(t), p_km^ser(t) }   (1)

where α_k(t) is the local task fraction of user k: α_k C_k is the amount of task the user processes locally and (1-α_k)C_k is the amount offloaded to the MEC server; p_k^l(t) is the local processing power of user k; p_k^tr(t) is the data transmission power of user k; and p_km^ser(t) is the computing power that MEC server m allocates to the task offloaded by user k.
5) Determine the connection scenario from the selected action and compute the maximum task processing delay in each scenario. The amount of task processed locally is α_k C_k, so the local processing delay of user k is

T_k^l(t) = α_k(t) C_k(t) / r_k^l(t)   (2)

The offloading delay has two parts: the transmission delay while the task is offloaded and the processing delay at the MEC server; the offloading process finishes only after the task has been processed on the MEC server. With an offloaded task amount of (1-α_k)C_k, the transmission delay for user k to offload to MEC server m is

T_km^tr(t) = (1-α_k(t)) C_k(t) / r_km(t)   (3)

and the processing delay of the offloaded task at MEC server m is

T_km^ser(t) = (1-α_k(t)) C_k(t) / r_km^ser(t)   (4)

Local computation and task offloading run in parallel, so the current task computation delay of user k is the maximum of the local processing delay and the computation offloading delay:

T_k(t) = max( T_k^l(t), T_km^tr(t) + T_km^ser(t) )

In scenario 1, an MEC server can be connected by only one user in a time slot, and the connections of different users are independent. In scenario 2, different users may connect to the same MEC server simultaneously, so the limited resources of one MEC server are shared by several users and the users compete with each other. The reward function of the system is the negative of the maximum task delay over all users:

r(t) = - max_k T_k(t)   (5)

After the action is executed, the current reward is computed according to formula (5) and the next state is observed;
6) Put the experience (s_t, a_t, r_t, s_t+1) at time t into the experience pool B;
7) Randomly sample from the experience pool B to train the current value network and the current policy network;
8) Compute the target Q value according to formula (6):

y_i = r_i + γ Q'( s_i+1, μ'(s_i+1 | θ^μ') | θ^Q' )   (6)

9) Minimize the loss function of the current value network:

L = (1/N) Σ_i ( y_i - Q(s_i, a_i | θ^Q) )^2   (7)

10) Update the parameters θ^μ of the current policy network by stochastic gradient descent:

∇_θ^μ J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q)|_{s=s_i, a=μ(s_i)} ∇_θ^μ μ(s | θ^μ)|_{s=s_i}   (8)

11) Update the parameters of the target value network and the target policy network:

θ^Q' ← τ θ^Q + (1-τ) θ^Q'   (9)
θ^μ' ← τ θ^μ + (1-τ) θ^μ'   (10)
12) As the environment changes, the user repeats steps 3) to 11) until the policy converges, at which point the user has learned the optimal computation offloading scheme.
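The experience pool of steps 2) and 6) can be sketched as a fixed-capacity buffer with uniform sampling; the class and method names are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool B: a fixed-capacity FIFO store of transitions
    (s, a, r, s_next) with uniform random minibatch sampling."""
    def __init__(self, capacity=200_000):   # 2 x 10^5 as in the embodiment
        self.buf = deque(maxlen=capacity)   # old entries evicted automatically

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)
```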

Claims (3)

1. A multi-user mobile edge computing offloading method based on the deep deterministic policy gradient, characterized by comprising the following steps:
step 1, initializing a current value network Q, a current policy network μ, a target value network Q' and a target policy network μ', the parameters of the current networks being θ^Q and θ^μ respectively and the target network parameters being θ^Q' ← θ^Q, θ^μ' ← θ^μ;
step 2, initializing an experience pool B;
step 3, at time t, taking the user's current pending task amount C_k(t) and the set of channel gains between user k and the mobile edge servers at the previous time, h_k(t-1) = {h_k1(t-1), h_k2(t-1), ..., h_kM(t-1)}, as the current system state s_k(t) = [h_k(t-1), C_k(t)];
step 4, at time t, the user selecting an action a(t) according to the observed system state, choosing the action with the largest Q value with probability 1-ε and a random action otherwise, the greedy factor ε being a constant in [0, 1]; the action set comprising task allocation and power allocation:

a(t) = { α_k(t), p_k^l(t), p_k^tr(t), p_km^ser(t) }   (1)

wherein α_k(t) is the local task fraction of user k, α_k C_k is the amount of task the user processes locally, (1-α_k)C_k is the amount of task the user offloads to the MEC server, p_k^l(t) is the local processing power of user k, p_k^tr(t) is the data transmission power of user k, and p_km^ser(t) is the computing power that MEC server m allocates to the task offloaded by user k;
step 5, determining the connection scenario from the selected action and computing the maximum task processing delay in each scenario; the amount of task processed locally being α_k C_k, the local processing delay of user k is

T_k^l(t) = α_k(t) C_k(t) / r_k^l(t)   (2)

the offloading delay comprising two parts, namely the transmission delay while the task is offloaded and the processing delay at the MEC server, the offloading process finishing only after the task has been processed on the MEC server; with an offloaded task amount of (1-α_k)C_k, the transmission delay for user k to offload to MEC server m is

T_km^tr(t) = (1-α_k(t)) C_k(t) / r_km(t)   (3)

and the processing delay of the offloaded task at MEC server m is

T_km^ser(t) = (1-α_k(t)) C_k(t) / r_km^ser(t)   (4)

local computation and task offloading running in parallel, the current task computation delay of user k is the maximum of the local processing delay and the computation offloading delay:

T_k(t) = max( T_k^l(t), T_km^tr(t) + T_km^ser(t) )

in scenario 1, an MEC server can be connected by only one user in a time slot, and the connections of different users are independent; in scenario 2, different users may connect to the same MEC server simultaneously, so that the limited resources of one MEC server are shared by several users and the users compete with each other; the reward function of the system is the negative of the maximum task delay over all users:

r(t) = - max_k T_k(t)   (5)

after the action is executed, the current reward is computed according to formula (5) and the next state is observed;
step 6, putting the experience (s_t, a_t, r_t, s_t+1) at time t into the experience pool B;
step 7, randomly sampling from the experience pool B to train the current value network and the current policy network;
step 8, computing the target Q value according to formula (6):

y_i = r_i + γ Q'( s_i+1, μ'(s_i+1 | θ^μ') | θ^Q' )   (6)

step 9, minimizing the loss function of the current value network:

L = (1/N) Σ_i ( y_i - Q(s_i, a_i | θ^Q) )^2   (7)

step 10, updating the parameters θ^μ of the current policy network by stochastic gradient descent:

∇_θ^μ J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q)|_{s=s_i, a=μ(s_i)} ∇_θ^μ μ(s | θ^μ)|_{s=s_i}   (8)

wherein J is the policy objective function, ∇_θ^μ J is the policy gradient, ∇_a Q(s, a | θ^Q) is the gradient of the Q function, and ∇_θ^μ μ(s | θ^μ) is the gradient of the action function μ(s | θ^μ);
step 11, updating the parameters of the target value network and the target policy network:

θ^Q' ← τ θ^Q + (1-τ) θ^Q'   (9)
θ^μ' ← τ θ^μ + (1-τ) θ^μ'   (10)

wherein θ^μ' is the target policy network parameter, θ^Q' is the target value network parameter, and τ is the update weight factor;
step 12, as the environment changes, the user repeating steps 3-11 until the policy converges, at which point the user has learned the optimal computation offloading scheme.
2. The multi-user mobile edge computing offloading method based on the deep deterministic policy gradient of claim 1, wherein in step 5 the user's local computation delay is derived from the user's local computation rate, the local computation rate of user k being

r_k^l(t) = f_k^l(t) / D_k,   with f_k^l(t) = ( p_k^l(t) / k_l )^(1/3)

wherein D_k is the number of CPU cycles user k needs to process one bit of task data, p_k^l(t) is the computing power user k allocates locally, f_k^l(t) is the CPU frequency of user k at time t, and k_l is the effective switched capacitance depending on the chip architecture of the user equipment.
3. The method according to claim 1, wherein in step 5 the task processing delay of the MEC server is derived from the computing rate of the MEC server and the task transmission rate, the computing rate of MEC server m for the task offloaded by user k being

r_km^ser(t) = f_km^ser(t) / D_m,   with f_km^ser(t) = ( p_km^ser(t) / k_m )^(1/3)

wherein D_m is the number of CPU cycles MEC server m needs to process one bit of task data, k_m is the effective switched capacitance depending on the chip architecture of the MEC server, and p_km^ser(t) is the computing power that MEC server m allocates at time t to the task offloaded by user k;

the communication rate between user k and MEC server m being

r_km(t) = W log2( 1 + p_k^tr(t) h_km(t) / N_0 )

wherein p_k^tr(t) is the data transmission power of user k when offloading at time t, W is the system bandwidth, and N_0 is the variance of the Gaussian white noise.
CN202210325855.6A 2022-03-30 2022-03-30 Multi-user mobile edge computing offloading method based on deep deterministic policy gradient Pending CN114828018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210325855.6A CN114828018A (en) 2022-03-30 2022-03-30 Multi-user mobile edge computing offloading method based on deep deterministic policy gradient

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210325855.6A CN114828018A (en) 2022-03-30 2022-03-30 Multi-user mobile edge computing offloading method based on deep deterministic policy gradient

Publications (1)

Publication Number Publication Date
CN114828018A true CN114828018A (en) 2022-07-29

Family

ID=82531996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210325855.6A Pending CN114828018A (en) 2022-03-30 2022-03-30 Multi-user mobile edge computing unloading method based on depth certainty strategy gradient

Country Status (1)

Country Link
CN (1) CN114828018A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115366099A (en) * 2022-08-18 2022-11-22 Jiangsu University of Science and Technology Mechanical arm deep deterministic policy gradient training method based on forward kinematics
CN115366099B (en) * 2022-08-18 2024-05-28 Jiangsu University of Science and Technology Mechanical arm deep deterministic policy gradient training method based on forward kinematics
CN116156566A (en) * 2023-04-18 2023-05-23 Nanjing University of Posts and Telecommunications Multi-access edge computing dependent-task offloading method
CN116156566B (en) * 2023-04-18 2023-07-25 Nanjing University of Posts and Telecommunications Multi-access edge computing dependent-task offloading method
CN116709428A (en) * 2023-08-04 2023-09-05 East China Jiaotong University Computation offloading method and system based on mobile edge computing
CN116709428B (en) * 2023-08-04 2023-11-24 East China Jiaotong University Computation offloading method and system based on mobile edge computing

Similar Documents

Publication Publication Date Title
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN112367353B (en) Mobile edge computing unloading method based on multi-agent reinforcement learning
CN109862610B (en) D2D user resource allocation method based on deep reinforcement learning DDPG algorithm
CN114828018A (en) Multi-user mobile edge computing offloading method based on deep deterministic policy gradient
CN107766135B (en) Task allocation method based on particle swarm optimization and simulated annealing optimization in moving cloud
CN111405569A (en) Calculation unloading and resource allocation method and device based on deep reinforcement learning
CN111629380B (en) Dynamic resource allocation method for high concurrency multi-service industrial 5G network
WO2021227508A1 (en) Deep reinforcement learning-based industrial 5g dynamic multi-priority multi-access method
CN109756578B (en) Low-delay task scheduling method for dynamic fog computing network
CN111130911B (en) Calculation unloading method based on mobile edge calculation
CN114138373B (en) Edge computing task unloading method based on reinforcement learning
CN114340016B (en) Power grid edge calculation unloading distribution method and system
CN110519849B (en) Communication and computing resource joint allocation method for mobile edge computing
CN113467952A (en) Distributed federated learning collaborative computing method and system
Ma et al. A strategic game for task offloading among capacitated UAV-mounted cloudlets
CN114885420A (en) User grouping and resource allocation method and device in NOMA-MEC system
CN113626104B (en) Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN112650581A (en) Cloud-side cooperative task scheduling method for intelligent building
CN116456493A (en) D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm
CN112788605A (en) Edge computing resource scheduling method and system based on double-delay depth certainty strategy
CN116744311B (en) User group spectrum access method based on PER-DDQN
CN116489708B (en) Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method
Cheng et al. Efficient resource allocation for NOMA-MEC system in ultra-dense network: A mean field game approach
CN116893861A (en) Multi-agent cooperative dependency task unloading method based on space-ground cooperative edge calculation
CN117098189A (en) Computing unloading and resource allocation method based on GAT hybrid action multi-agent reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination