CN114706631A - Unloading decision method and system in mobile edge calculation based on deep Q learning - Google Patents

Unloading decision method and system in mobile edge calculation based on deep Q learning

Info

Publication number
CN114706631A
Authority
CN
China
Prior art keywords
task
layer
edge
mobile
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210427768.1A
Other languages
Chinese (zh)
Other versions
CN114706631B (en)
Inventor
杨柱天
朱伟强
杨蔚
佟令宇
杨佳敏
陈迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
8511 Research Institute of CASIC
Original Assignee
Harbin Institute of Technology
8511 Research Institute of CASIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology, 8511 Research Institute of CASIC filed Critical Harbin Institute of Technology
Priority to CN202210427768.1A priority Critical patent/CN114706631B/en
Publication of CN114706631A publication Critical patent/CN114706631A/en
Application granted granted Critical
Publication of CN114706631B publication Critical patent/CN114706631B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44594Unloading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An offloading decision method and system for mobile edge computing based on deep Q-learning, belonging to the technical field of offloading decisions for mobile devices in mobile edge computing systems. The invention addresses the large time delay and high energy consumption produced during the offloading decision process in existing mobile edge computing systems. The invention applies a deep reinforcement learning algorithm to the offloading decision problem in mobile edge computing, and designs the corresponding system state, action and reward function according to the task scheduling models established in the system, such as the local computation queue, the task transmission queue and the edge server queue. Comparisons of the average time delay and energy consumption of this method against other algorithms show that the disclosed offloading decision method greatly reduces the time delay and energy consumption produced during the offloading decision process in a mobile edge computing system. The method can be applied to offloading decisions for mobile devices in mobile edge computing systems.

Description

Offloading decision method and system in mobile edge computing based on deep Q-learning
Technical Field
The invention belongs to the technical field of offloading decisions for mobile devices in mobile edge computing systems, and particularly relates to an offloading decision method and system for mobile edge computing based on deep Q-learning.
Background
With the rapid development of 5G and Internet of Things technologies, people have stepped into a new world in which everything is interconnected. In recent years, the number of mobile devices with networking functions, such as smartphones, smart home appliances and smart wearable devices, has grown explosively; meanwhile, the emergence of new applications such as virtual reality, real-time path planning and online video processing has imposed stricter requirements on data transmission and data computation capabilities. Finding an effective way to meet the data transmission and computation needs of Internet of Things devices is an urgent problem, and mobile edge computing has become an effective solution.
Although existing mobile edge computing methods have achieved some success, the time delay produced during the offloading decision process in existing mobile edge computing systems is still large, and the energy consumption produced during that process is still high. It is therefore necessary to provide an offloading decision method for mobile edge computing systems that reduces the time delay and energy consumption produced during the offloading decision process.
Disclosure of Invention
The invention aims to solve the problems of large time delay and high energy consumption in the offloading decision process of conventional mobile edge computing systems, and provides an offloading decision method and an offloading decision system for mobile edge computing based on deep Q-learning.
The technical scheme adopted by the invention for solving the technical problems is as follows:
According to one aspect of the invention, the offloading decision method for mobile edge computing based on deep Q-learning specifically comprises the following steps:
step one, building a reinforcement learning model
Constructing a system state, a system action and a reward function in the Markov decision process according to the task characteristics;
step two, constructing a neural network
Constructing a neural network comprising an input layer, an LSTM layer, a first FC layer, a second FC layer and an output layer, wherein the input layer is used for transmitting system state information to the LSTM layer and the first FC layer and taking the output of the LSTM layer as the input of the first FC layer;
the output of the first FC layer is then used as the input to the second FC layer, and the output of the second FC layer is used as the input to the output layer.
Further, the system state is constructed in the following manner:
Denote the size of the task held by mobile device m at the beginning of the current time slot as λ_m(t); if a new task k(t) arrives at mobile device m at the beginning of the current time slot, then λ_m(t) = k(t), otherwise λ_m(t) = 0;
Constructing a local computation queue, a task transmission queue and an edge node computation queue; the number of time slots that the task of mobile device m must wait in the local computation queue at the beginning of the current time slot is denoted w_m^loc(t); the number of time slots that the task of mobile device m must wait in the task transmission queue at the beginning of the current time slot is denoted w_m^tr(t); the queue length of mobile device m at edge node n is denoted q_{m,n}(t);
Constructing a matrix M(t) representing the load level of each edge server in the T time slots preceding the current time slot, where the dimension of M(t) is T × N and N is the number of edge servers;
The system state s_m(t) observed by mobile device m at the current time slot is:
s_m(t) = { λ_m(t), w_m^loc(t), w_m^tr(t), q_{m,n}(t), M(t) }
Further, the system action is denoted as a(t) ∈ {0, 1, 2, …, N}, where 0 denotes local computation and k = 1, 2, …, N denotes the sequence number of the edge server to which the task is offloaded.
Further, the reward function is constructed as follows:
If the task is decided to be computed locally, the number of time slots the task waits, w_m^loc(t), is determined by t_m^fin(t'), the time at which the task generated in time slot t' finishes executing locally;
The energy required for local computation of the task, E_m^loc, is:
E_m^loc = ε_m · d_m
where ε_m denotes the energy consumption coefficient of the CPU of mobile device m during local computation, i.e. the energy consumed by the local CPU per cycle, and d_m denotes the computation amount of the task currently generated by mobile device m, i.e. the number of CPU cycles required to execute it;
Setting the preference coefficients of mobile user m for time delay and energy consumption as β_m^T and β_m^E respectively, the reward function of mobile user m in the offloading decision process is:
R = -(β_m^T · T + β_m^E · E)
where R is the value of the reward function, T is the total time delay produced when the task of mobile user m is computed locally, i.e. the number of time slots the task waits in the queue, w_m^loc(t), plus the sum of the time delays produced during local execution of the task, and E is the total energy consumption of mobile user m, i.e. E = E_m^loc.
Further, the construction mode of the reward function is as follows:
if the task is decided as edge calculation, the number of time slots for waiting the task passes through the time after the edge server n finishes executing
Figure BDA0003610402240000034
Is calculated as the number of time slots the task waits
Figure BDA0003610402240000035
The energy required in task edge computing comprises two parts of task uploading and task execution, and the power of the mobile equipment when the task is uploaded is represented as pupThe power of the mobile device when a task is performed is denoted as peThen for mobile device m, the required energy
Figure BDA0003610402240000036
Comprises the following steps:
Figure BDA0003610402240000037
wherein, tn,upRepresenting the time it takes for mobile device m to upload a task to edge server n, tn,eRepresenting the time it takes for the mobile device m to perform a task in the edge server n.
At this time, the reward function of the user in the unloading decision process is as follows:
Figure BDA0003610402240000038
wherein R is the value of the reward function, and T is the total time delay generated by task queuing
Figure BDA0003610402240000039
Time delay t generated by uploading task to edge server nn,upAnd the time delay t generated by the execution of the task at the edge server nn,eE is the total energy consumption resulting from the edge calculation, i.e.
Figure BDA00036104022400000310
Further, the reward function is constructed as follows:
If the maximum delay allowed by the task is reached before the task has been executed, the task is discarded, and the value R of the reward function in this case is set to a fixed penalty value P.
Further, the LSTM layer is used to predict the temporal dependence of the edge server load levels based on the matrix M(t).
Further, the first FC layer and the second FC layer are used to learn the mapping from the system state to the reward function values of the system actions, and each of the first FC layer and the second FC layer comprises a set of neurons with rectified linear units (ReLU).
Further, the output layer is configured to output a value of the reward function corresponding to the currently selected action in the current system state.
According to another aspect of the invention, an offloading decision system for mobile edge computing based on deep Q-learning is provided, which is used to execute the above offloading decision method for mobile edge computing based on deep Q-learning.
The invention has the beneficial effects that:
the invention applies a deep reinforcement learning algorithm to the unloading decision problem in the mobile edge calculation, and designs the corresponding system state, action and reward equation according to the task scheduling models such as a local calculation queue, a task transmission queue, an edge server queue and the like established in the system. By comparing the average time delay and energy consumption of the method with those of other algorithms, the unloading decision method disclosed by the invention can be used for greatly reducing the time delay and energy consumption generated in the unloading decision process in the mobile edge computing system.
Drawings
FIG. 1 is a diagram of a neural network constructed in accordance with the present invention;
FIG. 2 is a graph of the convergence of the value of the reward function with the number of iterations of the method of the present invention;
FIG. 3 is a graph of average reward value versus number of users for the method of the present invention and three other baseline algorithms;
FIG. 4 is a graph of the average delay as a function of the number of users for the method of the present invention and three other baseline algorithms.
Detailed Description
In a first specific embodiment, a computation offloading strategy based on deep reinforcement learning is provided for a network scenario with multiple mobile devices and multiple servers in an MEC system. Each mobile user is regarded as an agent, and tasks must be queued according to their arrival order during the offloading process. Time delay and energy consumption cost models are then established for the two task execution modes, and the method is designed with the objective of minimizing system cost, so that the minimum system time delay and energy consumption are produced over a number of consecutive time slots.
Step 1, building a reinforcement learning model:
This step describes in detail the specific implementation of the invention, which uses DQN (deep Q-network) for task offloading decisions. It mainly comprises the definitions of the system state, action and reward function in the Markov decision process.
1. Markov decision process construction
At the beginning of each slot, each mobile device observes its state (e.g., task size, queue lengths and other information). If there is a new task to process, the mobile device selects an appropriate offloading decision for the task so as to minimize the long-term cost of its task computation. Applying deep reinforcement learning to the task offloading decision problem requires constructing a Markov decision process, in which the system state, action and reward function must be specifically defined.
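As an illustration of how such an agent can be trained, the following is a minimal sketch of a DQN training loop with an experience replay buffer and an ε-greedy policy. The environment interface (env.reset, env.step), the q_net/target_net objects and all hyperparameter values are assumptions made for illustration only and are not prescribed by the invention.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

def train_dqn(env, q_net, target_net, num_actions,
              episodes=1200, gamma=0.9, eps=0.1, batch_size=32,
              buffer_size=10000, target_sync=100, lr=1e-3):
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=buffer_size)
    step = 0
    for _ in range(episodes):
        state = env.reset()                       # observe s_m(t)
        done = False
        while not done:
            # epsilon-greedy action: 0 = local execution, 1..N = edge server index
            if random.random() < eps:
                action = random.randrange(num_actions)
            else:
                with torch.no_grad():
                    q = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
                    action = int(q.argmax())
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state, float(done)))
            state = next_state
            step += 1

            if len(replay) >= batch_size:
                s, a, r, s2, d = zip(*random.sample(replay, batch_size))
                s = torch.as_tensor(np.stack(s), dtype=torch.float32)
                s2 = torch.as_tensor(np.stack(s2), dtype=torch.float32)
                a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
                r = torch.as_tensor(r, dtype=torch.float32)
                d = torch.as_tensor(d, dtype=torch.float32)
                q_sa = q_net(s).gather(1, a).squeeze(1)
                with torch.no_grad():
                    target = r + gamma * (1.0 - d) * target_net(s2).max(1).values
                loss = nn.functional.mse_loss(q_sa, target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            if step % target_sync == 0:           # periodically refresh the target network
                target_net.load_state_dict(q_net.state_dict())
```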
2. System state setting
The first information relevant to the offloading decision is the nature of the task itself. First consider the task size λ(t). At the beginning of each slot, mobile device m first observes the size of its own task, denoted λ(t). If a new task k(t) arrives at the beginning of the current time slot, λ(t) = k(t); otherwise λ(t) = 0. Note that because new tasks are assumed to be generated only at the start of a slot, there is no case in which λ(t) cannot be computed for a task generated within a slot.
Another task characteristic that must be considered is the maximum acceptable delay of the task, which is also relevant to the offloading decision, so it too is added to the system state.
The waiting times of the task in the three queues are also relevant to the offloading decision and are likewise added to the system state. Specifically:
w_m^loc(t) denotes the number of time slots the task must wait in the local computation queue;
w_m^tr(t) denotes the number of time slots the task must wait in the transmission queue;
q_{m,n}(t) denotes the queue length of mobile device m at edge node n.
The load level of each edge node, i.e., the number of active queues at the node, changes constantly, and the load level at the current moment is strongly correlated with that at the previous moment. An edge node load level matrix M(t) is therefore constructed to capture the load levels of the edge nodes over time.
Specifically, the matrix M(t) represents the history of the load level of each edge node (i.e., the number of active queues at the node, at most the number of mobile devices M) over the previous T time slots, from time slot t - T to time slot t - 1. It is a matrix of dimension T × N, where T is the number of history time slots and N is the number of edge nodes. For example, {M(t)}_(i,j) indicates the number of active queues of edge node j at the (t - T + i - 1)-th slot.
Summarizing the above, the system state observed by mobile device m at the current time slot is defined as the vector
s_m(t) = { λ(t), τ_m, w_m^loc(t), w_m^tr(t), q_{m,n}(t), M(t) }
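As a concrete illustration, the following sketch assembles the state vector described above into one flat numerical array that can be fed to the Q-network. The field names, the use of a per-node queue-length vector and the flattening of M(t) are assumptions made for illustration only.

```python
import numpy as np

def build_state(task_size, max_delay, wait_local, wait_tx,
                edge_queue_lens, load_history):
    """Assemble s_m(t) = {lambda(t), tau_m, w_loc, w_tr, q_{m,n}, M(t)} as one vector.

    edge_queue_lens: per-edge-node queue lengths of this mobile device (length N).
    load_history:    the T x N matrix M(t) of active-queue counts over the
                     previous T slots, flattened into the state vector.
    """
    scalars = np.array([task_size, max_delay, wait_local, wait_tx], dtype=np.float32)
    return np.concatenate([
        scalars,
        np.asarray(edge_queue_lens, dtype=np.float32),
        np.asarray(load_history, dtype=np.float32).ravel(),
    ])

# Example: 3.5 Mbit task, max wait 10 slots, N = 5 edge nodes, history window T = 4
s = build_state(3.5, 10, 2, 1, np.zeros(5), np.zeros((4, 5)))
```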
3. System actions
Each mobile device must first decide whether to offload the task to an edge server or execute it locally, and then which edge server to offload to. Local computation is denoted by 0, and the sequence number of the edge server chosen for offloading is denoted by k. Assuming a total of N edge servers, the system action can be expressed as a(t) ∈ {0, 1, 2, …, N}.
4. Reward function
What most affects the mobile device application experience is the latency and energy consumption resulting from offloading. The reward function of the invention is therefore constructed around the time delay and energy consumption produced during task offloading.
The delay caused by the task is considered for both local computation and edge computation.
If the task is decided to be computed locally, the number of slots the task waits, w_m^loc(t), is calculated from the times at which the tasks ahead of it in the local computation queue finish executing.
If the task is decided to be computed at the edge, the number of slots the task waits, w_{m,n}^edge(t), is calculated from the time at which edge node n completes execution of the tasks ahead of it.
The energy consumption produced by the task is likewise considered for the two cases of local computation and edge computation.
The energy required for local computation of the task is E_m^loc = ε_m · d_m, where ε_m is the energy consumed by the local CPU per cycle and d_m is the number of CPU cycles required by the task.
Energy is consumed during edge computation mainly in two parts: task uploading and task execution. Assuming the power of the mobile device while uploading is p_up and its power while the task is being executed is p_e, then for device i the required energy is E_i^edge = p_up · t_{n,up} + p_e · t_{n,e}.
Setting the preference coefficients of mobile user i for time delay and energy consumption as β_i^T and β_i^E respectively, the reward function of the user in the offloading decision process is set as
R = -(β_i^T · T + β_i^E · E)
where T is the total time delay and E the total energy consumption produced by the chosen execution mode.
On the other hand, if a task is dropped because it has reached the maximum delay it can accept, the reward at that point is defined as a fixed penalty value P, i.e.
R = P
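The three reward cases above can be summarized in a short sketch. The weighting of delay and energy by the preference coefficients with a negative sign follows the weighted-cost reading of the reward described above, and the parameter names and the default penalty value are illustrative assumptions.

```python
def reward(action, dropped, beta_t, beta_e,
           wait_slots, slot_len, t_local=0.0, t_up=0.0, t_exec=0.0,
           eps_cpu=0.0, cycles=0.0, p_up=0.0, p_exec=0.0, penalty=-50.0):
    """Reward for one offloading decision.

    action == 0     -> local computation
    action in 1..N  -> offload to edge server `action`
    dropped         -> the task exceeded its maximum acceptable delay
    """
    if dropped:
        return penalty                                 # fixed penalty value P
    if action == 0:
        delay = wait_slots * slot_len + t_local        # queuing + local execution
        energy = eps_cpu * cycles                      # E = epsilon_m * d_m
    else:
        delay = wait_slots * slot_len + t_up + t_exec  # queuing + upload + edge execution
        energy = p_up * t_up + p_exec * t_exec         # E = p_up*t_up + p_e*t_e
    return -(beta_t * delay + beta_e * energy)
```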
Step 2, constructing a neural network, as shown in fig. 1:
1. Input layer: this layer takes the state as input and passes it to the following layers. For mobile device m, the state information λ(t), τ_m, w_m^loc(t), w_m^tr(t), q_{m,n}(t) and M(t) is passed to the LSTM and FC layers for prediction.
2. LSTM layer: the matrix M(t) represents the load level of each edge node over the previous T time slots, and these load levels are correlated across time, i.e. they have a temporal dependence. The LSTM layer is therefore used to capture this temporal dependence when predicting the load level.
3. FC layers: the two FC layers are responsible for learning the mapping from states to action Q values. Each FC layer contains a set of neurons with rectified linear units (ReLU).
4. Output layer: the output corresponds to the Q value of taking the currently selected action in the current state. It reflects the overall cost brought by the current decision, i.e., the trade-off between time delay and energy consumption.
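The layer structure described above and shown in FIG. 1 can be sketched as follows. The hidden sizes, the splitting of the flat state vector into scalar features and the flattened M(t), the concatenation of the LSTM summary with the scalar features, and the class and parameter names are assumptions chosen for illustration rather than values specified by the invention.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Input -> LSTM (over M(t)) and FC1 -> FC2 -> output layer of Q values."""

    def __init__(self, num_scalar_feats, num_edge_nodes, history_len,
                 num_actions, lstm_hidden=64, fc_hidden=128):
        super().__init__()
        self.num_scalar_feats = num_scalar_feats
        self.num_edge_nodes = num_edge_nodes
        self.history_len = history_len
        # LSTM reads the T x N load-history matrix M(t) one time slot at a time
        self.lstm = nn.LSTM(input_size=num_edge_nodes,
                            hidden_size=lstm_hidden, batch_first=True)
        # FC1 receives the scalar state features together with the LSTM summary
        self.fc1 = nn.Linear(num_scalar_feats + lstm_hidden, fc_hidden)
        self.fc2 = nn.Linear(fc_hidden, fc_hidden)
        self.out = nn.Linear(fc_hidden, num_actions)   # one Q value per action

    def forward(self, state):
        # state: (batch, num_scalar_feats + T * N); the trailing part is M(t) flattened
        scalars = state[:, :self.num_scalar_feats]
        history = state[:, self.num_scalar_feats:].reshape(
            -1, self.history_len, self.num_edge_nodes)
        _, (h, _) = self.lstm(history)
        h = h[-1]                                      # final hidden state as load summary
        x = torch.relu(self.fc1(torch.cat([scalars, h], dim=1)))
        x = torch.relu(self.fc2(x))
        return self.out(x)

# Example: 9 scalar features (lambda, tau, w_loc, w_tr and N = 5 queue lengths),
# history window T = 4, and 6 actions (local + 5 edge servers)
net = QNetwork(num_scalar_feats=9, num_edge_nodes=5, history_len=4, num_actions=6)
q_values = net(torch.zeros(1, 9 + 4 * 5))
```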
Examples
In this embodiment, the system architecture applied in the present invention is introduced from three aspects, namely, a network model, a task model, and a task scheduling model.
Step 1, establishing a network model: the scenario addressed by the invention consists of two parts: a number of base stations equipped with edge servers, and a number of mobile devices that need to perform computation-intensive tasks. Each base station is equipped with an edge server with high computing power. At the beginning of each time slot, each mobile device generates an intensive task with a certain probability; the task is either executed locally or completely offloaded to the edge server for execution, and is not partitioned.
Considering a time period of T slots {1, 2, 3, …, T}, the invention sets T = 100 in the simulation, and the length of each slot is set to Δ. For the N mobile devices in the cell, an offloading decision variable α_i is used at the beginning of each time slot to indicate whether the computing task of device i is offloaded to an edge server for execution and, if so, to which server: α_i = k if the task is offloaded to the k-th edge server, and α_i = 0 if the task is executed locally.
It is assumed that the tasks of all mobile devices have the same priority. Each edge node has a CPU for processing the tasks in its queues. At the beginning of each time slot, the processing power of the CPU at edge node n is shared equally among the task queues of the mobile devices present at that edge node.
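As an illustration of this equal sharing, the following sketch computes the per-queue processing budget at an edge node at the start of a slot. The function and parameter names, and the example cycle budget, are illustrative assumptions.

```python
def per_queue_cpu_cycles(total_cycles_per_slot, active_queues):
    """Cycles each active task queue receives at an edge node in one slot.

    The CPU of edge node n is shared equally among the task queues of the
    mobile devices that currently have tasks queued at that node.
    """
    if active_queues == 0:
        return 0.0
    return total_cycles_per_slot / active_queues

# Example: a budget of 1e10 cycles per slot shared by 4 active queues
cycles_each = per_queue_cpu_cycles(1e10, 4)   # 2.5e9 cycles per queue
```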
Step 2, establishing a task model: a task generated by a mobile device is characterized along two dimensions. One is the storage size of the task, where the task size λ_m(t) is a random value between 3 Mbit and 5 Mbit with a step of 0.1 Mbit. The other is the maximum number of waiting slots the task can accept, denoted τ_m. When the waiting time of a task exceeds τ_m, the task is discarded by the system.
Step 3, establishing a task scheduling model: taking the edge server side as an example, because tasks arrive in order, the edge server may not have finished the task of the previous time slot at the beginning of the next one, so tasks must be queued. Three queues, namely the local computation queue, the task transmission queue and the edge node computation queue, are therefore constructed as the task scheduling model, as sketched below.
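The following is a minimal sketch of this task model and of advancing one of the scheduling queues by one slot. The data-structure layout, the helper names, the τ_m range and the per-slot service capacity are assumptions made for illustration only.

```python
import random
from collections import deque

def generate_task():
    """Task model: size uniform in [3, 5] Mbit with 0.1 Mbit step, plus a
    maximum acceptable number of waiting slots tau_m (here drawn in [5, 15],
    an illustrative range not taken from the patent)."""
    size_mbit = random.choice([round(3.0 + 0.1 * i, 1) for i in range(21)])
    tau_m = random.randint(5, 15)
    return {"size": size_mbit, "tau": tau_m, "waited": 0}

def advance_queue(queue, served_per_slot):
    """Advance one scheduling queue by one slot: serve head tasks that fit into
    this slot's service capacity, age the rest, and drop tasks past tau_m."""
    capacity = served_per_slot
    while queue and capacity >= queue[0]["size"]:
        capacity -= queue.popleft()["size"]        # this task finishes now
    survivors, dropped = deque(), 0
    for task in queue:                             # remaining tasks keep waiting
        task["waited"] += 1
        if task["waited"] > task["tau"]:
            dropped += 1                           # exceeded max acceptable delay
        else:
            survivors.append(task)
    queue.clear()
    queue.extend(survivors)
    return dropped

q = deque([generate_task() for _ in range(3)])
num_dropped = advance_queue(q, served_per_slot=4.0)   # 4 Mbit of service this slot
```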
Step 4, evaluating and analyzing algorithm performance:
A scenario is selected to study the convergence of the algorithm: the number of users is set to 120, the number of edge servers to 5, the algorithm is iterated 1200 times, and the average reward over 100 consecutive decisions is plotted. As can be seen from FIG. 2, the average reward begins to converge when the number of iterations reaches about 500 and gradually stabilizes in the following iterations, indicating that the agent has learned a stable offloading strategy after repeated training.
Three other baseline algorithms are selected for comparison with the DQN-based algorithm designed by the invention. The number of mobile users is varied from 50 to 120 with the number of edge servers fixed at 5, and the average reward curves of the four algorithms are plotted as shown in FIG. 3, where the reward values of the three baseline algorithms are each averaged over 100 runs.
As can be seen from the simulation curves, the average rewards of the three algorithms other than the all-local computation algorithm decline as the number of users increases. This is because the number of edge servers in the simulation is fixed at 5, so as the number of users grows, the server resources available to each user become increasingly strained. For the all-local computation algorithm, the average reward is essentially unchanged as the number of users varies. When the number of users is below 90, the all-offloading algorithm outperforms the all-local algorithm because server computing resources are still sufficient; as the number of users continues to increase, the all-offloading algorithm becomes inferior to the all-local algorithm.
As can be seen from the simulation curve of FIG. 4, the DQN-based algorithm always obtains lower task processing delay than the other three baseline algorithms. Similar to the trend of the reward curves, when the number of users exceeds 90, the delay of the all-offloading algorithm begins to exceed that of the all-local algorithm because the computing resources of the edge servers become overstrained.
The above-described calculation examples of the present invention are merely to explain the calculation model and the calculation flow of the present invention in detail, and are not intended to limit the embodiments of the present invention. It will be apparent to those skilled in the art that other variations and modifications of the present invention can be made based on the above description, and it is not intended to be exhaustive or to limit the invention to the precise form disclosed, and all such modifications and variations are possible and contemplated as falling within the scope of the invention.

Claims (10)

1. An offloading decision method for mobile edge computing based on deep Q-learning, characterized by comprising the following steps:
step one, building a reinforcement learning model
Constructing a system state, a system action and a reward function in the Markov decision process according to the task characteristics;
step two, constructing a neural network
Constructing a neural network comprising an input layer, an LSTM layer, a first FC layer, a second FC layer and an output layer, wherein the input layer is used for transmitting system state information to the LSTM layer and the first FC layer and taking the output of the LSTM layer as the input of the first FC layer;
the output of the first FC layer is then used as the input to the second FC layer, and the output of the second FC layer is used as the input to the output layer.
2. The offloading decision method for mobile edge computing based on deep Q-learning of claim 1, wherein the system state is constructed as follows:
the size of the task held by mobile device m at the beginning of the current time slot is denoted λ_m(t); if a new task k(t) arrives at mobile device m at the beginning of the current time slot, then λ_m(t) = k(t), otherwise λ_m(t) = 0;
a local computation queue, a task transmission queue and an edge node computation queue are constructed; the number of time slots that the task of mobile device m must wait in the local computation queue at the beginning of the current time slot is denoted w_m^loc(t); the number of time slots that the task of mobile device m must wait in the task transmission queue at the beginning of the current time slot is denoted w_m^tr(t); the queue length of mobile device m at edge node n is denoted q_{m,n}(t);
a matrix M(t) representing the load level of each edge server in the T time slots preceding the current time slot is constructed, where the dimension of M(t) is T × N and N is the number of edge servers;
the system state s_m(t) observed by mobile device m at the current time slot is:
s_m(t) = { λ_m(t), w_m^loc(t), w_m^tr(t), q_{m,n}(t), M(t) }.
3. The method of claim 2, wherein the system action is represented as a(t) ∈ {0, 1, 2, …, N}, where 0 represents local computation and k = 1, 2, …, N represents the sequence number of the edge server to which the task is offloaded.
4. The offloading decision method for mobile edge computing based on deep Q-learning of claim 3, wherein the reward function is constructed as follows:
if the task is decided to be computed locally, the number of time slots the task waits, w_m^loc(t), is determined by t_m^fin(t'), the time at which the task generated in time slot t' finishes executing locally;
the energy required for local computation of the task, E_m^loc, is:
E_m^loc = ε_m · d_m
where ε_m denotes the energy consumption coefficient of the CPU of mobile device m during local computation, i.e. the energy consumed by the local CPU per cycle, and d_m denotes the computation amount of the task currently generated by mobile device m, i.e. the number of CPU cycles required to execute it;
setting the preference coefficients of mobile user m for time delay and energy consumption as β_m^T and β_m^E respectively, the reward function of mobile user m in the offloading decision process is:
R = -(β_m^T · T + β_m^E · E)
where R is the value of the reward function, T is the total time delay produced when the task of mobile user m is computed locally, i.e. the number of time slots the task waits in the queue, w_m^loc(t), plus the sum of the time delays produced during local execution of the task, and E is the total energy consumption of mobile user m, i.e. E = E_m^loc.
5. The offloading decision method for mobile edge computing based on deep Q-learning of claim 3, wherein the reward function is constructed as follows:
if the task is decided to be computed at the edge, the number of time slots the task waits, w_{m,n}^edge(t), is calculated from t_n^fin(t'), the time at which edge server n finishes executing the task generated in time slot t';
the energy required for edge computation of the task comprises two parts, task uploading and task execution; denoting the power of the mobile device while uploading a task as p_up and its power while the task is being executed as p_e, the required energy E_m^edge for mobile device m is:
E_m^edge = p_up · t_{n,up} + p_e · t_{n,e}
where t_{n,up} denotes the time it takes mobile device m to upload the task to edge server n, and t_{n,e} denotes the time it takes the task of mobile device m to be executed on edge server n;
in this case, the reward function of the user in the offloading decision process is:
R = -(β_m^T · T + β_m^E · E)
where R is the value of the reward function, T is the total time delay, i.e. the sum of the queuing delay w_{m,n}^edge(t), the delay t_{n,up} produced by uploading the task to edge server n, and the delay t_{n,e} produced by executing the task on edge server n, and E is the total energy consumption produced by edge computation, i.e. E = E_m^edge.
6. The offloading decision method for mobile edge computing based on deep Q-learning of claim 3, wherein the reward function is constructed as follows:
if the maximum delay allowed by the task is reached before the task has been executed, the task is discarded, and the value R of the reward function in this case is set to a fixed penalty value P.
7. The offloading decision method for mobile edge computing based on deep Q-learning of claim 3, wherein the LSTM layer is used to predict the temporal dependence of the edge server load levels based on the matrix M(t).
8. The method of claim 7, wherein the first FC layer and the second FC layer are used to learn the mapping from the system state to the reward function values of the system actions, and each of the first FC layer and the second FC layer comprises a set of neurons with rectified linear units (ReLU).
9. The method according to claim 8, wherein the output layer is configured to output the reward function value corresponding to taking the currently selected action in the current system state.
10. An offloading decision system for mobile edge computing based on deep Q-learning, characterized in that the system is configured to perform the offloading decision method for mobile edge computing based on deep Q-learning according to any one of claims 1 to 9.
CN202210427768.1A 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning Active CN114706631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210427768.1A CN114706631B (en) 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210427768.1A CN114706631B (en) 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning

Publications (2)

Publication Number Publication Date
CN114706631A true CN114706631A (en) 2022-07-05
CN114706631B CN114706631B (en) 2022-10-25

Family

ID=82175067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210427768.1A Active CN114706631B (en) 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning

Country Status (1)

Country Link
CN (1) CN114706631B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766241A (en) * 2022-11-21 2023-03-07 西安工程大学 Distributed intrusion detection system task scheduling and unloading method based on DQN algorithm
CN116909717A (en) * 2023-09-12 2023-10-20 国能(北京)商务网络有限公司 Task scheduling method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666149A (en) * 2020-05-06 2020-09-15 西北工业大学 Ultra-dense edge computing network mobility management method based on deep reinforcement learning
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 MEC task unloading and resource allocation method based on deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666149A (en) * 2020-05-06 2020-09-15 西北工业大学 Ultra-dense edge computing network mobility management method based on deep reinforcement learning
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 MEC task unloading and resource allocation method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Wenxian et al.: "Lightweight task offloading optimization for multi-user mobile edge computing", Journal of Chinese Computer Systems (小型微型计算机系统) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766241A (en) * 2022-11-21 2023-03-07 西安工程大学 Distributed intrusion detection system task scheduling and unloading method based on DQN algorithm
CN116909717A (en) * 2023-09-12 2023-10-20 国能(北京)商务网络有限公司 Task scheduling method
CN116909717B (en) * 2023-09-12 2023-12-05 国能(北京)商务网络有限公司 Task scheduling method

Also Published As

Publication number Publication date
CN114706631B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN108920280B (en) Mobile edge computing task unloading method under single-user scene
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
CN114706631B (en) Unloading decision method and system in mobile edge calculation based on deep Q learning
CN112380008B (en) Multi-user fine-grained task unloading scheduling method for mobile edge computing application
CN113612843A (en) MEC task unloading and resource allocation method based on deep reinforcement learning
CN112860350A (en) Task cache-based computation unloading method in edge computation
CN111930436A (en) Random task queuing and unloading optimization method based on edge calculation
CN113220356B (en) User computing task unloading method in mobile edge computing
CN113434212A (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN113778648A (en) Task scheduling method based on deep reinforcement learning in hierarchical edge computing environment
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN114567895A (en) Method for realizing intelligent cooperation strategy of MEC server cluster
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN113626104A (en) Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN111930435A (en) Task unloading decision method based on PD-BPSO technology
CN114942799B (en) Workflow scheduling method based on reinforcement learning in cloud edge environment
CN111148155A (en) Task unloading method based on mobile edge calculation
CN115858048A (en) Hybrid key level task oriented dynamic edge arrival unloading method
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN115413044A (en) Computing and communication resource joint distribution method for industrial wireless network
CN113747504A (en) Method and system for multi-access edge computing combined task unloading and resource allocation
Luo et al. Adaptive video streaming in software-defined mobile networks: A deep reinforcement learning approach
Vo et al. Reinforcement-Learning-Based Deadline Constrained Task Offloading Schema for Energy Saving in Vehicular Edge Computing System
CN117793805B (en) Dynamic user random access mobile edge computing resource allocation method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant