CN114706631A - Unloading decision method and system in mobile edge calculation based on deep Q learning - Google Patents

Unloading decision method and system in mobile edge calculation based on deep Q learning

Info

Publication number
CN114706631A
Authority
CN
China
Prior art keywords
task
layer
edge
mobile
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210427768.1A
Other languages
Chinese (zh)
Other versions
CN114706631B (en)
Inventor
杨柱天
朱伟强
杨蔚
佟令宇
杨佳敏
陈迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
8511 Research Institute of CASIC
Original Assignee
Harbin Institute of Technology
8511 Research Institute of CASIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology, 8511 Research Institute of CASIC filed Critical Harbin Institute of Technology
Priority to CN202210427768.1A priority Critical patent/CN114706631B/en
Publication of CN114706631A publication Critical patent/CN114706631A/en
Application granted granted Critical
Publication of CN114706631B publication Critical patent/CN114706631B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44594Unloading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An offloading decision method and system for mobile edge computing based on deep Q-learning, belonging to the technical field of offloading decisions for mobile devices in mobile edge computing systems. The invention addresses the large time delay and high energy consumption produced during the offloading decision process in existing mobile edge computing systems. The invention applies a deep reinforcement learning algorithm to the offloading decision problem in mobile edge computing, and designs the corresponding system state, action and reward function according to the task scheduling models established in the system, such as the local computation queue, the task transmission queue and the edge server queue. Comparisons of the average time delay and energy consumption of this method against other algorithms show that the disclosed offloading decision method greatly reduces the time delay and energy consumption produced during the offloading decision process in a mobile edge computing system. The method can be applied to offloading decisions for mobile devices in mobile edge computing systems.

Description

Offloading decision method and system in mobile edge computing based on deep Q-learning
Technical Field
The invention belongs to the technical field of offloading decisions for mobile devices in mobile edge computing systems, and particularly relates to an offloading decision method and system for mobile edge computing based on deep Q-learning.
Background
With the rapid development of 5G and Internet of Things technologies, people have stepped into a new world in which everything is interconnected. In recent years, the number of mobile devices with networking functions, such as smartphones, smart home appliances and smart wearable devices, has grown explosively; meanwhile, the emergence of new applications such as virtual reality, real-time path planning and online video processing has imposed stricter requirements on data transmission and data computation capabilities. Finding an effective way to meet the data transmission and computation needs of Internet of Things devices is an urgent problem, and mobile edge computing has become an effective solution.
Although existing mobile edge computing methods have achieved some success, the time delay produced during the offloading decision process in existing mobile edge computing systems is still large, and the energy consumption produced during that process is still high. It is therefore necessary to provide an offloading decision method for mobile edge computing systems that reduces the time delay and energy consumption produced during the offloading decision process.
Disclosure of Invention
The invention aims to solve the problems of large time delay and high energy consumption in the offloading decision process of conventional mobile edge computing systems, and provides an offloading decision method and an offloading decision system for mobile edge computing based on deep Q-learning.
The technical scheme adopted by the invention for solving the technical problems is as follows:
According to one aspect of the invention, the offloading decision method for mobile edge computing based on deep Q-learning specifically comprises the following steps:
step one, building a reinforcement learning model
Constructing a system state, a system action and a reward function in the Markov decision process according to the task characteristics;
step two, constructing a neural network
Constructing a neural network comprising an input layer, an LSTM layer, a first FC layer, a second FC layer and an output layer, wherein the input layer is used for transmitting system state information to the LSTM layer and the first FC layer and taking the output of the LSTM layer as the input of the first FC layer;
the output of the first FC layer is then used as the input to the second FC layer, and the output of the second FC layer is used as the input to the output layer.
Further, the system state is constructed in the following manner:
Denote the size of the task held by mobile device m at the beginning of the current time slot as λ_m(t); if a new task k(t) arrives at mobile device m at the beginning of the current time slot, then λ_m(t) = k(t), otherwise λ_m(t) = 0;
Constructing a local computation queue, a task transmission queue and an edge node computation queue; the number of time slots that the task of mobile device m must wait in the local computation queue at the beginning of the current time slot is denoted w_m^loc(t); the number of time slots that the task of mobile device m must wait in the task transmission queue at the beginning of the current time slot is denoted w_m^tr(t); the queue length of mobile device m at edge node n is denoted q_{m,n}(t);
Constructing a matrix M(t) representing the load level of each edge server in the T time slots preceding the current time slot, where the dimension of M(t) is T × N and N is the number of edge servers;
The system state s_m(t) observed by mobile device m at the current time slot is:
s_m(t) = { λ_m(t), w_m^loc(t), w_m^tr(t), q_{m,n}(t), M(t) }
Further, the system action is denoted as a(t) ∈ {0, 1, 2, …, N}, where 0 denotes local computation and k = 1, 2, …, N denotes the sequence number of the edge server to which the task is offloaded.
Further, the reward function is constructed as follows:
If the task is decided to be computed locally, the number of time slots the task waits, w_m^loc(t), is determined by t_m^fin(t'), the time at which the task generated in time slot t' finishes executing locally;
The energy required for local computation of the task, E_m^loc, is:
E_m^loc = ε_m · d_m
where ε_m denotes the energy consumption coefficient of the CPU of mobile device m during local computation, i.e. the energy consumed by the local CPU per cycle, and d_m denotes the computation amount of the task currently generated by mobile device m, i.e. the number of CPU cycles required to execute it;
Setting the preference coefficients of mobile user m for time delay and energy consumption as β_m^T and β_m^E respectively, the reward function of mobile user m in the offloading decision process is:
R = -(β_m^T · T + β_m^E · E)
where R is the value of the reward function, T is the total time delay produced when the task of mobile user m is computed locally, i.e. the number of time slots the task waits in the queue, w_m^loc(t), plus the sum of the time delays produced during local execution of the task, and E is the total energy consumption of mobile user m, i.e. E = E_m^loc.
Further, the construction mode of the reward function is as follows:
if the task is decided as edge calculation, the number of time slots for waiting the task passes through the time after the edge server n finishes executing
Figure BDA0003610402240000034
Is calculated as the number of time slots the task waits
Figure BDA0003610402240000035
The energy required in task edge computing comprises two parts of task uploading and task execution, and the power of the mobile equipment when the task is uploaded is represented as pupThe power of the mobile device when a task is performed is denoted as peThen for mobile device m, the required energy
Figure BDA0003610402240000036
Comprises the following steps:
Figure BDA0003610402240000037
wherein, tn,upRepresenting the time it takes for mobile device m to upload a task to edge server n, tn,eRepresenting the time it takes for the mobile device m to perform a task in the edge server n.
At this time, the reward function of the user in the unloading decision process is as follows:
Figure BDA0003610402240000038
wherein R is the value of the reward function, and T is the total time delay generated by task queuing
Figure BDA0003610402240000039
Time delay t generated by uploading task to edge server nn,upAnd the time delay t generated by the execution of the task at the edge server nn,eE is the total energy consumption resulting from the edge calculation, i.e.
Figure BDA00036104022400000310
Further, the reward function is constructed as follows:
If the maximum delay allowed by the task is reached before the task has been executed, the task is discarded, and the value R of the reward function in this case is set to a fixed penalty value P.
Further, the LSTM layer is used to predict the temporal dependence of the edge server load levels based on the matrix M(t).
Further, the first FC layer and the second FC layer are used to learn the mapping from the system state to the reward function values of the system actions, and each of the first FC layer and the second FC layer comprises a set of neurons with rectified linear units (ReLU).
Further, the output layer is configured to output a value of the reward function corresponding to the currently selected action in the current system state.
According to another aspect of the invention, an offloading decision system for mobile edge computing based on deep Q-learning is provided, which is used to execute the above offloading decision method for mobile edge computing based on deep Q-learning.
The invention has the beneficial effects that:
the invention applies a deep reinforcement learning algorithm to the unloading decision problem in the mobile edge calculation, and designs the corresponding system state, action and reward equation according to the task scheduling models such as a local calculation queue, a task transmission queue, an edge server queue and the like established in the system. By comparing the average time delay and energy consumption of the method with those of other algorithms, the unloading decision method disclosed by the invention can be used for greatly reducing the time delay and energy consumption generated in the unloading decision process in the mobile edge computing system.
Drawings
FIG. 1 is a diagram of a neural network constructed in accordance with the present invention;
FIG. 2 is a graph of the convergence of the value of the reward function with the number of iterations of the method of the present invention;
FIG. 3 is a graph of average reward value versus number of users for the method of the present invention and three other baseline algorithms;
FIG. 4 is a graph of the average delay as a function of the number of users for the method of the present invention and three other baseline algorithms.
Detailed Description
In a first specific embodiment, a computation offloading strategy based on deep reinforcement learning is provided for a network scenario with multiple mobile devices and multiple servers in an MEC system. Each mobile user is regarded as an agent, and tasks must be queued according to their arrival order during the offloading process. Time delay and energy consumption cost models are then established for the two task execution modes, and the method is designed with the objective of minimizing system cost, so that the minimum system time delay and energy consumption are produced over a number of consecutive time slots.
Step 1, building a reinforcement learning model:
This step describes in detail the specific implementation of the invention, which uses DQN (deep Q-network) for task offloading decisions. It mainly comprises the definitions of the system state, action and reward function in the Markov decision process.
1. Markov decision process construction
At the beginning of each slot, each mobile device observes its state (e.g., task size, queue lengths and other information). If there is a new task to process, the mobile device selects an appropriate offloading decision for the task so as to minimize the long-term cost of its task computation. Applying deep reinforcement learning to the task offloading decision problem requires constructing a Markov decision process, in which the system state, action and reward function must be specifically defined.
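As an illustration of how such an agent can be trained, the following is a minimal sketch of a DQN training loop with an experience replay buffer and an ε-greedy policy. The environment interface (env.reset, env.step), the q_net/target_net objects and all hyperparameter values are assumptions made for illustration only and are not prescribed by the invention.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

def train_dqn(env, q_net, target_net, num_actions,
              episodes=1200, gamma=0.9, eps=0.1, batch_size=32,
              buffer_size=10000, target_sync=100, lr=1e-3):
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=buffer_size)
    step = 0
    for _ in range(episodes):
        state = env.reset()                       # observe s_m(t)
        done = False
        while not done:
            # epsilon-greedy action: 0 = local execution, 1..N = edge server index
            if random.random() < eps:
                action = random.randrange(num_actions)
            else:
                with torch.no_grad():
                    q = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
                    action = int(q.argmax())
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state, float(done)))
            state = next_state
            step += 1

            if len(replay) >= batch_size:
                s, a, r, s2, d = zip(*random.sample(replay, batch_size))
                s = torch.as_tensor(np.stack(s), dtype=torch.float32)
                s2 = torch.as_tensor(np.stack(s2), dtype=torch.float32)
                a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
                r = torch.as_tensor(r, dtype=torch.float32)
                d = torch.as_tensor(d, dtype=torch.float32)
                q_sa = q_net(s).gather(1, a).squeeze(1)
                with torch.no_grad():
                    target = r + gamma * (1.0 - d) * target_net(s2).max(1).values
                loss = nn.functional.mse_loss(q_sa, target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            if step % target_sync == 0:           # periodically refresh the target network
                target_net.load_state_dict(q_net.state_dict())
```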
2. System state setting
The first information relevant to the offloading decision is the nature of the task itself. First consider the task size λ(t). At the beginning of each slot, mobile device m first observes the size of its own task, denoted λ(t). If a new task k(t) arrives at the beginning of the current time slot, λ(t) = k(t); otherwise λ(t) = 0. Note that because new tasks are assumed to be generated only at the start of a slot, there is no case in which λ(t) cannot be computed for a task generated within a slot.
Another task characteristic that must be considered is the maximum acceptable delay of the task, which is also relevant to the offloading decision, so it too is added to the system state.
The waiting times of the task in the three queues are also relevant to the offloading decision and are likewise added to the system state. Specifically:
w_m^loc(t) denotes the number of time slots the task must wait in the local computation queue;
w_m^tr(t) denotes the number of time slots the task must wait in the transmission queue;
q_{m,n}(t) denotes the queue length of mobile device m at edge node n.
The load level of each edge node, i.e., the number of active queues at the node, changes constantly, and the load level at the current moment is strongly correlated with that at the previous moment. An edge node load level matrix M(t) is therefore constructed to capture the load levels of the edge nodes over time.
Specifically, the matrix M(t) represents the history of the load level of each edge node (i.e., the number of active queues at the node, at most the number of mobile devices M) over the previous T time slots, from time slot t - T to time slot t - 1. It is a matrix of dimension T × N, where T is the number of history time slots and N is the number of edge nodes. For example, {M(t)}_(i,j) indicates the number of active queues of edge node j at the (t - T + i - 1)-th slot.
Summarizing the above, the system state observed by mobile device m at the current time slot is defined as the vector
s_m(t) = { λ(t), τ_m, w_m^loc(t), w_m^tr(t), q_{m,n}(t), M(t) }
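As a concrete illustration, the following sketch assembles the state vector described above into one flat numerical array that can be fed to the Q-network. The field names, the use of a per-node queue-length vector and the flattening of M(t) are assumptions made for illustration only.

```python
import numpy as np

def build_state(task_size, max_delay, wait_local, wait_tx,
                edge_queue_lens, load_history):
    """Assemble s_m(t) = {lambda(t), tau_m, w_loc, w_tr, q_{m,n}, M(t)} as one vector.

    edge_queue_lens: per-edge-node queue lengths of this mobile device (length N).
    load_history:    the T x N matrix M(t) of active-queue counts over the
                     previous T slots, flattened into the state vector.
    """
    scalars = np.array([task_size, max_delay, wait_local, wait_tx], dtype=np.float32)
    return np.concatenate([
        scalars,
        np.asarray(edge_queue_lens, dtype=np.float32),
        np.asarray(load_history, dtype=np.float32).ravel(),
    ])

# Example: 3.5 Mbit task, max wait 10 slots, N = 5 edge nodes, history window T = 4
s = build_state(3.5, 10, 2, 1, np.zeros(5), np.zeros((4, 5)))
```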
3. System actions
Each mobile device must first decide whether to offload the task to an edge server or execute it locally, and then which edge server to offload to. Local computation is denoted by 0, and the sequence number of the edge server chosen for offloading is denoted by k. Assuming a total of N edge servers, the system action can be expressed as a(t) ∈ {0, 1, 2, …, N}.
4. Reward function
What most affects the mobile device application experience is the latency and energy consumption resulting from offloading. The reward function of the invention is therefore constructed around the time delay and energy consumption produced during task offloading.
The delay caused by the task is considered for both local computation and edge computation.
If the task is decided to be computed locally, the number of slots the task waits, w_m^loc(t), is calculated from the times at which the tasks ahead of it in the local computation queue finish executing.
If the task is decided to be computed at the edge, the number of slots the task waits, w_{m,n}^edge(t), is calculated from the time at which edge node n completes execution of the tasks ahead of it.
The energy consumption produced by the task is likewise considered for the two cases of local computation and edge computation.
The energy required for local computation of the task is E_m^loc = ε_m · d_m, where ε_m is the energy consumed by the local CPU per cycle and d_m is the number of CPU cycles required by the task.
Energy is consumed during edge computation mainly in two parts: task uploading and task execution. Assuming the power of the mobile device while uploading is p_up and its power while the task is being executed is p_e, then for device i the required energy is E_i^edge = p_up · t_{n,up} + p_e · t_{n,e}.
Setting the preference coefficients of mobile user i for time delay and energy consumption as β_i^T and β_i^E respectively, the reward function of the user in the offloading decision process is set as
R = -(β_i^T · T + β_i^E · E)
where T is the total time delay and E the total energy consumption produced by the chosen execution mode.
On the other hand, if a task is dropped because it has reached the maximum delay it can accept, the reward at that point is defined as a fixed penalty value P, i.e.
R = P
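The three reward cases above can be summarized in a short sketch. The weighting of delay and energy by the preference coefficients with a negative sign follows the weighted-cost reading of the reward described above, and the parameter names and the default penalty value are illustrative assumptions.

```python
def reward(action, dropped, beta_t, beta_e,
           wait_slots, slot_len, t_local=0.0, t_up=0.0, t_exec=0.0,
           eps_cpu=0.0, cycles=0.0, p_up=0.0, p_exec=0.0, penalty=-50.0):
    """Reward for one offloading decision.

    action == 0     -> local computation
    action in 1..N  -> offload to edge server `action`
    dropped         -> the task exceeded its maximum acceptable delay
    """
    if dropped:
        return penalty                                 # fixed penalty value P
    if action == 0:
        delay = wait_slots * slot_len + t_local        # queuing + local execution
        energy = eps_cpu * cycles                      # E = epsilon_m * d_m
    else:
        delay = wait_slots * slot_len + t_up + t_exec  # queuing + upload + edge execution
        energy = p_up * t_up + p_exec * t_exec         # E = p_up*t_up + p_e*t_e
    return -(beta_t * delay + beta_e * energy)
```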
Step 2, constructing a neural network, as shown in fig. 1:
1. Input layer: this layer takes the state as input and passes it to the following layers. For mobile device m, the state information λ(t), τ_m, w_m^loc(t), w_m^tr(t), q_{m,n}(t) and M(t) is passed to the LSTM and FC layers for prediction.
2. LSTM layer: the matrix M(t) represents the load level of each edge node over the previous T time slots, and these load levels are correlated across time, i.e. they have a temporal dependence. The LSTM layer is therefore used to capture this temporal dependence when predicting the load level.
3. FC layers: the two FC layers are responsible for learning the mapping from states to action Q values. Each FC layer contains a set of neurons with rectified linear units (ReLU).
4. Output layer: the output corresponds to the Q value of taking the currently selected action in the current state. It reflects the overall cost brought by the current decision, i.e., the trade-off between time delay and energy consumption.
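The layer structure described above and shown in FIG. 1 can be sketched as follows. The hidden sizes, the splitting of the flat state vector into scalar features and the flattened M(t), the concatenation of the LSTM summary with the scalar features, and the class and parameter names are assumptions chosen for illustration rather than values specified by the invention.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Input -> LSTM (over M(t)) and FC1 -> FC2 -> output layer of Q values."""

    def __init__(self, num_scalar_feats, num_edge_nodes, history_len,
                 num_actions, lstm_hidden=64, fc_hidden=128):
        super().__init__()
        self.num_scalar_feats = num_scalar_feats
        self.num_edge_nodes = num_edge_nodes
        self.history_len = history_len
        # LSTM reads the T x N load-history matrix M(t) one time slot at a time
        self.lstm = nn.LSTM(input_size=num_edge_nodes,
                            hidden_size=lstm_hidden, batch_first=True)
        # FC1 receives the scalar state features together with the LSTM summary
        self.fc1 = nn.Linear(num_scalar_feats + lstm_hidden, fc_hidden)
        self.fc2 = nn.Linear(fc_hidden, fc_hidden)
        self.out = nn.Linear(fc_hidden, num_actions)   # one Q value per action

    def forward(self, state):
        # state: (batch, num_scalar_feats + T * N); the trailing part is M(t) flattened
        scalars = state[:, :self.num_scalar_feats]
        history = state[:, self.num_scalar_feats:].reshape(
            -1, self.history_len, self.num_edge_nodes)
        _, (h, _) = self.lstm(history)
        h = h[-1]                                      # final hidden state as load summary
        x = torch.relu(self.fc1(torch.cat([scalars, h], dim=1)))
        x = torch.relu(self.fc2(x))
        return self.out(x)

# Example: 9 scalar features (lambda, tau, w_loc, w_tr and N = 5 queue lengths),
# history window T = 4, and 6 actions (local + 5 edge servers)
net = QNetwork(num_scalar_feats=9, num_edge_nodes=5, history_len=4, num_actions=6)
q_values = net(torch.zeros(1, 9 + 4 * 5))
```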
Examples
In this embodiment, the system architecture applied in the present invention is introduced from three aspects, namely, a network model, a task model, and a task scheduling model.
Step 1, establishing a network model: the scenario addressed by the invention consists of two parts: a number of base stations equipped with edge servers, and a number of mobile devices that need to perform computation-intensive tasks. Each base station is equipped with an edge server with high computing power. At the beginning of each time slot, each mobile device generates an intensive task with a certain probability; the task is either executed locally or completely offloaded to the edge server for execution, and is not partitioned.
Considering a time period of T slots {1, 2, 3, …, T}, the invention sets T = 100 in the simulation, and the length of each slot is set to Δ. For the N mobile devices in the cell, an offloading decision variable α_i is used at the beginning of each time slot to indicate whether the computing task of device i is offloaded to an edge server for execution and, if so, to which server: α_i = k if the task is offloaded to the k-th edge server, and α_i = 0 if the task is executed locally.
It is assumed that the tasks of all mobile devices have the same priority. Each edge node has a CPU for processing the tasks in its queues. At the beginning of each time slot, the processing power of the CPU at edge node n is shared equally among the task queues of the mobile devices present at that edge node.
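As an illustration of this equal sharing, the following sketch computes the per-queue processing budget at an edge node at the start of a slot. The function and parameter names, and the example cycle budget, are illustrative assumptions.

```python
def per_queue_cpu_cycles(total_cycles_per_slot, active_queues):
    """Cycles each active task queue receives at an edge node in one slot.

    The CPU of edge node n is shared equally among the task queues of the
    mobile devices that currently have tasks queued at that node.
    """
    if active_queues == 0:
        return 0.0
    return total_cycles_per_slot / active_queues

# Example: a budget of 1e10 cycles per slot shared by 4 active queues
cycles_each = per_queue_cpu_cycles(1e10, 4)   # 2.5e9 cycles per queue
```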
Step 2, establishing a task model: a task generated by a mobile device is characterized along two dimensions. One is the storage size of the task, where the task size λ_m(t) is a random value between 3 Mbit and 5 Mbit with a step of 0.1 Mbit. The other is the maximum number of waiting slots the task can accept, denoted τ_m. When the waiting time of a task exceeds τ_m, the task is discarded by the system.
Step 3, establishing a task scheduling model: taking the edge server side as an example, because tasks arrive in order, the edge server may not have finished the task of the previous time slot at the beginning of the next one, so tasks must be queued. Three queues, namely the local computation queue, the task transmission queue and the edge node computation queue, are therefore constructed as the task scheduling model, as sketched below.
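The following is a minimal sketch of this task model and of advancing one of the scheduling queues by one slot. The data-structure layout, the helper names, the τ_m range and the per-slot service capacity are assumptions made for illustration only.

```python
import random
from collections import deque

def generate_task():
    """Task model: size uniform in [3, 5] Mbit with 0.1 Mbit step, plus a
    maximum acceptable number of waiting slots tau_m (here drawn in [5, 15],
    an illustrative range not taken from the patent)."""
    size_mbit = random.choice([round(3.0 + 0.1 * i, 1) for i in range(21)])
    tau_m = random.randint(5, 15)
    return {"size": size_mbit, "tau": tau_m, "waited": 0}

def advance_queue(queue, served_per_slot):
    """Advance one scheduling queue by one slot: serve head tasks that fit into
    this slot's service capacity, age the rest, and drop tasks past tau_m."""
    capacity = served_per_slot
    while queue and capacity >= queue[0]["size"]:
        capacity -= queue.popleft()["size"]        # this task finishes now
    survivors, dropped = deque(), 0
    for task in queue:                             # remaining tasks keep waiting
        task["waited"] += 1
        if task["waited"] > task["tau"]:
            dropped += 1                           # exceeded max acceptable delay
        else:
            survivors.append(task)
    queue.clear()
    queue.extend(survivors)
    return dropped

q = deque([generate_task() for _ in range(3)])
num_dropped = advance_queue(q, served_per_slot=4.0)   # 4 Mbit of service this slot
```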
Step 4, evaluating and analyzing algorithm performance:
A scenario is selected to study the convergence of the algorithm: the number of users is set to 120, the number of edge servers to 5, the algorithm is iterated 1200 times, and the average reward over 100 consecutive decisions is plotted. As can be seen from FIG. 2, the average reward begins to converge when the number of iterations reaches about 500 and gradually stabilizes in the following iterations, indicating that the agent has learned a stable offloading strategy after repeated training.
Three other baseline algorithms are selected for comparison with the DQN-based algorithm designed by the invention. The number of mobile users is varied from 50 to 120 with the number of edge servers fixed at 5, and the average reward curves of the four algorithms are plotted as shown in FIG. 3, where the reward values of the three baseline algorithms are each averaged over 100 runs.
As can be seen from the simulation curves, the average rewards of the three algorithms other than the all-local computation algorithm decline as the number of users increases. This is because the number of edge servers in the simulation is fixed at 5, so as the number of users grows, the server resources available to each user become increasingly strained. For the all-local computation algorithm, the average reward is essentially unchanged as the number of users varies. When the number of users is below 90, the all-offloading algorithm outperforms the all-local algorithm because server computing resources are still sufficient; as the number of users continues to increase, the all-offloading algorithm becomes inferior to the all-local algorithm.
As can be seen from the simulation curve of FIG. 4, the DQN-based algorithm always obtains lower task processing delay than the other three baseline algorithms. Similar to the trend of the reward curves, when the number of users exceeds 90, the delay of the all-offloading algorithm begins to exceed that of the all-local algorithm because the computing resources of the edge servers become overstrained.
The above-described calculation examples of the present invention are merely to explain the calculation model and the calculation flow of the present invention in detail, and are not intended to limit the embodiments of the present invention. It will be apparent to those skilled in the art that other variations and modifications of the present invention can be made based on the above description, and it is not intended to be exhaustive or to limit the invention to the precise form disclosed, and all such modifications and variations are possible and contemplated as falling within the scope of the invention.

Claims (10)

1. An offloading decision method for mobile edge computing based on deep Q-learning, characterized by comprising the following steps:
step one, building a reinforcement learning model
Constructing a system state, a system action and a reward function in the Markov decision process according to the task characteristics;
step two, constructing a neural network
Constructing a neural network comprising an input layer, an LSTM layer, a first FC layer, a second FC layer and an output layer, wherein the input layer is used for transmitting system state information to the LSTM layer and the first FC layer and taking the output of the LSTM layer as the input of the first FC layer;
the output of the first FC layer is then used as the input to the second FC layer, and the output of the second FC layer is used as the input to the output layer.
2. The offloading decision method for mobile edge computing based on deep Q-learning of claim 1, wherein the system state is constructed as follows:
the size of the task held by mobile device m at the beginning of the current time slot is denoted λ_m(t); if a new task k(t) arrives at mobile device m at the beginning of the current time slot, then λ_m(t) = k(t), otherwise λ_m(t) = 0;
a local computation queue, a task transmission queue and an edge node computation queue are constructed; the number of time slots that the task of mobile device m must wait in the local computation queue at the beginning of the current time slot is denoted w_m^loc(t); the number of time slots that the task of mobile device m must wait in the task transmission queue at the beginning of the current time slot is denoted w_m^tr(t); the queue length of mobile device m at edge node n is denoted q_{m,n}(t);
a matrix M(t) representing the load level of each edge server in the T time slots preceding the current time slot is constructed, where the dimension of M(t) is T × N and N is the number of edge servers;
the system state s_m(t) observed by mobile device m at the current time slot is:
s_m(t) = { λ_m(t), w_m^loc(t), w_m^tr(t), q_{m,n}(t), M(t) }.
3. The method of claim 2, wherein the system action is represented as a(t) ∈ {0, 1, 2, …, N}, where 0 represents local computation and k = 1, 2, …, N represents the sequence number of the edge server to which the task is offloaded.
4. The offloading decision method for mobile edge computing based on deep Q-learning of claim 3, wherein the reward function is constructed as follows:
if the task is decided to be computed locally, the number of time slots the task waits, w_m^loc(t), is determined by t_m^fin(t'), the time at which the task generated in time slot t' finishes executing locally;
the energy required for local computation of the task, E_m^loc, is:
E_m^loc = ε_m · d_m
where ε_m denotes the energy consumption coefficient of the CPU of mobile device m during local computation, i.e. the energy consumed by the local CPU per cycle, and d_m denotes the computation amount of the task currently generated by mobile device m, i.e. the number of CPU cycles required to execute it;
setting the preference coefficients of mobile user m for time delay and energy consumption as β_m^T and β_m^E respectively, the reward function of mobile user m in the offloading decision process is:
R = -(β_m^T · T + β_m^E · E)
where R is the value of the reward function, T is the total time delay produced when the task of mobile user m is computed locally, i.e. the number of time slots the task waits in the queue, w_m^loc(t), plus the sum of the time delays produced during local execution of the task, and E is the total energy consumption of mobile user m, i.e. E = E_m^loc.
5. The offloading decision method for mobile edge computing based on deep Q-learning of claim 3, wherein the reward function is constructed as follows:
if the task is decided to be computed at the edge, the number of time slots the task waits, w_{m,n}^edge(t), is calculated from t_n^fin(t'), the time at which edge server n finishes executing the task generated in time slot t';
the energy required for edge computation of the task comprises two parts, task uploading and task execution; denoting the power of the mobile device while uploading a task as p_up and its power while the task is being executed as p_e, the required energy E_m^edge for mobile device m is:
E_m^edge = p_up · t_{n,up} + p_e · t_{n,e}
where t_{n,up} denotes the time it takes mobile device m to upload the task to edge server n, and t_{n,e} denotes the time it takes the task of mobile device m to be executed on edge server n;
in this case, the reward function of the user in the offloading decision process is:
R = -(β_m^T · T + β_m^E · E)
where R is the value of the reward function, T is the total time delay, i.e. the sum of the queuing delay w_{m,n}^edge(t), the delay t_{n,up} produced by uploading the task to edge server n, and the delay t_{n,e} produced by executing the task on edge server n, and E is the total energy consumption produced by edge computation, i.e. E = E_m^edge.
6. The offloading decision method for mobile edge computing based on deep Q-learning of claim 3, wherein the reward function is constructed as follows:
if the maximum delay allowed by the task is reached before the task has been executed, the task is discarded, and the value R of the reward function in this case is set to a fixed penalty value P.
7. The offloading decision method for mobile edge computing based on deep Q-learning of claim 3, wherein the LSTM layer is used to predict the temporal dependence of the edge server load levels based on the matrix M(t).
8. The method of claim 7, wherein the first FC layer and the second FC layer are used to learn the mapping from the system state to the reward function values of the system actions, and each of the first FC layer and the second FC layer comprises a set of neurons with rectified linear units (ReLU).
9. The method according to claim 8, wherein the output layer is configured to output the reward function value corresponding to taking the currently selected action in the current system state.
10. An offloading decision system for mobile edge computing based on deep Q-learning, characterized in that the system is configured to perform the offloading decision method for mobile edge computing based on deep Q-learning according to any one of claims 1 to 9.
CN202210427768.1A 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning Active CN114706631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210427768.1A CN114706631B (en) 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210427768.1A CN114706631B (en) 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning

Publications (2)

Publication Number Publication Date
CN114706631A true CN114706631A (en) 2022-07-05
CN114706631B CN114706631B (en) 2022-10-25

Family

ID=82175067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210427768.1A Active CN114706631B (en) 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning

Country Status (1)

Country Link
CN (1) CN114706631B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766241A (en) * 2022-11-21 2023-03-07 西安工程大学 Distributed intrusion detection system task scheduling and unloading method based on DQN algorithm
CN116909717A (en) * 2023-09-12 2023-10-20 国能(北京)商务网络有限公司 Task scheduling method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666149A (en) * 2020-05-06 2020-09-15 西北工业大学 Ultra-dense edge computing network mobility management method based on deep reinforcement learning
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 MEC task unloading and resource allocation method based on deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666149A (en) * 2020-05-06 2020-09-15 西北工业大学 Ultra-dense edge computing network mobility management method based on deep reinforcement learning
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 MEC task unloading and resource allocation method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Wenxian et al.: "Lightweight task offloading optimization for multi-user mobile edge computing", Journal of Chinese Computer Systems (小型微型计算机系统) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766241A (en) * 2022-11-21 2023-03-07 西安工程大学 Distributed intrusion detection system task scheduling and unloading method based on DQN algorithm
CN116909717A (en) * 2023-09-12 2023-10-20 国能(北京)商务网络有限公司 Task scheduling method
CN116909717B (en) * 2023-09-12 2023-12-05 国能(北京)商务网络有限公司 Task scheduling method

Also Published As

Publication number Publication date
CN114706631B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN108920280B (en) Mobile edge computing task unloading method under single-user scene
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
CN114706631B (en) Unloading decision method and system in mobile edge calculation based on deep Q learning
CN112380008B (en) Multi-user fine-grained task unloading scheduling method for mobile edge computing application
CN113612843A (en) MEC task unloading and resource allocation method based on deep reinforcement learning
CN112860350A (en) Task cache-based computation unloading method in edge computation
CN111930436A (en) Random task queuing and unloading optimization method based on edge calculation
CN113220356B (en) User computing task unloading method in mobile edge computing
CN113434212A (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN113778648A (en) Task scheduling method based on deep reinforcement learning in hierarchical edge computing environment
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN114567895A (en) Method for realizing intelligent cooperation strategy of MEC server cluster
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN113626104A (en) Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN111930435A (en) Task unloading decision method based on PD-BPSO technology
CN114942799B (en) Workflow scheduling method based on reinforcement learning in cloud edge environment
CN111148155A (en) Task unloading method based on mobile edge calculation
CN115858048A (en) Hybrid key level task oriented dynamic edge arrival unloading method
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN115413044A (en) Computing and communication resource joint distribution method for industrial wireless network
CN113747504A (en) Method and system for multi-access edge computing combined task unloading and resource allocation
Luo et al. Adaptive video streaming in software-defined mobile networks: A deep reinforcement learning approach
Vo et al. Reinforcement-Learning-Based Deadline Constrained Task Offloading Schema for Energy Saving in Vehicular Edge Computing System
CN117793805B (en) Dynamic user random access mobile edge computing resource allocation method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant