CN112905315A - Task processing method, device and equipment in Mobile Edge Computing (MEC) network - Google Patents


Info

Publication number
CN112905315A
Authority
CN
China
Prior art keywords
task
state information
processed
mobile device
mobile
Prior art date
Legal status
Pending
Application number
CN202110125013.1A
Other languages
Chinese (zh)
Inventor
王冬宇
田心乔
王思野
崔浩然
李琦
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110125013.1A
Publication of CN112905315A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

The embodiment of the invention provides a task processing method, apparatus, and device in a Mobile Edge Computing (MEC) network, applied to the field of communications technologies. The method can determine current state information; search a pre-established Q table for the actions and system benefits corresponding to the current state information, wherein the actions comprise each mobile device either locally processing its own pending task or processing it through an edge server; the pre-established Q table comprises the actions and system benefits corresponding to a plurality of pieces of state information, and is obtained by performing different actions under the plurality of pieces of state information and iterating through reinforcement learning; and select the action corresponding to the maximum system benefit as the target action corresponding to the current state information, so that each mobile device processes its pending task according to the target action. The mobile devices can thus process tasks according to the action corresponding to the maximum system benefit, thereby improving the system benefit of the system in which the mobile devices are located.

Description

Task processing method, device and equipment in Mobile Edge Computing (MEC) network
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method, an apparatus, and a device for processing a task in a mobile edge computing MEC network.
Background
In recent years, applications of intelligent terminals in mobile networks have become more and more common, and various new applications such as Virtual Reality (VR)/Augmented Reality (AR), image recognition, biometric feature recognition, and the like have emerged. These applications are often resource intensive, i.e., run-time consuming large computing resources, with high quality of service requirements. Although the performance of the processor of the smart terminal is continuously improved, it is still difficult to meet the requirement of processing high-performance applications in a short time, which seriously affects the quality of service provided by the smart terminal to the user. Therefore, how to expand the resources of the intelligent terminal to meet the requirement of executing the high-performance task is a problem to be solved urgently at present.
Cloud computing provides an economical and efficient solution for mass data storage and processing. Mobile Cloud Computing (MCC) allows mobile application tasks to run in remote data centers by means of high-speed, reliable wireless interfaces. However, the delay overhead caused by long-distance propagation is large, so the MCC architecture is not suitable for today's delay-sensitive tasks. To solve this problem, Mobile Edge Computing (MEC) has emerged. MEC technology deploys computing and storage resources at the edge of the network to improve the computing capacity of the mobile network and to establish a low-delay, high-bandwidth network service solution. Compared with MCC, MEC avoids the privacy and security issues that long-distance transmission brings to mobile applications, such as the high concentration of information in the platform, its vulnerability, and the leakage and loss of private data caused by the separation of ownership and usage rights of user data.
For an intelligent terminal in a mobile network, that is, a mobile device, the device may process a pending task locally, or may offload the pending task to an edge computing server so that the edge computing server processes it. Local processing occupies the resources of the mobile device, while processing by the edge computing server incurs time costs, among others; that is, different processing modes bring different benefits and costs. Determining which mode to use is therefore important in the task processing process.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, and a device for processing a task in a mobile edge computing MEC network, so as to enable a mobile device to process the task according to an action corresponding to a maximum system benefit, thereby improving the system benefit of a system in which the mobile device is located. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for processing a task in a mobile edge computing MEC network, including:
determining current state information, wherein the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so that each mobile device processes the task to be processed according to the target action.
In a second aspect, an embodiment of the present invention provides a task processing device in a mobile edge computing MEC network, including:
the system comprises a first determining module, a second determining module and a processing module, wherein the first determining module is used for determining current state information, and the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
the searching module is used for searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and the selection module is used for selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so as to enable each mobile device to process the task to be processed according to the target action.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of the first aspect when executing the program stored in the memory.
The task processing method, the device and the equipment in the MEC network provided by the embodiment of the invention can determine the current state information; searching the action and the system income corresponding to the current state information from a pre-established Q table; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration; and selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so that each mobile device processes the task to be processed according to the target action. The action corresponding to the maximum system benefit is selected as the target action corresponding to the current state information by searching the action and the system benefit corresponding to the current state information in the pre-established Q table, so that each mobile device processes the task to be processed according to the target action, the mobile device can process the task according to the action corresponding to the maximum system benefit, and the system benefit of the system where the mobile device is located can be improved. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a flowchart of a task processing method in an MEC network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a dual-layer cellular network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating interaction between an agent and an environment in reinforcement learning according to an embodiment of the present invention;
FIG. 4 is a graph of comparative analysis of convergence of RLBA and GABA in the practice of the present invention;
FIG. 5 is a graph illustrating the total benefit for different numbers of MEs;
FIG. 6 is a diagram of the total benefit for different MEC server computing resources;
FIG. 7 is a diagram illustrating the total system benefits for different migration costs;
fig. 8 is a schematic structural diagram of a task processing device in an MEC network according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention takes the location variability of the Mobile Equipment (ME) into account: if the equipment leaves the coverage area of its microcellular network (SCN) while moving, the computation offloading result must be migrated between base stations, which adds extra cost to the system. The invention can be applied in a dual-layer cellular network architecture that accounts for the location variability of the mobile devices; based on the delay and power consumption models of device task processing, a total system benefit function, namely an expected utility function, is constructed; the mixed integer nonlinear programming problem over the total system benefit function is described as a Markov decision process, and an optimization framework based on reinforcement learning is proposed to replace traditional optimization methods; and the optimization problem is solved with the classic Q-learning algorithm in reinforcement learning. The invention can significantly improve the total benefit of the system, i.e., the system formed by the mobile devices in the dual-layer cellular network architecture, where the total benefit refers to the benefit the system obtains from reduced time delay and energy consumption.
The following describes in detail a task processing method in a mobile edge computing MEC network provided by an embodiment of the present invention.
The task processing method in the mobile edge computing MEC network provided by the embodiment of the invention can comprise the following steps:
determining current state information, wherein the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so that each mobile device processes the task to be processed according to the target action.
In the embodiment of the invention, the action corresponding to the maximum system benefit and the system benefit are searched in the pre-established Q table, and the action corresponding to the maximum system benefit is selected as the target action corresponding to the current state information, so that each mobile device processes the task to be processed according to the target action, and the mobile device can process the task according to the action corresponding to the maximum system benefit, thereby improving the system benefit of the system where the mobile device is located.
Fig. 1 is a flowchart of a task processing method in a mobile edge computing MEC network according to an embodiment of the present invention, and details of the task processing method in the mobile edge computing MEC network according to the embodiment of the present invention are described with reference to fig. 1.
S101, determining current state information, wherein the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed.
The system may be a dual-layer cellular network comprising a macro base station and a plurality of micro base stations, wherein each micro base station is located at the center of a microcellular network SCN, a mobile edge computing MEC server is deployed in the SCN where each micro base station is located, and a plurality of mobile devices are randomly distributed within the coverage of the plurality of SCNs.
The task processing method in the mobile edge computing MEC network provided by the embodiment of the invention can be executed by a base station, such as a macro base station. The macro base station may obtain the device information of each mobile device in the system, and then may perform statistics on the device information of the mobile devices existing in the system to obtain current state information corresponding to the system.
The task characteristic information can comprise the data volume of the task to be processed, the computing resource volume required for completing the task to be processed and the maximum time delay required to be met; the mobile profile information includes a stay time of the mobile device in a service coverage area, which is an area covered by the microcellular network SCN in which the mobile device is currently located.
For example, the state information may be expressed as $S = [S_n, S_d, S_m]$, where $S_n$ indicates the number of MEs that have tasks to process in the dual-layer cellular network; $S_d$ indicates the task characteristic information of the tasks to be processed, which may be expressed as $S_d = \{D_1, D_2, \ldots, D_{S_n}\}$, with $D_i$ referring to the amount of MEC computing resources that the task to be processed requires to be allocated; and $S_m$ represents the mobility characteristic information of the mobile devices, also understood as the mobility of the MEs, which may be expressed as $S_m = \{\tau_1, \tau_2, \ldots, \tau_{S_n}\}$ and characterizes the movement of the devices.
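For illustration only, the following Python sketch shows one way such state information could be assembled and discretized so that it can index a tabular Q structure; the function name, arguments, and bucket sizes are assumptions for demonstration and are not part of the application.

```python
from typing import List, Tuple

def encode_state(task_demands: List[float],
                 dwell_times: List[float],
                 demand_bucket: float = 100.0,
                 dwell_bucket: float = 10.0) -> Tuple:
    """Build a hashable state S = [S_n, S_d, S_m] for a tabular Q lookup.

    task_demands: computing resources D_i requested by each pending task.
    dwell_times:  average residence times tau_i of the corresponding MEs.
    Continuous quantities are bucketed so the state space stays finite.
    """
    s_n = len(task_demands)                                     # number of MEs with tasks
    s_d = tuple(int(d // demand_bucket) for d in task_demands)  # task features, discretized
    s_m = tuple(int(t // dwell_bucket) for t in dwell_times)    # mobility features, discretized
    return (s_n, s_d, s_m)
```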
S102, searching the action and the system benefit corresponding to the current state information from a pre-established Q table.
The actions include each mobile device processing its own existing pending tasks locally or through an edge server.
The pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions according to a plurality of state information and performing reinforcement learning iteration.
The system benefit may represent the latency and energy consumption of the mobile device processing tasks in the system.
The step of establishing the Q table may include:
determining state information corresponding to a historical moment, wherein the state information corresponding to the historical moment comprises the number of mobile devices with tasks to be processed in a system at the moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
and step 1, selecting an action corresponding to the state information according to a preset strategy aiming at the state information.
The preset strategy may comprise a greedy algorithm strategy. The greedy algorithm strategy is strong in universality and easy to implement.
And 2, determining expected system benefits corresponding to the action executed under the state information.
In an alternative embodiment, step 2 may comprise:
by the expected utility function

$$\mathbb{E}(U) = \sum_{i=1}^{N} \left( p_i^{s} U_i^{s} + p_i^{m} U_i^{m} \right)$$

a desired system benefit is determined, wherein $U_i^{s}$ is the first utility function, $p_i^{s}$ is the probability corresponding to the first utility function, $U_i^{m}$ is the second utility function, $p_i^{m}$ is the probability corresponding to the second utility function, and $N$ is the number of mobile devices in the system that have pending tasks.
Step 3, updating the expected system benefit corresponding to the state information and the action in the Q table to be established, based on the determined expected system benefit;
respectively determining state information of a plurality of historical moments, and repeatedly executing the steps 1 to 3 until expected system benefits for each state information and each action are converged to obtain an established Q table; and aiming at each piece of state information, the Q table comprises expected system benefits corresponding to the combination of each piece of state information and each action.
In an alternative embodiment, the updating, in step 3, the expected system benefit corresponding to the state information and the action in the Q table to be created based on the determined expected system benefit and the reward value may include:
by preset formulas
Figure BDA0002923684480000071
And updating the state information in the Q table to be established and the expected system benefit corresponding to the action.
Wherein NewQ (s, a) is the updated expected system gain, Q (s, a) is the expected system gain obtained in the previous iteration, Q (s ', a') is the expected system gain determined in the current iteration, alpha and gamma are preset parameters, alpha is greater than or equal to 0, gamma is less than or equal to 1, and r is an award value.
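A minimal Python sketch of this update rule, assuming the Q table is stored as a dictionary keyed by (state, action) pairs; the names and defaults are illustrative only.

```python
def update_q(q_table: dict, s, a, r: float, s_next, actions,
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply NewQ(s,a) = Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    q_sa = q_table.get((s, a), 0.0)
    best_next = max(q_table.get((s_next, a2), 0.0) for a2 in actions)
    new_q = q_sa + alpha * (r + gamma * best_next - q_sa)
    q_table[(s, a)] = new_q
    return new_q
```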
S103, selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information, so that each mobile device processes the task to be processed according to the target action.
The Q table records the Q value obtained for each state-action pair, so the selection that yields the highest profit for the system can be made according to the Q table, and the maximum system benefit can be obtained according to the Q value.
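For illustration, once the Q table has converged, selecting the target action for a given state reduces to an argmax over that state's entries in the table, as in this sketch (names assumed).

```python
def select_target_action(q_table: dict, state, actions):
    """Return the action with the maximum expected system benefit for `state`."""
    return max(actions, key=lambda a: q_table.get((state, a), float("-inf")))
```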
Fig. 2 is a schematic structural diagram of a dual-layer cellular network according to an embodiment of the present invention. Referring to fig. 2, the dual-layer cellular network has a Macro Base Station (MBS) and a number of microcellular networks (SCNs), each equipped with a micro base station (SBS). An MEC server is deployed at the center of each microcellular network, attached to its SBS, and the MEs are randomly distributed in the service area. The MEs are not stationary: as they move, they may leave the currently serving cell. In addition, because of the differences among MEs, their states in the network also differ greatly. Some MEs (e.g., ME_1) finish their task computation in less time than they remain covered by the current SCN, while others (e.g., ME_2) move to another cell before the task completes processing. For the latter, the calculation result cannot be sent directly to the ME and needs to be relayed to the target cell via the macro base station; that is, a task migration process occurs, which incurs an additional migration cost.
Under this dual-layer cellular network architecture, the set of MEs in the network is denoted $\mathcal{M} = \{ME_1, ME_2, \ldots, ME_N\}$, with index set $\mathcal{N} = \{1, 2, \ldots, N\}$. The task of $ME_i$ can be described as $(H_i, D_i, T_i^{\max})$, where $H_i$ indicates the size of the computation data, which can also be understood as the size of the task, $D_i$ indicates the amount of computing resources requested by the task, and $T_i^{\max}$ represents the maximum tolerable delay of the task. $D_i$ is measured by the number of CPU cycles and satisfies $D_i = \varepsilon H_i$, where ε is a scaling factor relating the computing resources required by the task to the size of the computation data.
To describe the different model functions of the system architecture more clearly, it can be decomposed into three submodels: a communication model, a mobility model, and an offloading model.
Communication model:
Suppose $ME_i$ has a constant transmission power $p_i$, and let $d_i$ denote the distance from $ME_i$ to the SBS and $h_i$ the channel gain from $ME_i$ to the SBS. The signal-to-noise ratio of the system, i.e., the dual-layer cellular network, is computed as

$$\mathrm{SNR}_i = \frac{p_i h_i d_i^{-\theta}}{\sigma^2}$$

where θ represents the standard path-loss propagation exponent and $\sigma^2$ represents the power of the additive white Gaussian noise. The transmission rate $R_i$ for task uploading is therefore

$$R_i = B \log_2\left(1 + \mathrm{SNR}_i\right)$$

where B is the communication bandwidth between the ME and the SBS.
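The communication model can be illustrated with a short sketch; the default parameter values below (path-loss exponent, noise power, bandwidth) are placeholders for demonstration, not values taken from the application.

```python
import math

def uplink_rate(p_i: float, h_i: float, d_i: float,
                theta: float = 4.0, sigma2: float = 1e-9,
                bandwidth: float = 20e6) -> float:
    """R_i = B * log2(1 + SNR_i), with SNR_i = p_i * h_i * d_i^(-theta) / sigma^2."""
    snr = p_i * h_i * d_i ** (-theta) / sigma2
    return bandwidth * math.log2(1.0 + snr)
```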
Mobility model:
since the SBS is located in the center of the microcell, once the ME leaves the currently serving area, diRadius R exceeding SCNsTask migration due to microcell handover may occur. On the contrary, if d is always satisfiedi≤RsIt means that ME does not leave the area until the task is completed. Thus, d isiThe duration before the limit is exceeded is recorded in a period of time, which may also be referred to as the unit dwell time of the ME.
The cell dwell time is the time that a mobile user stays in a given cell and is an important performance indicator for planning network resources and improving QoS. The probability of migration of an ME can be measured by using an exponential function to represent the dwell time. Therefore, the probability density function of the residence time t
Figure BDA0002923684480000091
Can be expressed as:
Figure BDA0002923684480000092
wherein, tauiRefer to MEiThe average residence time of (a) is different for different MEs. Parameter tauiObeying Gaussian distribution, reliable parameter tau can be obtained by collecting historical data of MEi
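Under this exponential dwell-time model, the probability that an ME is still inside the current SCN after its offloaded task has run for a given time is the survival function of the distribution; a hedged sketch (illustrative names):

```python
import math

def stay_probability(t_off: float, tau_i: float) -> float:
    """P(dwell time >= t_off) under an exponential dwell-time distribution."""
    return math.exp(-t_off / tau_i)

def migration_probability(t_off: float, tau_i: float) -> float:
    """Probability the ME leaves the cell before its offloaded task completes."""
    return 1.0 - stay_probability(t_off, tau_i)
```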
Offloading model:
When the ME has compute-intensive tasks, offloading them can not only speed up processing but also reduce the runtime and energy consumption of the device. In embodiments of the present invention, a task may be computed locally by the mobile device or processed by the MEC server on the SBS side. Let $a_i \in \{0, 1\}$ be the decision variable for task offloading, e.g., $a_i = 1$ denotes that the task is offloaded to the MEC server and $a_i = 0$ denotes that the task is executed locally; therefore, $A = \{a_1, a_2, \ldots, a_N\}$ is the offloading decision set of the MEs.
The offloading model can be refined into a local computation offloading model and an edge computation offloading model.
Local computation offload model: if a isi0, the ME performs local computations based on its own computing power, where the local computing power is set to
Figure BDA0002923684480000098
Then MEiCalculated time of execution
Figure BDA0002923684480000095
Expressed as:
Figure BDA0002923684480000096
and the energy consumption brought by the local computation can be expressed as:
Figure BDA0002923684480000097
where κ is the coefficient related to the switched capacitance.
Edge calculation unloading model: if a isiThe task is offloaded to the edge node over the wireless channel between the ME and the SBS, when the task is computed by the MEC server connected to the SBS. The process of offloading tasks to the MEC server can be viewed as both transport and execution. Recording the transmission time of the task-related data uploaded to the MEC server
Figure BDA0002923684480000101
Compared with the data needing to be uploaded, the result obtained by calculation is notOften so small that the transfer time for download or migration is negligible. From the ME perspective, the power consumption of task computation and result transmission need not be considered, and therefore energy consumption (i.e., energy consumption)
Figure BDA0002923684480000102
Comprises the following steps:
Figure BDA0002923684480000103
by passing
Figure BDA0002923684480000104
Set representing the allocation of computing resources of the MEC server, fiIdentity assignment to MEiThe edge computing resources of (1). Time of MEC server processing task
Figure BDA0002923684480000105
Can be expressed as
Figure BDA0002923684480000106
Maximum capacity C calculated due to MEC serverMECNot infinite, so part of the task may not be offloaded, and the resource allocation must satisfy the constraint condition
Figure BDA0002923684480000107
Therefore, the calculation formula of the time delay in this case is:
Figure BDA0002923684480000108
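The two offloading branches can be summarized in a sketch that mirrors the formulas above; all parameter names are placeholders consistent with the symbols in the text, not values from the application.

```python
def local_cost(D_i: float, f_local: float, kappa: float):
    """Local execution: T_i^l = D_i / f_l, E_i^l = kappa * f_l^2 * D_i."""
    t_local = D_i / f_local
    e_local = kappa * f_local ** 2 * D_i
    return t_local, e_local

def edge_cost(H_i: float, D_i: float, R_i: float, f_i: float, p_i: float):
    """Edge execution: T_i^off = H_i / R_i + D_i / f_i, E_i^off = p_i * H_i / R_i."""
    t_tr = H_i / R_i    # upload time
    t_exe = D_i / f_i   # MEC server processing time
    e_off = p_i * t_tr  # transmission energy seen by the ME
    return t_tr + t_exe, e_off
```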
and based on the calculation task parameters of the ME, the mobile characteristics of the ME and the calculation capacity of the ME and the MEC server, providing a system total benefit function and constructing an optimization problem model according to a task unloading mode comprising local calculation and edge calculation.
The benefit function of the offload resource optimization problem may include both revenue and cost components. The revenue consists of two parts: time savings and local power consumption savings. The following benefits and costs are described in conjunction with the computational formulas in the system architecture described above.
Time saving refers to the time the user saves by selecting edge computation, and can be expressed as

$$G_i^{t} = \theta_t \left( T_i^{l} - T_i^{off} \right)$$

where $\theta_t$ is the revenue coefficient for time saving. The reduction in local power consumption can be expressed as

$$G_i^{E} = \theta_E E_i^{l}$$

where $\theta_E$ is the revenue coefficient for saving local power consumption, i.e., the price per unit of energy consumption.
Meanwhile, the cost sources include both the edge computation mode selected by the ME and possible task migration. The resource cost of selecting the offloading mode includes the energy consumed to transmit the data and the MEC server execution resources to be allocated. The total cost is therefore

$$C_i = \theta_E E_i^{off} + \theta_f f_i$$

where $\theta_f$ is the price per unit of allocated MEC server execution resources.
According to the relation between the edge computation time and the estimated stay time, the utility function of the edge computation mode covers the following two cases.
1) The task of the ME is successfully completed on the MEC server before the ME leaves the current microcell. This case can be described as the whole task offloading time being less than the dwell time of the ME, e.g., ME_1 in fig. 2; that is, no task migration occurs. By the calculation formula of the probability density function of the residence time, the probability of this case is

$$p_i^{s} = \Pr\left(t \ge T_i^{off}\right) = e^{-T_i^{off}/\tau_i}$$

Thus, the utility function of the corresponding part is

$$U_i^{s} = G_i^{t} + G_i^{E} - C_i$$

2) Due to the mobile nature of the ME, another case exists in which the ME stays in the previous microcell only for a short time and, like ME_2 in fig. 2, leaves the corresponding microcell before the task is completed on the MEC server, with probability

$$p_i^{m} = 1 - e^{-T_i^{off}/\tau_i}$$

In this case the execution result cannot be returned directly; it must be transferred through the macro base station to the new SCN where the ME is located, which then delivers the result to the ME. This transfer process incurs an additional migration cost $C_i^{mig}$, which is related to the size of the computation data $H_i$ and is set as

$$C_i^{mig} = \delta H_i$$

where δ represents the scaling factor of migration cost to computation data. Thus, the utility function of the corresponding part is noted as

$$U_i^{m} = G_i^{t} + G_i^{E} - C_i - C_i^{mig}$$
bonding of
Figure BDA0002923684480000118
And
Figure BDA0002923684480000119
the expression of the utility function obtained by the calculation formula of (2) is:
Figure BDA0002923684480000121
according to the mobility model, the corresponding probability is calculated as:
Figure BDA0002923684480000122
Figure BDA0002923684480000123
can be influenced by desired effects
Figure BDA0002923684480000124
To describe MEiCan be expressed as:
Figure BDA0002923684480000125
wherein, in aiSet when equal to 0
Figure BDA0002923684480000126
Since the ME will not generate revenue from the MEC server when computing the task locally.
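Putting the pieces together, a hedged Python sketch of the per-device expected utility follows; the argument names map onto the symbols above and are assumptions for demonstration.

```python
import math

def expected_utility(a_i: int, t_local: float, t_off: float,
                     e_local: float, e_off: float,
                     f_i: float, H_i: float, tau_i: float,
                     theta_t: float, theta_E: float,
                     theta_f: float, delta: float) -> float:
    """E(U_i) = p_s * U_s + p_m * U_m; zero when the task is computed locally."""
    if a_i == 0:
        return 0.0
    gain = theta_t * (t_local - t_off) + theta_E * e_local  # time + energy savings
    cost = theta_E * e_off + theta_f * f_i                  # transmission + resource cost
    p_stay = math.exp(-t_off / tau_i)
    u_stay = gain - cost                                    # task done before the ME leaves
    u_move = gain - cost - delta * H_i                      # migration cost incurred
    return p_stay * u_stay + (1.0 - p_stay) * u_move
```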
The joint optimization problem of computation offloading and resource allocation in an MEC network considering mobile device location variability proposed by the embodiments of the present invention aims to maximize the long-term benefit of all MEs. Considering the delay deadlines for completing the tasks and the MEC server's computing resource load, the corresponding constrained optimization problem can be expressed as follows (a feasibility sketch is given after the list):

$$\max_{A, F} \; \sum_{i \in \mathcal{N}} \mathbb{E}(U_i)$$

s.t.
C1: $a_i \in \{0, 1\}, \; \forall i \in \mathcal{N}$, to ensure that each ME selects either local or edge computation;
C2: $0 < T_i^{off} \le T_i^{\max}, \; \forall i \in \mathcal{N}$, ensuring that the edge computation time is positive and does not exceed the task delay deadline;
C3: $f_i \ge 0, \; \forall i \in \mathcal{N}$, to ensure that the MEC server computing resources allocated to each ME are non-negative;
C4: $\sum_{i \in \mathcal{N}} a_i f_i \le C_{MEC}$, guaranteeing that the allocated computing resources do not exceed the total MEC server capacity.
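As a sketch only, a candidate decision (A, F) could be screened against constraints C1 to C4 before its objective value is accepted; the interface below is hypothetical.

```python
def feasible(A, F, t_off, t_max, c_mec: float) -> bool:
    """Check C1-C4 for offload decisions A, allocations F, and edge delays t_off."""
    c1 = all(a in (0, 1) for a in A)
    c2 = all(a == 0 or 0 < t <= tm for a, t, tm in zip(A, t_off, t_max))
    c3 = all(f >= 0 for f in F)
    c4 = sum(a * f for a, f in zip(A, F)) <= c_mec
    return c1 and c2 and c3 and c4
```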
Specifically, determining the expected utility function may include:
and step A, determining time saving, power consumption saving and total cost.
Wherein the time savings is the time saved in offloading the pending task from the mobile device to the edge computing server as compared to the mobile device itself processing the pending task; the power saving is the local power saving of offloading the pending task from the mobile device to the MEC server compared to the mobile device processing the pending task; the total cost is the power consumption and resources required by the MEC server to process the pending task.
The time saving is determined by the formula

$$G_i^{t} = \theta_t \left( T_i^{l} - T_i^{off} \right)$$

wherein $G_i^{t}$ is the time saving, $\theta_t$ is the revenue coefficient for time saving, $T_i^{l}$ is the time required for the mobile device to process the pending task itself, and $T_i^{off}$ is the time consumed to offload the pending task from the mobile device to the edge computing server.
The power consumption saving is determined by the formula

$$G_i^{E} = \theta_E E_i^{l}$$

wherein $G_i^{E}$ is the power consumption saving, $\theta_E$ is the revenue coefficient for saving local power consumption, and $E_i^{l}$ is the energy consumed when the mobile device processes the pending task itself.
The total cost is determined by the formula

$$C_i = \theta_E E_i^{off} + \theta_f f_i$$

wherein $C_i$ is the total cost, $E_i^{off}$ is the energy consumption of transmitting the task to the MEC server for processing, $\theta_f$ is the price per unit of allocated MEC server execution resources, and $f_i$ is the edge computing resource allocated to $ME_i$.
And step B, determining the migration cost.
The migration cost represents the cost incurred by the transmission process required when the mobile device moves from its current microcellular network SCN to another SCN before the MEC server completes the pending task, where the transmission process is the process in which the MEC server forwards the processing result of the pending task, via the macro base station, to the other SCN to which the mobile device has moved, and that SCN sends the processing result to the mobile device;
and C, calculating the difference value of the sum of the time saving and the power saving and the total cost, and taking the difference value as a first utility function.
The first utility function represents a utility function corresponding to a task that the MEC server has already processed and completed before the mobile device moves from the current microcellular network SCN to another SCN.
The first utility function $U_i^{s}$ is determined by the formula

$$U_i^{s} = G_i^{t} + G_i^{E} - C_i$$
And step D, calculating the difference between the sum of the time saving and the power consumption saving and the sum of the total cost and the migration cost, and taking the difference as a second utility function.
The second utility function represents a utility function corresponding to the mobile device moving from the current microcellular network SCN to another SCN before the MEC server completes the task to be processed.
The second utility function $U_i^{m}$ is determined by the formula

$$U_i^{m} = G_i^{t} + G_i^{E} - C_i - C_i^{mig}$$

wherein $C_i^{mig}$ is the migration cost.
And E, determining the first probability and the second probability.
The first probability represents the probability that the MEC server has processed the pending task before the mobile device moves from the current microcellular network SCN to another SCN, and the second probability represents the probability that the mobile device moves from the current microcellular network SCN to another SCN before the MEC server processes the pending task.
And F, taking the sum of the product of the first utility function and the first probability and the product of the second utility function and the second probability as an expected utility function for processing the task to be processed.
Due to the integer constraint $a_i \in \{0, 1\}$ in the constrained optimization problem described above, the optimization problem is a mixed integer nonlinear programming (MINLP) problem. Its feasible set and objective function are both non-convex, so the problem is NP-hard. To solve it, the embodiment of the present invention proposes to find the optimal A and F by a reinforcement learning method instead of the traditional optimization methods for NP-hard problems.
Specifically, the state space, the action space and the reward function of the reinforcement learning method in the embodiment of the invention are described based on the theory of the markov decision process, and then the problem is solved based on the optimization algorithm of the classical reinforcement learning algorithm Q-learning.
1) Markov decision process: in mobile edge computing networks, the traditional methods for solving the joint task offloading and resource allocation problem include exhaustive search algorithms and game theory. However, these methods have many limitations. On the one hand, their complexity is high and their efficiency low; on the other hand, they consume substantial computing resources, have low fault tolerance, and are difficult to apply to large-scale network scenarios. Solving the joint optimization problem with a reinforcement learning method can, to a certain extent, avoid the limitations and disadvantages of the traditional algorithms.
In general, the reinforcement learning model is based on a markov decision process, and aims to provide an intuitive framework for learning from interaction to achieve a target. In reinforcement learning, an agent refers to a learner or a decision maker. Anything that interacts with an agent but not with the agent itself is called an environment. During the interaction, the agent selects and executes an action in one state by the policy, the environment responds to this action by turning the agent to the next new state, and then gives feedback on this action and generates a reward.
After successive iterations, the agent optimizes the reward value by selecting different behaviors and converges to an optimal state. As shown in fig. 3, reinforcement learning is performed in the dynamic interaction of the agent with the environment.
Let $\mathcal{S}$ denote the state space, $\mathcal{A}$ the action space, and $\mathcal{R}$ the reward space. Unless otherwise specified, the reinforcement learning problem here is based on a finite Markov Decision Process (MDP) model; i.e., the state, action, and reward spaces of the MDP are all finite sets. The decision of an agent on a particular task, such as the action selected in a state, is determined and executed according to a policy function, which is usually a probability distribution. In each interaction, the agent has an initial state $s_t$; in this state, an action $a_t$ may be selected and executed according to the policy function. The environment gives feedback on the action, generates a reward value $r_{t+1}$, and moves the agent to the next state $s_{t+1}$, which the agent uses as the initial state of the next round of interaction. The next state of the agent is random, and its state transition probability is Markovian, which can be understood as the state transition probability being independent of the past. The agent makes new decisions based on the newly observed states; this repeats in sequence and iterates until final convergence.
2) Three key elements of reinforcement learning: in an embodiment of the invention, it is assumed that the ME acts as a decision maker for the MDP, trying to access a nearby MEC server to interact with the environment. The environment consists of channel conditions and MEC server conditions. Despite the uncertainty of the environment, the ME attempts to maximize the utility of the system throughout the interaction. The ME's actions affect the future state of the environment and thus the ME's next time action selection and state space. In the case of partially random results and controlled by a decision maker, the MDP provides a good mathematical framework for modeling, and then the whole decision making process is completed through a reinforcement learning method.
In the embodiment of the invention, the base station can be used as an intelligent agent.
More specifically, the task allocation decision process is modeled as a Markov Decision Process (MDP). The complete interaction period between the agent and the environment is called T, which can be divided into a number of time steps; at each time step t (t = 1, 2, ..., T) the system has a state $s_t$. The MDP iterates from a random initial state until it eventually converges. As the decision maker, the system selects and performs an optional action in state $s_t$ according to the strategy employed by the algorithm in use. Meanwhile, according to the selected action, the system obtains the corresponding reward and then enters the next state $s_{t+1}$. As before, the future states in this MDP depend only on the current state and are independent of the historical states, which ensures the memoryless Markov property.
The state space, the action space and the reward function in the embodiment of the invention can be expressed as follows:
state space: in order to maximize the overall efficiency of the whole system, the state of the network scenario needs to reflect the number of tasks to be solved by the ME in the current system, and the moving state of the ME in the SCN. The state thus consists mainly of three quantities, the number of MEs, the characteristics of the task to be solved and the mobility of the ME. Number of MEs SnIndicating the number of MEs in the microcell for which tasks need to be resolved; characteristics of the task to be solved
Figure BDA0002923684480000161
The method comprises the steps of indicating the size of computing resources of an MEC server required to be allocated by a task to be solved; mobility of ME
Figure BDA0002923684480000162
The system is used for showing the user movement characteristics; for the wireless network scenario in this invention, the state may be defined as
Figure BDA0002923684480000163
An action space: for each time step t, the ME selects and executes an action in the current state according to the strategy used. A greedy algorithm is adopted as a strategy, because the strategy is strong in universality and easy to implement. At the same time, ME follows the present situation stMove to the next state
Figure BDA0002923684480000164
By using
Figure BDA0002923684480000165
Figure BDA0002923684480000166
Representing the motion space of the MDP for
Figure BDA0002923684480000167
A i0 denotes the choice of performing the computation task locally, ai1 indicates that the task is computed by the MEC server allocated in the current microcell.
The reward function: after each interaction between the agent and the environment, the ME, as an agent, gets feedback from the environment, i.e. a reward r, reflecting the good or bad result of the ME performing a certain behavior in a certain state. Generally, the RL return should be related to the optimization formula. Since the goal of the present implementation optimization problem is to maximize the total benefit of all MEs, while the goal of the RL is to obtain the maximum reward. Therefore, according to the positive correlation relationship between the two functions, the reward function is formulated as follows:
Figure BDA0002923684480000171
wherein the content of the first and second substances,
Figure BDA0002923684480000172
is a parameter of the system award.
Q-learning is a classic value-based RL algorithm. Q(s, a), the Q value, is the expected benefit obtainable by performing action a in state s, i.e., the system benefit. At each step, the environment provides a reward r to the agent based on its action. The key step of the Q-learning algorithm is therefore to establish a Q table recording the Q value obtained for each state-action pair. The profit-maximizing choice is then made based on the Q table, and the agent obtains the greatest benefit based on the Q value. Q(s, a) is updated as

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where s, a are the current state and action and s', a' are the next state and action. α and γ are the parameters of Q-learning, with 0 ≤ α, γ ≤ 1. α is introduced to weigh the value of past learning against the current learning: if it is set too low, the agent focuses only on previously learned content and ignores new rewards; typically 0.5 is used to balance prior knowledge with new rewards. γ is the discount parameter: γ → 0 indicates that the agent prefers the reward at hand, while γ → 1 indicates that the agent cares about future rewards. Generally γ = 0.9, so that the future is fully considered.
In the embodiment of the present invention, the Q table obtained by the Q-learning-based optimization algorithm can be realized by the following Algorithm 1, the RL-Based Algorithm with Q-learning method (RLBA):

Initialize Q(s, a)
Randomly configure the state s_t
for each time step t do
    In the current state s_t, select an action a_t
    Execute a_t and compute the Q value Q(s_t, a_t)
    Update the Q table: Q(s_t, a_t) ← Q(s_t, a_t) + α[R_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]
    Let s_t ← s_{t+1}
until the desired state s_terminal is reached
end for

The procedure of Algorithm 1 is as follows: first, Q(s, a) is initialized to 0. The base station controller configures the current state as s_t according to the collected device information. Then, since each mobile device may select the local or edge computing offload mode, the system takes some action a_t and executes it. For the current state-action pair (s_t, a_t), the base station controller gives a reward value R, which in turn yields the Q value Q(s_t, a_t). The Q table (a two-dimensional table storing the Q values) is then updated, and the system shifts to the next state s_{t+1}. By repeating the above process, the Q table is continuously updated. Toward the desired state set, the system, when making a selection, tends to choose from the Q table the action that yields a higher Q value. After multiple iterations, the Q value converges to the optimal value, and the computation offloading decision and the configuration of the MEC server's computing resource allocation that maximize the system benefit are obtained accordingly.
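As a sketch only, Algorithm 1 could be realized as the following tabular Q-learning loop; the environment interface (env.reset, env.step) and the ε-greedy exploration rate are assumptions layered on top of the greedy strategy named in the text.

```python
import random

def rlba(env, actions, episodes: int = 1000,
         alpha: float = 0.5, gamma: float = 0.9, eps: float = 0.1) -> dict:
    """Tabular Q-learning over (state, action) pairs, in the spirit of Algorithm 1 (RLBA)."""
    q = {}  # Q table, entries default to 0
    for _ in range(episodes):
        s, done = env.reset(), False              # random initial state s_t
        while not done:
            if random.random() < eps:             # explore
                a = random.choice(actions)
            else:                                 # exploit: greedy on the current Q table
                a = max(actions, key=lambda x: q.get((s, x), 0.0))
            s_next, r, done = env.step(a)         # environment returns reward R_{t+1}
            best_next = max(q.get((s_next, x), 0.0) for x in actions)
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))
            s = s_next                            # s_t <- s_{t+1}
    return q
```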
In order to obtain correct convergence, Q-learning requires the state-action pairs, i.e., Q(s, a), to be updated continuously, and it has been verified to be substantially consistent with the optimal Q value found by infinite strategy exploration. Thus, Q-learning finds the optimal action selection strategy for each step after an infinite number of searches. If the number of iterations of Algorithm 1 is denoted by I, the computational complexity of the algorithm can be expressed as $O(I \cdot |\mathcal{S}| \cdot |\mathcal{A}|)$. The method uses a greedy strategy to explore randomly in the finite state space so as to obtain a near-optimal scheme.
In this way, after the state of the system, that is, the state information, is subsequently determined, the action with the largest Q value in the Q table can be selected as the action corresponding to that state information.
The technical effect of the task processing method in the MEC network provided by the embodiment of the invention is verified by combining specific experimental data. Where table 1 shows the parameters used in the simulation process.
TABLE 1. Simulation parameter settings.
In the simulation process, the embodiment of the present invention considers $R_s = 80$ m. The transmission power of the MEs is assumed to obey the distribution $\mathcal{N}(\mu_1, \sigma_1^2)$, where $\mu_1 = 20$ dBm and $\sigma_1 = 2$. In view of the mobility model, the mean residence time of the MEs obeys the distribution $\mathcal{N}(\mu_2, \sigma_2^2)$, where $\mu_2 = 40$ seconds and $\sigma_2 = 20$. According to Table 1, the reinforcement learning parameters are set as α = 0.5 and γ = 0.9.
The performance of the proposed task processing method in the MEC network, i.e., the task offloading and migration method in an MEC network that takes the variability of mobile device locations into account (RLBA algorithm for short), is evaluated by simulations performed in Matlab. The RLBA algorithm is compared in performance with the following three algorithms.
(1) The Genetic Algorithm (Genetic Algorithm-Based Algorithm, GABA) has good global search performance and is a traditional suboptimal Algorithm for solving the optimization problem.
(2) Both the Random Offloading Algorithm (ROA) and the Full Offloading Algorithm (FOA) take into account the mobility of the ME.
Where ROA denotes ME randomly selecting local or edge computation, the proportion of tasks allocated by the algorithm is 0.5. FOA indicates that all tasks are offloaded to the edge node and the algorithm allocates a proportion of tasks of 1.
The convergence properties of RLBA and GABA can be compared. FIG. 4 is a graph of the comparative convergence analysis of RLBA and GABA in the practice of the present invention, where the ordinate represents the total benefit of the system and the abscissa represents the number of iterations. As shown in fig. 4, for both the Q-learning-based method proposed in the embodiment of the present invention, i.e., RLBA, and the GABA method, the total benefit of the MEs increases with the number of iterations until a relatively stable value is reached, which indicates convergence. With a small number of MEs, the genetic algorithm converges quickly, essentially within 200 iterations. It can be observed that once RLBA converges, it always obtains a higher system benefit than GABA.
FIG. 5 is a graph illustrating the total benefit for different numbers of MEs, where the ordinate represents the total benefit of the system and the abscissa represents the number of MEs. Fig. 5 shows how the overall benefit and performance change significantly as the number of MEs increases. As the number of MEs increases from 4 to 8, RLBA, GABA and ROA keep increasing, while FOA first increases and then decreases rapidly. The initial computational resources are sufficient for RLBA and GABA, so as the number of MEs increases, more tasks can be offloaded. However, when the number reaches 5, the overall benefit begins to change slowly. This is because the computing power of the MEC server is not infinite: it can only serve a certain number of MEs, and the more tasks there are, the more decision choices exist, so the growth rate of the overall benefit starts to slow down. The reason for ROA is similar, but its overall benefit is relatively low because it offloads only a small number of tasks. For FOA, the initial rise in system revenue is due to abundant resources. However, as migration tasks become more frequent, the increased migration costs begin to outweigh the incremental gains from saving time and energy; when there are 6 MEs in the system, the total revenue begins to drop. As the number of users increases further, the migration cost of FOA keeps increasing while the profit due to time saving starts to decrease, so the total benefit decreases.
FIG. 6 is a diagram of the total benefit for different MEC server computing resources, where the ordinate represents the total benefit of the system and the abscissa represents the computing power of the MEC server. The overall benefit under different edge computing capacities is shown in fig. 6. As can be seen from fig. 6, as the computing resource capacity of the MEC server increases, every algorithm obtains higher benefits. However, their growth rates clearly differ depending on the number of offloaded and migrated tasks. According to the feedback of the environment, RLBA and GABA selectively apportion tasks through different strategies, aiming to reduce the migration cost. As such, RLBA and GABA achieve higher overall revenue than the other schemes. When the capacity of the MEC server is sufficient, both algorithms improve efficiency by offloading as much as possible to the edge. The overall benefit of RLBA is higher than that of GABA because the offloading decisions of the Q-learning algorithm are more advantageous. Furthermore, the overall revenue of ROA grows slowly: its rate of rise is close to that of the two optimal algorithms, but the gain is slightly lower due to the uncertainty of its offloading decisions. The reasons behind FOA's rate of increase differ: FOA keeps offloading to the edge, but as the computing power of the MEC server increases, fewer and fewer tasks need to be migrated, and the reduction in migration costs quickly increases the overall efficiency.
FIG. 7 is a diagram illustrating the total system benefit for different migration costs, wherein the ordinate represents the total benefit of the system and the abscissa represents the migration cost. In FIG. 7, the total system benefit decreases as the migration cost increases. RLBA and GABA decline slowly, i.e., the increase in migration cost has no significant effect on them. It can be seen that the total benefit of RLBA and GABA is higher than that of the other three schemes, and RLBA is slightly better than GABA, which again verifies that reinforcement learning outperforms the traditional method. In the ROA scheme, few MEs select an edge node to compute their tasks, so the migration probability is relatively low. When FOA is used, most MEs choose edge computing, which leads to insufficient computing resources on the MEC server and an increase in the number of tasks that need to be migrated. As the migration cost increases, the total benefit of FOA therefore decreases significantly.
Corresponding to the task processing method in the mobile edge computing MEC network provided in the foregoing embodiment, an embodiment of the present invention further provides a task processing device in the mobile edge computing MEC network, as shown in fig. 8, where the task processing device may include:
a first determining module 801, configured to determine current state information, where the current state information includes the number of mobile devices having a task to be processed in a system at a current time, task feature information of the task to be processed, and mobile feature information of each mobile device having the task to be processed;
a searching module 802, configured to search, from a pre-established Q table, an action and a system benefit corresponding to current state information, where the action includes that each mobile device locally processes a task to be processed that exists in itself or processes a task to be processed that exists in itself through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and the selecting module 803 is configured to select the action corresponding to the maximum system benefit as the target action corresponding to the current state information, so that each mobile device processes the task to be processed according to the target action.
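For illustration only, the lookup-and-select logic of modules 801 to 803 can be sketched as follows; this is a minimal sketch assuming the pre-established Q table is stored as a mapping from each state to the per-action expected system benefits, and the function name and state/action encodings are illustrative assumptions, not part of the embodiment:

```python
# A minimal sketch of modules 801-803, assuming the pre-established Q table
# is a dict mapping each state to the per-action expected system benefits.
# The state and action encodings below are illustrative assumptions.

def select_target_action(q_table, current_state):
    """Look up the actions and system benefits for the current state
    (module 802) and return the action with the maximum expected
    system benefit as the target action (module 803)."""
    benefits = q_table[current_state]
    # Each action is encoded as one 0/1 flag per mobile device:
    # 0 = process the task locally, 1 = offload it to the edge server.
    return max(benefits, key=benefits.get)

# Usage: two mobile devices with pending tasks in state "s0".
q_table = {"s0": {(0, 0): 1.1, (0, 1): 3.2, (1, 1): 2.7}}
print(select_target_action(q_table, "s0"))  # -> (0, 1)
```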
Optionally, the apparatus further comprises:
a second determining module (not shown in the figure), configured to determine state information corresponding to a historical time, where the state information corresponding to the time includes the number of mobile devices having a task to be processed in the system at the time, task feature information of the task to be processed, and mobile feature information of each mobile device having the task to be processed;
a selecting module (not shown in the figure) for selecting an action corresponding to the state information according to a preset strategy for the state information;
a third determining module (not shown in the figure) for determining expected system benefits corresponding to the actions performed under the state information;
an updating module (not shown in the figure) for updating the expected system benefit corresponding to the state information and the action in the Q table to be established based on the determined expected system benefit;
a repeating module (not shown in the figure) for determining the state information of a plurality of historical moments respectively, and repeating the steps of the selecting module, the third determining module and the updating module until the expected system benefit for each state information and each action is converged to obtain an established Q table; and aiming at each piece of state information, the Q table comprises expected system benefits corresponding to the combination of each piece of state information and each action.
Optionally, the updating module (not shown in the figure) is specifically configured to update, through the preset formula

NewQ(s, a) = Q(s, a) + α · [r + γ · max_{a'} Q(s', a') − Q(s, a)],

the state information in the Q table to be established and the expected system benefit corresponding to the action;

wherein NewQ(s, a) is the updated expected system benefit, Q(s, a) is the expected system benefit obtained in the previous iteration, Q(s', a') is the expected system benefit determined in the current iteration, α and γ are preset parameters, α is greater than or equal to 0, γ is less than or equal to 1, and r is the reward value.
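As one possible reading of the iteration described above, the sketch below builds such a Q table with the standard tabular Q-learning update; the epsilon-greedy exploration, the environment stubs, and all parameter values are illustrative assumptions rather than values fixed by the embodiment:

```python
import random
from collections import defaultdict

def build_q_table(states, actions, reward_fn, next_state_fn,
                  iterations=10_000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Iterate over historical state information and apply the update
    formula above until the table stabilizes. reward_fn(s, a) supplies
    the reward r observed for performing action a under state s;
    next_state_fn(s, a) supplies the state at the following moment."""
    q = defaultdict(float)  # (state, action) -> expected system benefit
    for _ in range(iterations):
        s = random.choice(states)  # state information at a historical moment
        # Preset strategy (here: epsilon-greedy) selects the action.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: q[(s, act)])
        r = reward_fn(s, a)
        s_next = next_state_fn(s, a)
        best_next = max(q[(s_next, act)] for act in actions)
        # NewQ(s,a) = Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q
```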
Optionally, the system is a dual-layer cellular network, and the dual-layer cellular network includes: a macro base station and a plurality of micro base stations, wherein each micro base station is located at the center of a micro cellular network SCN, a mobile edge computing MEC server is deployed in the micro cellular network SCN where each micro base station is located, and a plurality of mobile devices are randomly distributed within the coverage of the plurality of SCNs.
Optionally, the task characteristic information includes a data amount of the task to be processed, a calculation resource amount required for completing the task to be processed, and a maximum time delay required to be satisfied; the mobile characteristic information comprises the stay time of the mobile equipment in a service coverage area, and the service coverage area is an area covered by a microcellular network SCN where the mobile equipment is located currently;
a third determining module (not shown in the figure), specifically configured to determine the expected system benefit through the expected utility function

E[Q] = Σ_{i=1}^{N} (P_i^1 · Q_i^1 + P_i^2 · Q_i^2);

wherein Q_i^1 is the first utility function, P_i^1 is the probability corresponding to the first utility function, Q_i^2 is the second utility function, P_i^2 is the probability corresponding to the second utility function, and N is the number of mobile devices in the system for which there are pending tasks.
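Read literally, the expected utility function above is a probability-weighted sum over the MEs with pending tasks; the following minimal sketch (symbol and function names chosen for illustration only) makes this concrete:

```python
def expected_system_benefit(per_me_terms):
    """Probability-weighted sum P1*Q1 + P2*Q2 over all N mobile devices
    with pending tasks; per_me_terms is a list of (q1, p1, q2, p2)."""
    return sum(p1 * q1 + p2 * q2 for (q1, p1, q2, p2) in per_me_terms)

# Example with N = 2 MEs (numbers purely illustrative):
print(expected_system_benefit([(5.0, 0.8, 3.5, 0.2),
                               (4.0, 0.6, 2.0, 0.4)]))  # ≈ 7.9
```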
Optionally, the third determining module (not shown in the figure) is specifically configured to: determine a time saving, a power consumption saving, and a total cost, where the time saving is the time saved by offloading the task to be processed from the mobile device to the MEC server compared with the mobile device processing the task itself; the power consumption saving is the local power consumption saved by offloading the task to be processed from the mobile device to the MEC server compared with the mobile device processing the task itself; and the total cost is the power consumption and resources required by the MEC server to process the task to be processed; determine a migration cost, where the migration cost represents the cost generated by the transmission process required when the mobile device moves from its current micro cellular network SCN to another SCN before the MEC server completes the task to be processed, the transmission process being the process in which the MEC server forwards the processing result obtained by processing the task to be processed, via the macro base station, to the other SCN to which the mobile device has moved, and the other SCN sends the processing result to the mobile device; calculate the difference between the sum of the time saving and the power consumption saving and the total cost, and take this difference as the first utility function, where the first utility function corresponds to the case in which the MEC server completes the task to be processed before the mobile device moves from its current micro cellular network SCN to another SCN; calculate the difference between the sum of the time saving and the power consumption saving and the sum of the total cost and the migration cost, and take this difference as the second utility function, where the second utility function corresponds to the case in which the mobile device moves from its current micro cellular network SCN to another SCN before the MEC server completes the task to be processed; determine a first probability representing the probability that the MEC server has completed the task to be processed before the mobile device moves from the current micro cellular network SCN to another SCN, and a second probability representing the probability that the mobile device moves from the current micro cellular network SCN to another SCN before the MEC server completes the task to be processed; and take the sum of the product of the first utility function and the first probability and the product of the second utility function and the second probability as the expected utility function for processing the task to be processed.
Optionally, the third determining module (not shown in the figure) is specifically configured to determine the time saving through the formula

Q_i^t = θ_t · (T_i^l − T_i^e);

wherein Q_i^t is the time saving, θ_t is the time-saving return coefficient, T_i^l is the time required for the mobile device to process the task to be processed itself, and T_i^e is the time consumed for offloading the task to be processed from the mobile device to the MEC server;

to determine the power consumption saving through the formula

Q_i^E = θ_E · E_i^l;

wherein Q_i^E is the power consumption saving, θ_E is the local power consumption return coefficient, and E_i^l is the energy consumed by the mobile device itself in processing the task to be processed;

to determine the total cost through the formula

C_i = E_i^s + θ_f · f_i;

wherein C_i is the total cost, E_i^s is the energy consumption of the MEC server processing, θ_f is the price per unit of execution resources of the allocated MEC server, and f_i is the edge computing resource allocated to ME_i;

to determine the first utility function through the formula

Q_i^1 = Q_i^t + Q_i^E − C_i;

and to determine the second utility function through the formula

Q_i^2 = Q_i^t + Q_i^E − C_i − C_i^m;

wherein C_i^m is the migration cost.
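Putting these component formulas together, the following sketch computes the two utilities and their probabilities for one ME. It is illustrative only: the exponential sojourn-time model used to derive P_i^1 and P_i^2 is an assumption (the embodiment only specifies that the mobility feature includes the stay time), as are all variable names and sample values:

```python
import math

def per_me_utilities(theta_t, theta_e, theta_f,
                     t_local, t_edge, e_local, e_server, f_alloc,
                     c_migration, mean_stay_time):
    """Compute (Q1, P1, Q2, P2) for one mobile device ME_i from the
    component formulas above."""
    q_time = theta_t * (t_local - t_edge)   # time saving
    q_energy = theta_e * e_local            # local power consumption saving
    cost = e_server + theta_f * f_alloc     # total cost at the MEC server
    q1 = q_time + q_energy - cost           # ME stays until the task is done
    q2 = q1 - c_migration                   # ME leaves before the task is done
    # Illustrative assumption: the stay time is exponentially distributed,
    # so P(stay > t_edge) = exp(-t_edge / mean_stay_time).
    p1 = math.exp(-t_edge / mean_stay_time)
    p2 = 1.0 - p1
    return q1, p1, q2, p2

# Sample values (purely illustrative):
print(per_me_utilities(theta_t=1.0, theta_e=0.5, theta_f=0.1,
                       t_local=2.0, t_edge=0.5, e_local=3.0,
                       e_server=0.8, f_alloc=4.0,
                       c_migration=1.2, mean_stay_time=5.0))
```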
The embodiment of the present invention further provides an electronic device, as shown in fig. 9, which includes a processor 901, a communication interface 902, a memory 903 and a communication bus 904, where the processor 901, the communication interface 902 and the memory 903 communicate with each other through the communication bus 904.
A memory 903 for storing computer programs;
the processor 901 is configured to implement the method steps of the task processing method in the mobile edge computing MEC network when executing the program stored in the memory 903.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In a further embodiment provided by the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, which, when being executed by a processor, implements the method steps of the task processing method in the mobile edge computing, MEC, network described above.
In a further embodiment provided by the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method steps of the above method for task handling in a mobile edge computing, MEC, network.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A task processing method in a Mobile Edge Computing (MEC) network is characterized by comprising the following steps:
determining current state information, wherein the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so that each mobile device processes the task to be processed according to the target action.
2. The method of claim 1, wherein the step of establishing the Q table comprises:
determining state information corresponding to a historical moment, wherein the state information corresponding to the historical moment comprises the number of mobile devices with tasks to be processed in the system, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed at the historical moment;
step 1, selecting an action corresponding to the state information according to a preset strategy aiming at the state information;
step 2, determining expected system benefits corresponding to the action executed under the state information;
step 3, updating the state information in the Q table to be established and the expected system income corresponding to the action based on the determined expected system income;
respectively determining state information of a plurality of historical moments, and repeatedly executing the steps 1 to 3 until expected system benefits for each state information and each action are converged to obtain an established Q table; and aiming at each piece of state information, the Q table comprises expected system benefits corresponding to the combination of each piece of state information and each action.
3. The method of claim 2, wherein updating the expected system revenue corresponding to the state information and the action in the Q-table to be built based on the determined expected system revenue and the reward value comprises:
by a preset formula

NewQ(s, a) = Q(s, a) + α · [r + γ · max_{a'} Q(s', a') − Q(s, a)],

updating the state information in the Q table to be established and the expected system benefit corresponding to the action;

wherein NewQ(s, a) is the updated expected system benefit, Q(s, a) is the expected system benefit obtained from the previous iteration, Q(s', a') is the expected system benefit determined from the current iteration, α and γ are preset parameters, α is greater than or equal to 0, γ is less than or equal to 1, and r is the reward value.
4. The method of claim 2, wherein the system is a dual-tier cellular network, the dual-tier cellular network comprising: a macro base station and a plurality of micro base stations, wherein each micro base station is located at the center of a micro cellular network SCN, an MEC server is deployed in the micro cellular network SCN where each micro base station is located, and a plurality of mobile devices are randomly distributed within the coverage of the plurality of SCNs.
5. The method of claim 4, wherein the task characteristic information comprises a data amount of the task to be processed, an amount of computing resources required to complete the task to be processed, and a maximum time delay required to be satisfied; the mobile characteristic information comprises the stay time of the mobile equipment in a service coverage area, wherein the service coverage area is the area covered by the micro cellular network SCN where the mobile equipment is located currently;
the determining an expected system benefit corresponding to the performing the action under the state information includes:
by an expected utility function

E[Q] = Σ_{i=1}^{N} (P_i^1 · Q_i^1 + P_i^2 · Q_i^2)

determining the expected system benefit;

wherein Q_i^1 is the first utility function, P_i^1 is the probability corresponding to the first utility function, Q_i^2 is the second utility function, P_i^2 is the probability corresponding to the second utility function, and N is the number of mobile devices for which tasks to be processed exist in the system.
6. The method of claim 5, wherein determining the expected utility function comprises:
determining a time savings, a power savings, and a total cost, wherein the time savings is a time saved in offloading the pending task from the mobile device to the MEC server as compared to the mobile device itself processing the pending task; the power savings is a local power savings in offloading the pending task from the mobile device to the MEC server as compared to the mobile device processing the pending task; the total cost is the power consumption and resources required by the MEC server to process the task to be processed;
determining a migration cost, wherein the migration cost represents a cost generated by a transmission process required by the mobile device to move from a current microcellular network (SCN) to another SCN before the MEC server processes and completes the task to be processed, and the transmission process represents a process that the MEC server forwards a processing result obtained by processing the task to be processed to the other SCN to which the mobile device moves through a macro base station and sends the processing result to the mobile device through the other SCN;
calculating a difference between the sum of the time saving and the power saving and the total cost, and taking the difference as a first utility function; wherein the first utility function represents a utility function corresponding to the fact that the MEC server has already processed and completed the task to be processed before the mobile device moves from a current microcellular network (SCN) to another SCN;
calculating a difference value between the sum of the time saving and the power saving and the sum of the total cost and the migration cost, and taking the difference value as a second utility function, wherein the second utility function represents a utility function corresponding to the mobile device moving from a current microcellular network (SCN) to another SCN before the MEC server completes the task to be processed;
determining a first probability representing a probability that the MEC server has processed the task to be processed before the mobile device moves from a current microcellular network SCN to another SCN, and a second probability representing a probability that the mobile device moves from the current microcellular network SCN to another SCN before the MEC server processes the task to be processed,
taking the sum of the product of the first utility function and the first probability and the product of the second utility function and the second probability as an expected utility function for processing the task to be processed.
7. The method of claim 6, wherein determining the time savings, the power consumption savings, and the total cost comprises:
by the formula

Q_i^t = θ_t · (T_i^l − T_i^e)

determining the time savings;

wherein Q_i^t is the time savings, θ_t is the time-saving return coefficient, T_i^l is the time required for the mobile device to process the task to be processed itself, and T_i^e is the time consumed to offload the task to be processed from the mobile device to the MEC server;

by the formula

Q_i^E = θ_E · E_i^l

determining the power consumption savings;

wherein Q_i^E is the power consumption savings, θ_E is the local power consumption return coefficient, and E_i^l is the energy consumption resulting from the mobile device itself processing the task to be processed;

by the formula

C_i = E_i^s + θ_f · f_i

determining the total cost;

wherein C_i is the total cost, E_i^s is the energy consumption of the MEC server processing, θ_f is the price per unit of execution resources of the allocated MEC server, and f_i is the edge computing resource allocated to ME_i;

said calculating a difference between the sum of the time savings and the power savings and the total cost, and using the difference as a first utility function, comprises:

by the formula

Q_i^1 = Q_i^t + Q_i^E − C_i

determining the first utility function Q_i^1;

said calculating a difference between the sum of said time savings and said power savings and the sum of said total cost and said migration cost, and using said difference as a second utility function, comprises:

by the formula

Q_i^2 = Q_i^t + Q_i^E − C_i − C_i^m

determining the second utility function Q_i^2;

wherein C_i^m is the migration cost.
8. A task processing apparatus in a mobile edge computing, MEC, network, comprising:
the system comprises a first determining module, a second determining module and a processing module, wherein the first determining module is used for determining current state information, and the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
the searching module is used for searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and the selection module is used for selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so as to enable each mobile device to process the task to be processed according to the target action.
9. The apparatus of claim 8, further comprising:
the second determining module is used for determining state information corresponding to a historical moment, wherein the state information corresponding to the historical moment comprises the number of mobile devices with tasks to be processed in the system at the moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
the selection module is used for selecting the action corresponding to the state information according to a preset strategy aiming at the state information;
a third determining module for determining an expected system revenue corresponding to the action being performed under the state information;
the updating module is used for updating the state information in the Q table to be established and the expected system income corresponding to the action based on the determined expected system income;
the repeating module is used for respectively determining the state information of a plurality of historical moments, and repeatedly executing the steps of the selecting module, the third determining module and the updating module until the expected system income for each state information and each action is converged to obtain an established Q table; and aiming at each piece of state information, the Q table comprises expected system benefits corresponding to the combination of each piece of state information and each action.
10. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.