CN112905315A - Task processing method, device and equipment in Mobile Edge Computing (MEC) network - Google Patents


Info

Publication number
CN112905315A
Authority
CN
China
Prior art keywords
task
state information
processed
mobile device
mobile
Prior art date
Legal status
Pending
Application number
CN202110125013.1A
Other languages
Chinese (zh)
Inventor
王冬宇
田心乔
王思野
崔浩然
李琦
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110125013.1A
Publication of CN112905315A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

The embodiment of the invention provides a task processing method, apparatus, and device in a Mobile Edge Computing (MEC) network, applied to the field of communications technologies. The method can determine current state information; search a pre-established Q table for the actions and system benefits corresponding to the current state information, wherein the actions comprise each mobile device either locally processing its own pending task or processing it through an edge server; the pre-established Q table comprises the actions and system benefits corresponding to a plurality of pieces of state information, and is obtained by performing different actions under the plurality of pieces of state information and iterating through reinforcement learning; and select the action corresponding to the maximum system benefit as the target action corresponding to the current state information, so that each mobile device processes its pending task according to the target action. The mobile devices can thus process tasks according to the action corresponding to the maximum system benefit, thereby improving the system benefit of the system in which the mobile devices are located.

Description

Task processing method, device and equipment in Mobile Edge Computing (MEC) network
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method, an apparatus, and a device for processing a task in a mobile edge computing MEC network.
Background
In recent years, applications of intelligent terminals in mobile networks have become more and more common, and various new applications such as Virtual Reality (VR)/Augmented Reality (AR), image recognition, biometric feature recognition, and the like have emerged. These applications are often resource intensive, i.e., run-time consuming large computing resources, with high quality of service requirements. Although the performance of the processor of the smart terminal is continuously improved, it is still difficult to meet the requirement of processing high-performance applications in a short time, which seriously affects the quality of service provided by the smart terminal to the user. Therefore, how to expand the resources of the intelligent terminal to meet the requirement of executing the high-performance task is a problem to be solved urgently at present.
Cloud computing provides an economical and efficient solution for mass data storage and processing. Mobile Cloud Computing (MCC) allows mobile application tasks to run in remote data centers by means of high-speed, reliable wireless interfaces. However, the delay overhead caused by long-distance propagation is large, so the MCC architecture is not suitable for today's delay-sensitive tasks. To solve this problem, Mobile Edge Computing (MEC) has emerged. MEC technology deploys computing and storage resources at the edge of the network to improve the computing capacity of the mobile network and to establish a low-delay, high-bandwidth network service solution. Compared with MCC, MEC avoids the privacy and security issues that long-distance transmission brings to mobile applications, such as the high concentration of information in the platform, its vulnerability, and the leakage and loss of private data caused by the separation of ownership and usage rights of user data.
For an intelligent terminal in a mobile network, that is, a mobile device, the device may process a pending task locally, or may offload the pending task to an edge computing server so that the edge computing server processes it. Local processing occupies the resources of the mobile device, while processing by the edge computing server incurs time costs, among others; that is, different processing modes bring different benefits and costs. Determining which mode to use is therefore important in the task processing process.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, and a device for processing a task in a mobile edge computing MEC network, so as to enable a mobile device to process the task according to an action corresponding to a maximum system benefit, thereby improving the system benefit of a system in which the mobile device is located. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for processing a task in a mobile edge computing MEC network, including:
determining current state information, wherein the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so that each mobile device processes the task to be processed according to the target action.
In a second aspect, an embodiment of the present invention provides a task processing device in a mobile edge computing MEC network, including:
the system comprises a first determining module, a second determining module and a processing module, wherein the first determining module is used for determining current state information, and the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
the searching module is used for searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and the selection module is used for selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so as to enable each mobile device to process the task to be processed according to the target action.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of the first aspect when executing the program stored in the memory.
The task processing method, the device and the equipment in the MEC network provided by the embodiment of the invention can determine the current state information; searching the action and the system income corresponding to the current state information from a pre-established Q table; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration; and selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so that each mobile device processes the task to be processed according to the target action. The action corresponding to the maximum system benefit is selected as the target action corresponding to the current state information by searching the action and the system benefit corresponding to the current state information in the pre-established Q table, so that each mobile device processes the task to be processed according to the target action, the mobile device can process the task according to the action corresponding to the maximum system benefit, and the system benefit of the system where the mobile device is located can be improved. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a flowchart of a task processing method in an MEC network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a dual-layer cellular network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating interaction between an agent and an environment in reinforcement learning according to an embodiment of the present invention;
FIG. 4 is a graph of comparative analysis of convergence of RLBA and GABA in the practice of the present invention;
FIG. 5 is a graph illustrating the total benefit for different numbers of MEs;
FIG. 6 is a diagram of the total benefit for different MEC server computing resources;
FIG. 7 is a diagram illustrating the total system benefits for different migration costs;
fig. 8 is a schematic structural diagram of a task processing device in an MEC network according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention takes the location variability of the Mobile Equipment (ME) into account: if the equipment leaves the coverage area of its microcellular network (SCN) while moving, the computation offloading result must be migrated between base stations, which adds extra cost to the system. The invention can be applied in a dual-layer cellular network architecture that accounts for the location variability of the mobile devices; based on the delay and power consumption models of device task processing, a total system benefit function, namely an expected utility function, is constructed; the mixed integer nonlinear programming problem over the total system benefit function is described as a Markov decision process, and an optimization framework based on reinforcement learning is proposed to replace traditional optimization methods; and the optimization problem is solved with the classic Q-learning algorithm in reinforcement learning. The invention can significantly improve the total benefit of the system, i.e., the system formed by the mobile devices in the dual-layer cellular network architecture, where the total benefit refers to the benefit the system obtains from reduced time delay and energy consumption.
The following describes in detail a task processing method in a mobile edge computing MEC network provided by an embodiment of the present invention.
The task processing method in the mobile edge computing MEC network provided by the embodiment of the invention can comprise the following steps:
determining current state information, wherein the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so that each mobile device processes the task to be processed according to the target action.
In the embodiment of the invention, the action corresponding to the maximum system benefit and the system benefit are searched in the pre-established Q table, and the action corresponding to the maximum system benefit is selected as the target action corresponding to the current state information, so that each mobile device processes the task to be processed according to the target action, and the mobile device can process the task according to the action corresponding to the maximum system benefit, thereby improving the system benefit of the system where the mobile device is located.
Fig. 1 is a flowchart of a task processing method in a mobile edge computing MEC network according to an embodiment of the present invention, and details of the task processing method in the mobile edge computing MEC network according to the embodiment of the present invention are described with reference to fig. 1.
S101, determining current state information, wherein the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed.
The system may be a dual-layer cellular network comprising a macro base station and a plurality of micro base stations, wherein each micro base station is located at the center of a microcellular network SCN, a mobile edge computing MEC server is deployed in the SCN where each micro base station is located, and a plurality of mobile devices are randomly distributed within the coverage of the plurality of SCNs.
The task processing method in the mobile edge computing MEC network provided by the embodiment of the invention can be executed by a base station, such as a macro base station. The macro base station may obtain the device information of each mobile device in the system, and then may perform statistics on the device information of the mobile devices existing in the system to obtain current state information corresponding to the system.
The task characteristic information can comprise the data volume of the task to be processed, the computing resource volume required for completing the task to be processed and the maximum time delay required to be met; the mobile profile information includes a stay time of the mobile device in a service coverage area, which is an area covered by the microcellular network SCN in which the mobile device is currently located.
For example, the state information may be expressed as $S = [S_n, S_d, S_m]$, where $S_n$ indicates the number of MEs that have tasks to process in the dual-layer cellular network; $S_d$ indicates the task characteristic information of the tasks to be processed, which may be expressed as $S_d = \{D_1, D_2, \ldots, D_{S_n}\}$, with $D_i$ referring to the amount of MEC computing resources that the task to be processed requires to be allocated; and $S_m$ represents the mobility characteristic information of the mobile devices, also understood as the mobility of the MEs, which may be expressed as $S_m = \{\tau_1, \tau_2, \ldots, \tau_{S_n}\}$ and characterizes the movement of the devices.
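For illustration only, the following Python sketch shows one way such state information could be assembled and discretized so that it can index a tabular Q structure; the function name, arguments, and bucket sizes are assumptions for demonstration and are not part of the application.

```python
from typing import List, Tuple

def encode_state(task_demands: List[float],
                 dwell_times: List[float],
                 demand_bucket: float = 100.0,
                 dwell_bucket: float = 10.0) -> Tuple:
    """Build a hashable state S = [S_n, S_d, S_m] for a tabular Q lookup.

    task_demands: computing resources D_i requested by each pending task.
    dwell_times:  average residence times tau_i of the corresponding MEs.
    Continuous quantities are bucketed so the state space stays finite.
    """
    s_n = len(task_demands)                                     # number of MEs with tasks
    s_d = tuple(int(d // demand_bucket) for d in task_demands)  # task features, discretized
    s_m = tuple(int(t // dwell_bucket) for t in dwell_times)    # mobility features, discretized
    return (s_n, s_d, s_m)
```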
S102, searching the action and the system benefit corresponding to the current state information from a pre-established Q table.
The actions include each mobile device processing its own existing pending tasks locally or through an edge server.
The pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions according to a plurality of state information and performing reinforcement learning iteration.
The system benefit may represent the latency and energy consumption of the mobile device processing tasks in the system.
The step of establishing the Q table may include:
determining state information corresponding to a historical moment, wherein the state information corresponding to the historical moment comprises the number of mobile devices with tasks to be processed in a system at the moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
and step 1, selecting an action corresponding to the state information according to a preset strategy aiming at the state information.
The preset strategy may comprise a greedy algorithm strategy. The greedy algorithm strategy is strong in universality and easy to implement.
And 2, determining expected system benefits corresponding to the action executed under the state information.
In an alternative embodiment, step 2 may comprise:
by the expected utility function

$$\mathbb{E}(U) = \sum_{i=1}^{N} \left( p_i^{s} U_i^{s} + p_i^{m} U_i^{m} \right)$$

a desired system benefit is determined, wherein $U_i^{s}$ is the first utility function, $p_i^{s}$ is the probability corresponding to the first utility function, $U_i^{m}$ is the second utility function, $p_i^{m}$ is the probability corresponding to the second utility function, and $N$ is the number of mobile devices in the system that have pending tasks.
Step 3, updating the expected system benefit corresponding to the state information and the action in the Q table to be established, based on the determined expected system benefit;
respectively determining state information of a plurality of historical moments, and repeatedly executing the steps 1 to 3 until expected system benefits for each state information and each action are converged to obtain an established Q table; and aiming at each piece of state information, the Q table comprises expected system benefits corresponding to the combination of each piece of state information and each action.
In an alternative embodiment, the updating, in step 3, the expected system benefit corresponding to the state information and the action in the Q table to be created based on the determined expected system benefit and the reward value may include:
by preset formulas
Figure BDA0002923684480000071
And updating the state information in the Q table to be established and the expected system benefit corresponding to the action.
Wherein NewQ (s, a) is the updated expected system gain, Q (s, a) is the expected system gain obtained in the previous iteration, Q (s ', a') is the expected system gain determined in the current iteration, alpha and gamma are preset parameters, alpha is greater than or equal to 0, gamma is less than or equal to 1, and r is an award value.
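A minimal Python sketch of this update rule, assuming the Q table is stored as a dictionary keyed by (state, action) pairs; the names and defaults are illustrative only.

```python
def update_q(q_table: dict, s, a, r: float, s_next, actions,
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply NewQ(s,a) = Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    q_sa = q_table.get((s, a), 0.0)
    best_next = max(q_table.get((s_next, a2), 0.0) for a2 in actions)
    new_q = q_sa + alpha * (r + gamma * best_next - q_sa)
    q_table[(s, a)] = new_q
    return new_q
```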
S103, selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information, so that each mobile device processes the task to be processed according to the target action.
The Q table records the Q value obtained for each state-action pair, so the selection that yields the highest profit for the system can be made according to the Q table, and the maximum system benefit can be obtained according to the Q value.
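For illustration, once the Q table has converged, selecting the target action for a given state reduces to an argmax over that state's entries in the table, as in this sketch (names assumed).

```python
def select_target_action(q_table: dict, state, actions):
    """Return the action with the maximum expected system benefit for `state`."""
    return max(actions, key=lambda a: q_table.get((state, a), float("-inf")))
```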
Fig. 2 is a schematic structural diagram of a dual-layer cellular network according to an embodiment of the present invention. Referring to fig. 2, the dual-layer cellular network has a Macro Base Station (MBS) and a number of microcellular networks (SCNs), each equipped with a micro base station (SBS). An MEC server is deployed at the center of each microcellular network, attached to its SBS, and the MEs are randomly distributed in the service area. The MEs are not stationary: as they move, they may leave the currently serving cell. In addition, because of the differences among MEs, their states in the network also differ greatly. Some MEs (e.g., ME_1) finish their task computation in less time than they remain covered by the current SCN, while others (e.g., ME_2) move to another cell before the task completes processing. For the latter, the calculation result cannot be sent directly to the ME and needs to be relayed to the target cell via the macro base station; that is, a task migration process occurs, which incurs an additional migration cost.
Under this dual-layer cellular network architecture, the set of MEs in the network is denoted $\mathcal{M} = \{ME_1, ME_2, \ldots, ME_N\}$, with index set $\mathcal{N} = \{1, 2, \ldots, N\}$. The task of $ME_i$ can be described as $(H_i, D_i, T_i^{\max})$, where $H_i$ indicates the size of the computation data, which can also be understood as the size of the task, $D_i$ indicates the amount of computing resources requested by the task, and $T_i^{\max}$ represents the maximum tolerable delay of the task. $D_i$ is measured by the number of CPU cycles and satisfies $D_i = \varepsilon H_i$, where ε is a scaling factor relating the computing resources required by the task to the size of the computation data.
To describe the different model functions of the system architecture more clearly, it can be decomposed into three submodels: a communication model, a mobility model, and an offloading model.
Communication model:
Suppose $ME_i$ has a constant transmission power $p_i$, and let $d_i$ denote the distance from $ME_i$ to the SBS and $h_i$ the channel gain from $ME_i$ to the SBS. The signal-to-noise ratio of the system, i.e., the dual-layer cellular network, is computed as

$$\mathrm{SNR}_i = \frac{p_i h_i d_i^{-\theta}}{\sigma^2}$$

where θ represents the standard path-loss propagation exponent and $\sigma^2$ represents the power of the additive white Gaussian noise. The transmission rate $R_i$ for task uploading is therefore

$$R_i = B \log_2\left(1 + \mathrm{SNR}_i\right)$$

where B is the communication bandwidth between the ME and the SBS.
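The communication model can be illustrated with a short sketch; the default parameter values below (path-loss exponent, noise power, bandwidth) are placeholders for demonstration, not values taken from the application.

```python
import math

def uplink_rate(p_i: float, h_i: float, d_i: float,
                theta: float = 4.0, sigma2: float = 1e-9,
                bandwidth: float = 20e6) -> float:
    """R_i = B * log2(1 + SNR_i), with SNR_i = p_i * h_i * d_i^(-theta) / sigma^2."""
    snr = p_i * h_i * d_i ** (-theta) / sigma2
    return bandwidth * math.log2(1.0 + snr)
```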
Mobility model:
since the SBS is located in the center of the microcell, once the ME leaves the currently serving area, diRadius R exceeding SCNsTask migration due to microcell handover may occur. On the contrary, if d is always satisfiedi≤RsIt means that ME does not leave the area until the task is completed. Thus, d isiThe duration before the limit is exceeded is recorded in a period of time, which may also be referred to as the unit dwell time of the ME.
The cell dwell time is the time that a mobile user stays in a given cell and is an important performance indicator for planning network resources and improving QoS. The probability of migration of an ME can be measured by using an exponential function to represent the dwell time. Therefore, the probability density function of the residence time t
Figure BDA0002923684480000091
Can be expressed as:
Figure BDA0002923684480000092
wherein, tauiRefer to MEiThe average residence time of (a) is different for different MEs. Parameter tauiObeying Gaussian distribution, reliable parameter tau can be obtained by collecting historical data of MEi
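Under this exponential dwell-time model, the probability that an ME is still inside the current SCN after its offloaded task has run for a given time is the survival function of the distribution; a hedged sketch (illustrative names):

```python
import math

def stay_probability(t_off: float, tau_i: float) -> float:
    """P(dwell time >= t_off) under an exponential dwell-time distribution."""
    return math.exp(-t_off / tau_i)

def migration_probability(t_off: float, tau_i: float) -> float:
    """Probability the ME leaves the cell before its offloaded task completes."""
    return 1.0 - stay_probability(t_off, tau_i)
```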
Offloading model:
When the ME has compute-intensive tasks, offloading them can not only speed up processing but also reduce the runtime and energy consumption of the device. In embodiments of the present invention, a task may be computed locally by the mobile device or processed by the MEC server on the SBS side. Let $a_i \in \{0, 1\}$ be the decision variable for task offloading, e.g., $a_i = 1$ denotes that the task is offloaded to the MEC server and $a_i = 0$ denotes that the task is executed locally; therefore, $A = \{a_1, a_2, \ldots, a_N\}$ is the offloading decision set of the MEs.
The offloading model can be refined into a local computation offloading model and an edge computation offloading model.
Local computation offload model: if a isi0, the ME performs local computations based on its own computing power, where the local computing power is set to
Figure BDA0002923684480000098
Then MEiCalculated time of execution
Figure BDA0002923684480000095
Expressed as:
Figure BDA0002923684480000096
and the energy consumption brought by the local computation can be expressed as:
Figure BDA0002923684480000097
where κ is the coefficient related to the switched capacitance.
Edge calculation unloading model: if a isiThe task is offloaded to the edge node over the wireless channel between the ME and the SBS, when the task is computed by the MEC server connected to the SBS. The process of offloading tasks to the MEC server can be viewed as both transport and execution. Recording the transmission time of the task-related data uploaded to the MEC server
Figure BDA0002923684480000101
Compared with the data needing to be uploaded, the result obtained by calculation is notOften so small that the transfer time for download or migration is negligible. From the ME perspective, the power consumption of task computation and result transmission need not be considered, and therefore energy consumption (i.e., energy consumption)
Figure BDA0002923684480000102
Comprises the following steps:
Figure BDA0002923684480000103
by passing
Figure BDA0002923684480000104
Set representing the allocation of computing resources of the MEC server, fiIdentity assignment to MEiThe edge computing resources of (1). Time of MEC server processing task
Figure BDA0002923684480000105
Can be expressed as
Figure BDA0002923684480000106
Maximum capacity C calculated due to MEC serverMECNot infinite, so part of the task may not be offloaded, and the resource allocation must satisfy the constraint condition
Figure BDA0002923684480000107
Therefore, the calculation formula of the time delay in this case is:
Figure BDA0002923684480000108
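The two offloading branches can be summarized in a sketch that mirrors the formulas above; all parameter names are placeholders consistent with the symbols in the text, not values from the application.

```python
def local_cost(D_i: float, f_local: float, kappa: float):
    """Local execution: T_i^l = D_i / f_l, E_i^l = kappa * f_l^2 * D_i."""
    t_local = D_i / f_local
    e_local = kappa * f_local ** 2 * D_i
    return t_local, e_local

def edge_cost(H_i: float, D_i: float, R_i: float, f_i: float, p_i: float):
    """Edge execution: T_i^off = H_i / R_i + D_i / f_i, E_i^off = p_i * H_i / R_i."""
    t_tr = H_i / R_i    # upload time
    t_exe = D_i / f_i   # MEC server processing time
    e_off = p_i * t_tr  # transmission energy seen by the ME
    return t_tr + t_exe, e_off
```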
and based on the calculation task parameters of the ME, the mobile characteristics of the ME and the calculation capacity of the ME and the MEC server, providing a system total benefit function and constructing an optimization problem model according to a task unloading mode comprising local calculation and edge calculation.
The benefit function of the offload resource optimization problem may include both revenue and cost components. The revenue consists of two parts: time savings and local power consumption savings. The following benefits and costs are described in conjunction with the computational formulas in the system architecture described above.
Time saving refers to the time the user saves by selecting edge computation, and can be expressed as

$$G_i^{t} = \theta_t \left( T_i^{l} - T_i^{off} \right)$$

where $\theta_t$ is the revenue coefficient for time saving. The reduction in local power consumption can be expressed as

$$G_i^{E} = \theta_E E_i^{l}$$

where $\theta_E$ is the revenue coefficient for saving local power consumption, i.e., the price per unit of energy consumption.
Meanwhile, the cost sources include both the edge computation mode selected by the ME and possible task migration. The resource cost of selecting the offloading mode includes the energy consumed to transmit the data and the MEC server execution resources to be allocated. The total cost is therefore

$$C_i = \theta_E E_i^{off} + \theta_f f_i$$

where $\theta_f$ is the price per unit of allocated MEC server execution resources.
According to the relation between the edge computation time and the estimated stay time, the utility function of the edge computation mode covers the following two cases.
1) The task of the ME is successfully completed on the MEC server before the ME leaves the current microcell. This case can be described as the whole task offloading time being less than the dwell time of the ME, e.g., ME_1 in fig. 2; that is, no task migration occurs. By the calculation formula of the probability density function of the residence time, the probability of this case is

$$p_i^{s} = \Pr\left(t \ge T_i^{off}\right) = e^{-T_i^{off}/\tau_i}$$

Thus, the utility function of the corresponding part is

$$U_i^{s} = G_i^{t} + G_i^{E} - C_i$$

2) Due to the mobile nature of the ME, another case exists in which the ME stays in the previous microcell only for a short time and, like ME_2 in fig. 2, leaves the corresponding microcell before the task is completed on the MEC server, with probability

$$p_i^{m} = 1 - e^{-T_i^{off}/\tau_i}$$

In this case the execution result cannot be returned directly; it must be transferred through the macro base station to the new SCN where the ME is located, which then delivers the result to the ME. This transfer process incurs an additional migration cost $C_i^{mig}$, which is related to the size of the computation data $H_i$ and is set as

$$C_i^{mig} = \delta H_i$$

where δ represents the scaling factor of migration cost to computation data. Thus, the utility function of the corresponding part is noted as

$$U_i^{m} = G_i^{t} + G_i^{E} - C_i - C_i^{mig}$$
bonding of
Figure BDA0002923684480000118
And
Figure BDA0002923684480000119
the expression of the utility function obtained by the calculation formula of (2) is:
Figure BDA0002923684480000121
according to the mobility model, the corresponding probability is calculated as:
Figure BDA0002923684480000122
Figure BDA0002923684480000123
can be influenced by desired effects
Figure BDA0002923684480000124
To describe MEiCan be expressed as:
Figure BDA0002923684480000125
wherein, in aiSet when equal to 0
Figure BDA0002923684480000126
Since the ME will not generate revenue from the MEC server when computing the task locally.
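Putting the pieces together, a hedged Python sketch of the per-device expected utility follows; the argument names map onto the symbols above and are assumptions for demonstration.

```python
import math

def expected_utility(a_i: int, t_local: float, t_off: float,
                     e_local: float, e_off: float,
                     f_i: float, H_i: float, tau_i: float,
                     theta_t: float, theta_E: float,
                     theta_f: float, delta: float) -> float:
    """E(U_i) = p_s * U_s + p_m * U_m; zero when the task is computed locally."""
    if a_i == 0:
        return 0.0
    gain = theta_t * (t_local - t_off) + theta_E * e_local  # time + energy savings
    cost = theta_E * e_off + theta_f * f_i                  # transmission + resource cost
    p_stay = math.exp(-t_off / tau_i)
    u_stay = gain - cost                                    # task done before the ME leaves
    u_move = gain - cost - delta * H_i                      # migration cost incurred
    return p_stay * u_stay + (1.0 - p_stay) * u_move
```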
The joint optimization problem of computation offloading and resource allocation in an MEC network considering mobile device location variability proposed by the embodiments of the present invention aims to maximize the long-term benefit of all MEs. Considering the delay deadlines for completing the tasks and the MEC server's computing resource load, the corresponding constrained optimization problem can be expressed as follows (a feasibility sketch is given after the list):

$$\max_{A, F} \; \sum_{i \in \mathcal{N}} \mathbb{E}(U_i)$$

s.t.
C1: $a_i \in \{0, 1\}, \; \forall i \in \mathcal{N}$, to ensure that each ME selects either local or edge computation;
C2: $0 < T_i^{off} \le T_i^{\max}, \; \forall i \in \mathcal{N}$, ensuring that the edge computation time is positive and does not exceed the task delay deadline;
C3: $f_i \ge 0, \; \forall i \in \mathcal{N}$, to ensure that the MEC server computing resources allocated to each ME are non-negative;
C4: $\sum_{i \in \mathcal{N}} a_i f_i \le C_{MEC}$, guaranteeing that the allocated computing resources do not exceed the total MEC server capacity.
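As a sketch only, a candidate decision (A, F) could be screened against constraints C1 to C4 before its objective value is accepted; the interface below is hypothetical.

```python
def feasible(A, F, t_off, t_max, c_mec: float) -> bool:
    """Check C1-C4 for offload decisions A, allocations F, and edge delays t_off."""
    c1 = all(a in (0, 1) for a in A)
    c2 = all(a == 0 or 0 < t <= tm for a, t, tm in zip(A, t_off, t_max))
    c3 = all(f >= 0 for f in F)
    c4 = sum(a * f for a, f in zip(A, F)) <= c_mec
    return c1 and c2 and c3 and c4
```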
Specifically, determining the expected utility function may include:
and step A, determining time saving, power consumption saving and total cost.
Wherein the time savings is the time saved in offloading the pending task from the mobile device to the edge computing server as compared to the mobile device itself processing the pending task; the power saving is the local power saving of offloading the pending task from the mobile device to the MEC server compared to the mobile device processing the pending task; the total cost is the power consumption and resources required by the MEC server to process the pending task.
The time saving is determined by the formula

$$G_i^{t} = \theta_t \left( T_i^{l} - T_i^{off} \right)$$

wherein $G_i^{t}$ is the time saving, $\theta_t$ is the revenue coefficient for time saving, $T_i^{l}$ is the time required for the mobile device to process the pending task itself, and $T_i^{off}$ is the time consumed to offload the pending task from the mobile device to the edge computing server.
The power consumption saving is determined by the formula

$$G_i^{E} = \theta_E E_i^{l}$$

wherein $G_i^{E}$ is the power consumption saving, $\theta_E$ is the revenue coefficient for saving local power consumption, and $E_i^{l}$ is the energy consumed when the mobile device processes the pending task itself.
The total cost is determined by the formula

$$C_i = \theta_E E_i^{off} + \theta_f f_i$$

wherein $C_i$ is the total cost, $E_i^{off}$ is the energy consumption of transmitting the task to the MEC server for processing, $\theta_f$ is the price per unit of allocated MEC server execution resources, and $f_i$ is the edge computing resource allocated to $ME_i$.
And step B, determining the migration cost.
The migration cost represents the cost incurred by the transmission process required when the mobile device moves from its current microcellular network SCN to another SCN before the MEC server completes the pending task, where the transmission process is the process in which the MEC server forwards the processing result of the pending task, via the macro base station, to the other SCN to which the mobile device has moved, and that SCN sends the processing result to the mobile device;
and C, calculating the difference value of the sum of the time saving and the power saving and the total cost, and taking the difference value as a first utility function.
The first utility function represents a utility function corresponding to a task that the MEC server has already processed and completed before the mobile device moves from the current microcellular network SCN to another SCN.
The first utility function $U_i^{s}$ is determined by the formula

$$U_i^{s} = G_i^{t} + G_i^{E} - C_i$$
And step D, calculating the difference between the sum of the time saving and the power consumption saving and the sum of the total cost and the migration cost, and taking the difference as a second utility function.
The second utility function represents a utility function corresponding to the mobile device moving from the current microcellular network SCN to another SCN before the MEC server completes the task to be processed.
The second utility function $U_i^{m}$ is determined by the formula

$$U_i^{m} = G_i^{t} + G_i^{E} - C_i - C_i^{mig}$$

wherein $C_i^{mig}$ is the migration cost.
And E, determining the first probability and the second probability.
The first probability represents the probability that the MEC server has processed the pending task before the mobile device moves from the current microcellular network SCN to another SCN, and the second probability represents the probability that the mobile device moves from the current microcellular network SCN to another SCN before the MEC server processes the pending task.
And F, taking the sum of the product of the first utility function and the first probability and the product of the second utility function and the second probability as an expected utility function for processing the task to be processed.
Due to the integer constraint $a_i \in \{0, 1\}$ in the constrained optimization problem described above, the optimization problem is a mixed integer nonlinear programming (MINLP) problem. Its feasible set and objective function are both non-convex, so the problem is NP-hard. To solve it, the embodiment of the present invention proposes to find the optimal A and F by a reinforcement learning method instead of the traditional optimization methods for NP-hard problems.
Specifically, the state space, the action space and the reward function of the reinforcement learning method in the embodiment of the invention are described based on the theory of the markov decision process, and then the problem is solved based on the optimization algorithm of the classical reinforcement learning algorithm Q-learning.
1) Markov decision process: in mobile edge computing networks, the traditional methods for solving the joint task offloading and resource allocation problem include exhaustive search algorithms and game theory. However, these methods have many limitations. On the one hand, their complexity is high and their efficiency low; on the other hand, they consume substantial computing resources, have low fault tolerance, and are difficult to apply to large-scale network scenarios. Solving the joint optimization problem with a reinforcement learning method can, to a certain extent, avoid the limitations and disadvantages of the traditional algorithms.
In general, the reinforcement learning model is based on a markov decision process, and aims to provide an intuitive framework for learning from interaction to achieve a target. In reinforcement learning, an agent refers to a learner or a decision maker. Anything that interacts with an agent but not with the agent itself is called an environment. During the interaction, the agent selects and executes an action in one state by the policy, the environment responds to this action by turning the agent to the next new state, and then gives feedback on this action and generates a reward.
After successive iterations, the agent optimizes the reward value by selecting different behaviors and converges to an optimal state. As shown in fig. 3, reinforcement learning is performed in the dynamic interaction of the agent with the environment.
Let $\mathcal{S}$ denote the state space, $\mathcal{A}$ the action space, and $\mathcal{R}$ the reward space. Unless otherwise specified, the reinforcement learning problem here is based on a finite Markov Decision Process (MDP) model; i.e., the state, action, and reward spaces of the MDP are all finite sets. The decision of an agent on a particular task, such as the action selected in a state, is determined and executed according to a policy function, which is usually a probability distribution. In each interaction, the agent has an initial state $s_t$; in this state, an action $a_t$ may be selected and executed according to the policy function. The environment gives feedback on the action, generates a reward value $r_{t+1}$, and moves the agent to the next state $s_{t+1}$, which the agent uses as the initial state of the next round of interaction. The next state of the agent is random, and its state transition probability is Markovian, which can be understood as the state transition probability being independent of the past. The agent makes new decisions based on the newly observed states; this repeats in sequence and iterates until final convergence.
2) Three key elements of reinforcement learning: in an embodiment of the invention, it is assumed that the ME acts as a decision maker for the MDP, trying to access a nearby MEC server to interact with the environment. The environment consists of channel conditions and MEC server conditions. Despite the uncertainty of the environment, the ME attempts to maximize the utility of the system throughout the interaction. The ME's actions affect the future state of the environment and thus the ME's next time action selection and state space. In the case of partially random results and controlled by a decision maker, the MDP provides a good mathematical framework for modeling, and then the whole decision making process is completed through a reinforcement learning method.
In the embodiment of the invention, the base station can be used as an intelligent agent.
More specifically, the task allocation decision process is modeled as a Markov Decision Process (MDP). The complete interaction period between the agent and the environment is called T, which can be divided into a number of time steps; at each time step t (t = 1, 2, ..., T) the system has a state $s_t$. The MDP iterates from a random initial state until it eventually converges. As the decision maker, the system selects and performs an optional action in state $s_t$ according to the strategy employed by the algorithm in use. Meanwhile, according to the selected action, the system obtains the corresponding reward and then enters the next state $s_{t+1}$. As before, the future states in this MDP depend only on the current state and are independent of the historical states, which ensures the memoryless Markov property.
The state space, the action space and the reward function in the embodiment of the invention can be expressed as follows:
state space: in order to maximize the overall efficiency of the whole system, the state of the network scenario needs to reflect the number of tasks to be solved by the ME in the current system, and the moving state of the ME in the SCN. The state thus consists mainly of three quantities, the number of MEs, the characteristics of the task to be solved and the mobility of the ME. Number of MEs SnIndicating the number of MEs in the microcell for which tasks need to be resolved; characteristics of the task to be solved
Figure BDA0002923684480000161
The method comprises the steps of indicating the size of computing resources of an MEC server required to be allocated by a task to be solved; mobility of ME
Figure BDA0002923684480000162
The system is used for showing the user movement characteristics; for the wireless network scenario in this invention, the state may be defined as
Figure BDA0002923684480000163
An action space: for each time step t, the ME selects and executes an action in the current state according to the strategy used. A greedy algorithm is adopted as a strategy, because the strategy is strong in universality and easy to implement. At the same time, ME follows the present situation stMove to the next state
Figure BDA0002923684480000164
By using
Figure BDA0002923684480000165
Figure BDA0002923684480000166
Representing the motion space of the MDP for
Figure BDA0002923684480000167
A i0 denotes the choice of performing the computation task locally, ai1 indicates that the task is computed by the MEC server allocated in the current microcell.
The reward function: after each interaction between the agent and the environment, the ME, as an agent, gets feedback from the environment, i.e. a reward r, reflecting the good or bad result of the ME performing a certain behavior in a certain state. Generally, the RL return should be related to the optimization formula. Since the goal of the present implementation optimization problem is to maximize the total benefit of all MEs, while the goal of the RL is to obtain the maximum reward. Therefore, according to the positive correlation relationship between the two functions, the reward function is formulated as follows:
Figure BDA0002923684480000171
wherein the content of the first and second substances,
Figure BDA0002923684480000172
is a parameter of the system award.
Q-learning is a classic value-based RL algorithm. Q(s, a), the Q value, is the expected benefit obtainable by performing action a in state s, i.e., the system benefit. At each step, the environment provides a reward r to the agent based on its action. The key step of the Q-learning algorithm is therefore to establish a Q table recording the Q value obtained for each state-action pair. The profit-maximizing choice is then made based on the Q table, and the agent obtains the greatest benefit based on the Q value. Q(s, a) is updated as

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where s, a are the current state and action and s', a' are the next state and action. α and γ are the parameters of Q-learning, with 0 ≤ α, γ ≤ 1. α is introduced to weigh the value of past learning against the current learning: if it is set too low, the agent focuses only on previously learned content and ignores new rewards; typically 0.5 is used to balance prior knowledge with new rewards. γ is the discount parameter: γ → 0 indicates that the agent prefers the reward at hand, while γ → 1 indicates that the agent cares about future rewards. Generally γ = 0.9, so that the future is fully considered.
In the embodiment of the present invention, the Q table obtained by the Q-learning-based optimization algorithm can be realized by the following Algorithm 1, the RL-Based Algorithm with Q-learning method (RLBA):

Initialize Q(s, a)
Randomly configure the state s_t
for each time step t do
    In the current state s_t, select an action a_t
    Execute a_t and compute the Q value Q(s_t, a_t)
    Update the Q table: Q(s_t, a_t) ← Q(s_t, a_t) + α[R_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]
    Let s_t ← s_{t+1}
until the desired state s_terminal is reached
end for

The procedure of Algorithm 1 is as follows: first, Q(s, a) is initialized to 0. The base station controller configures the current state as s_t according to the collected device information. Then, since each mobile device may select the local or edge computing offload mode, the system takes some action a_t and executes it. For the current state-action pair (s_t, a_t), the base station controller gives a reward value R, which in turn yields the Q value Q(s_t, a_t). The Q table (a two-dimensional table storing the Q values) is then updated, and the system shifts to the next state s_{t+1}. By repeating the above process, the Q table is continuously updated. Toward the desired state set, the system, when making a selection, tends to choose from the Q table the action that yields a higher Q value. After multiple iterations, the Q value converges to the optimal value, and the computation offloading decision and the configuration of the MEC server's computing resource allocation that maximize the system benefit are obtained accordingly.
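As a sketch only, Algorithm 1 could be realized as the following tabular Q-learning loop; the environment interface (env.reset, env.step) and the ε-greedy exploration rate are assumptions layered on top of the greedy strategy named in the text.

```python
import random

def rlba(env, actions, episodes: int = 1000,
         alpha: float = 0.5, gamma: float = 0.9, eps: float = 0.1) -> dict:
    """Tabular Q-learning over (state, action) pairs, in the spirit of Algorithm 1 (RLBA)."""
    q = {}  # Q table, entries default to 0
    for _ in range(episodes):
        s, done = env.reset(), False              # random initial state s_t
        while not done:
            if random.random() < eps:             # explore
                a = random.choice(actions)
            else:                                 # exploit: greedy on the current Q table
                a = max(actions, key=lambda x: q.get((s, x), 0.0))
            s_next, r, done = env.step(a)         # environment returns reward R_{t+1}
            best_next = max(q.get((s_next, x), 0.0) for x in actions)
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))
            s = s_next                            # s_t <- s_{t+1}
    return q
```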
In order to obtain correct convergence, Q-learning requires the state-action pairs, i.e., Q(s, a), to be updated continuously, and it has been verified to be substantially consistent with the optimal Q value found by infinite strategy exploration. Thus, Q-learning finds the optimal action selection strategy for each step after an infinite number of searches. If the number of iterations of Algorithm 1 is denoted by I, the computational complexity of the algorithm can be expressed as $O(I \cdot |\mathcal{S}| \cdot |\mathcal{A}|)$. The method uses a greedy strategy to explore randomly in the finite state space so as to obtain a near-optimal scheme.
In this way, after the state of the system, that is, the state information, is subsequently determined, the action with the largest Q value in the Q table can be selected as the action corresponding to that state information.
The technical effect of the task processing method in the MEC network provided by the embodiment of the invention is verified by combining specific experimental data. Where table 1 shows the parameters used in the simulation process.
TABLE 1. Simulation parameter settings.
In the simulation process, the embodiment of the present invention considers $R_s = 80$ m. The transmission power of the MEs is assumed to obey the distribution $\mathcal{N}(\mu_1, \sigma_1^2)$, where $\mu_1 = 20$ dBm and $\sigma_1 = 2$. In view of the mobility model, the mean residence time of the MEs obeys the distribution $\mathcal{N}(\mu_2, \sigma_2^2)$, where $\mu_2 = 40$ seconds and $\sigma_2 = 20$. According to Table 1, the reinforcement learning parameters are set as α = 0.5 and γ = 0.9.
The performance of the proposed task processing method in the MEC network, i.e., the task offloading and migration method in an MEC network that takes the variability of mobile device locations into account (RLBA algorithm for short), is evaluated by simulations performed in Matlab. The RLBA algorithm is compared in performance with the following three algorithms.
(1) The Genetic Algorithm (Genetic Algorithm-Based Algorithm, GABA) has good global search performance and is a traditional suboptimal Algorithm for solving the optimization problem.
(2) Both the Random Offloading Algorithm (ROA) and the Full Offloading Algorithm (FOA) take into account the mobility of the ME.
Where ROA denotes ME randomly selecting local or edge computation, the proportion of tasks allocated by the algorithm is 0.5. FOA indicates that all tasks are offloaded to the edge node and the algorithm allocates a proportion of tasks of 1.
The convergence properties of RLBA and GABA can be compared. FIG. 4 is a graph of the comparative convergence analysis of RLBA and GABA in the practice of the present invention, where the ordinate represents the total benefit of the system and the abscissa represents the number of iterations. As shown in fig. 4, for both the Q-learning-based method proposed in the embodiment of the present invention, i.e., RLBA, and the GABA method, the total benefit of the MEs increases with the number of iterations until a relatively stable value is reached, which indicates convergence. With a small number of MEs, the genetic algorithm converges quickly, essentially within 200 iterations. It can be observed that once RLBA converges, it always obtains a higher system benefit than GABA.
FIG. 5 is a graph illustrating the total benefit for different numbers of MEs, where the ordinate represents the total benefit of the system and the abscissa represents the number of MEs. Fig. 5 shows how the overall benefit and performance change significantly as the number of MEs increases. As the number of MEs increases from 4 to 8, RLBA, GABA and ROA keep increasing, while FOA first increases and then decreases rapidly. The initial computational resources are sufficient for RLBA and GABA, so as the number of MEs increases, more tasks can be offloaded. However, when the number reaches 5, the overall benefit begins to change slowly. This is because the computing power of the MEC server is not infinite: it can only serve a certain number of MEs, and the more tasks there are, the more decision choices exist, so the growth rate of the overall benefit starts to slow down. The reason for ROA is similar, but its overall benefit is relatively low because it offloads only a small number of tasks. For FOA, the initial rise in system revenue is due to abundant resources. However, as migration tasks become more frequent, the increased migration costs begin to outweigh the incremental gains from saving time and energy; when there are 6 MEs in the system, the total revenue begins to drop. As the number of users increases further, the migration cost of FOA keeps increasing while the profit due to time saving starts to decrease, so the total benefit decreases.
FIG. 6 is a diagram of the total benefit for different MEC server computing resources, where the ordinate represents the total benefit of the system and the abscissa represents the computing power of the MEC server. The overall benefit under different edge computing capacities is shown in fig. 6. As can be seen from fig. 6, as the computing resource capacity of the MEC server increases, every algorithm obtains higher benefits. However, their growth rates clearly differ depending on the number of offloaded and migrated tasks. According to the feedback of the environment, RLBA and GABA selectively apportion tasks through different strategies, aiming to reduce the migration cost. As such, RLBA and GABA achieve higher overall revenue than the other schemes. When the capacity of the MEC server is sufficient, both algorithms improve efficiency by offloading as much as possible to the edge. The overall benefit of RLBA is higher than that of GABA because the offloading decisions of the Q-learning algorithm are more advantageous. Furthermore, the overall revenue of ROA grows slowly: its rate of rise is close to that of the two optimal algorithms, but the gain is slightly lower due to the uncertainty of its offloading decisions. The reasons behind FOA's rate of increase differ: FOA keeps offloading to the edge, but as the computing power of the MEC server increases, fewer and fewer tasks need to be migrated, and the reduction in migration costs quickly increases the overall efficiency.
FIG. 7 is a diagram illustrating the total system benefit for different migration costs, wherein the ordinate represents the total benefit of the system and the abscissa represents the migration cost. In FIG. 7, the total system benefit decreases as the migration cost increases. RLBA and GABA decline slowly, i.e., the increase in migration cost has no significant effect on them. It can be seen that the total benefit of RLBA and GABA is higher than that of the other three schemes, and RLBA is slightly better than GABA, which again verifies that reinforcement learning outperforms the traditional method. In the ROA scheme, few MEs select an edge node to compute their tasks, so the migration probability is relatively low. When FOA is used, most MEs choose edge computing, which leads to insufficient computing resources on the MEC server and an increase in the number of tasks that need to be migrated. As the migration cost increases, the total benefit of FOA therefore decreases significantly.
Corresponding to the task processing method in the mobile edge computing MEC network provided in the foregoing embodiment, an embodiment of the present invention further provides a task processing device in the mobile edge computing MEC network, as shown in fig. 8, where the task processing device may include:
a first determining module 801, configured to determine current state information, where the current state information includes the number of mobile devices having a task to be processed in a system at a current time, task feature information of the task to be processed, and mobile feature information of each mobile device having the task to be processed;
a searching module 802, configured to search, from a pre-established Q table, an action and a system benefit corresponding to current state information, where the action includes that each mobile device locally processes a task to be processed that exists in itself or processes a task to be processed that exists in itself through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and the selecting module 803 is configured to select the action corresponding to the maximum system benefit as the target action corresponding to the current state information, so that each mobile device processes the task to be processed according to the target action.
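For illustration only, the lookup-and-select logic of modules 801 to 803 can be sketched as follows; this is a minimal sketch assuming the pre-established Q table is stored as a mapping from each state to the per-action expected system benefits, and the function name and state/action encodings are illustrative assumptions, not part of the embodiment:

```python
# A minimal sketch of modules 801-803, assuming the pre-established Q table
# is a dict mapping each state to the per-action expected system benefits.
# The state and action encodings below are illustrative assumptions.

def select_target_action(q_table, current_state):
    """Look up the actions and system benefits for the current state
    (module 802) and return the action with the maximum expected
    system benefit as the target action (module 803)."""
    benefits = q_table[current_state]
    # Each action is encoded as one 0/1 flag per mobile device:
    # 0 = process the task locally, 1 = offload it to the edge server.
    return max(benefits, key=benefits.get)

# Usage: two mobile devices with pending tasks in state "s0".
q_table = {"s0": {(0, 0): 1.1, (0, 1): 3.2, (1, 1): 2.7}}
print(select_target_action(q_table, "s0"))  # -> (0, 1)
```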
Optionally, the apparatus further comprises:
a second determining module (not shown in the figure), configured to determine state information corresponding to a historical time, where the state information corresponding to the time includes the number of mobile devices having a task to be processed in the system at the time, task feature information of the task to be processed, and mobile feature information of each mobile device having the task to be processed;
a selecting module (not shown in the figure) for selecting an action corresponding to the state information according to a preset strategy for the state information;
a third determining module (not shown in the figure) for determining expected system benefits corresponding to the actions performed under the state information;
an updating module (not shown in the figure) for updating the expected system benefit corresponding to the state information and the action in the Q table to be established based on the determined expected system benefit;
a repeating module (not shown in the figure) for determining the state information of a plurality of historical moments respectively, and repeating the steps of the selecting module, the third determining module and the updating module until the expected system benefit for each state information and each action is converged to obtain an established Q table; and aiming at each piece of state information, the Q table comprises expected system benefits corresponding to the combination of each piece of state information and each action.
Optionally, the updating module (not shown in the figure) is specifically configured to update, through the preset formula

NewQ(s, a) = Q(s, a) + α · [r + γ · max_{a'} Q(s', a') − Q(s, a)],

the state information in the Q table to be established and the expected system benefit corresponding to the action;

wherein NewQ(s, a) is the updated expected system benefit, Q(s, a) is the expected system benefit obtained in the previous iteration, Q(s', a') is the expected system benefit determined in the current iteration, α and γ are preset parameters, α is greater than or equal to 0, γ is less than or equal to 1, and r is the reward value.
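As one possible reading of the iteration described above, the sketch below builds such a Q table with the standard tabular Q-learning update; the epsilon-greedy exploration, the environment stubs, and all parameter values are illustrative assumptions rather than values fixed by the embodiment:

```python
import random
from collections import defaultdict

def build_q_table(states, actions, reward_fn, next_state_fn,
                  iterations=10_000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Iterate over historical state information and apply the update
    formula above until the table stabilizes. reward_fn(s, a) supplies
    the reward r observed for performing action a under state s;
    next_state_fn(s, a) supplies the state at the following moment."""
    q = defaultdict(float)  # (state, action) -> expected system benefit
    for _ in range(iterations):
        s = random.choice(states)  # state information at a historical moment
        # Preset strategy (here: epsilon-greedy) selects the action.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: q[(s, act)])
        r = reward_fn(s, a)
        s_next = next_state_fn(s, a)
        best_next = max(q[(s_next, act)] for act in actions)
        # NewQ(s,a) = Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q
```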
Optionally, the system is a dual-layer cellular network, and the dual-layer cellular network includes: a macro base station and a plurality of micro base stations, wherein each micro base station is located at the center of a micro cellular network SCN, a mobile edge computing MEC server is deployed in the micro cellular network SCN where each micro base station is located, and a plurality of mobile devices are randomly distributed within the coverage of the plurality of SCNs.
Optionally, the task characteristic information includes a data amount of the task to be processed, a calculation resource amount required for completing the task to be processed, and a maximum time delay required to be satisfied; the mobile characteristic information comprises the stay time of the mobile equipment in a service coverage area, and the service coverage area is an area covered by a microcellular network SCN where the mobile equipment is located currently;
a third determining module (not shown in the figure), specifically configured to determine the expected system benefit through the expected utility function

E[Q] = Σ_{i=1}^{N} (P_i^1 · Q_i^1 + P_i^2 · Q_i^2);

wherein Q_i^1 is the first utility function, P_i^1 is the probability corresponding to the first utility function, Q_i^2 is the second utility function, P_i^2 is the probability corresponding to the second utility function, and N is the number of mobile devices in the system for which there are pending tasks.
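Read literally, the expected utility function above is a probability-weighted sum over the MEs with pending tasks; the following minimal sketch (symbol and function names chosen for illustration only) makes this concrete:

```python
def expected_system_benefit(per_me_terms):
    """Probability-weighted sum P1*Q1 + P2*Q2 over all N mobile devices
    with pending tasks; per_me_terms is a list of (q1, p1, q2, p2)."""
    return sum(p1 * q1 + p2 * q2 for (q1, p1, q2, p2) in per_me_terms)

# Example with N = 2 MEs (numbers purely illustrative):
print(expected_system_benefit([(5.0, 0.8, 3.5, 0.2),
                               (4.0, 0.6, 2.0, 0.4)]))  # ≈ 7.9
```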
Optionally, the third determining module (not shown in the figure) is specifically configured to: determine a time saving, a power consumption saving, and a total cost, where the time saving is the time saved by offloading the task to be processed from the mobile device to the MEC server compared with the mobile device processing the task itself; the power consumption saving is the local power consumption saved by offloading the task to be processed from the mobile device to the MEC server compared with the mobile device processing the task itself; and the total cost is the power consumption and resources required by the MEC server to process the task to be processed; determine a migration cost, where the migration cost represents the cost generated by the transmission process required when the mobile device moves from its current micro cellular network SCN to another SCN before the MEC server completes the task to be processed, the transmission process being the process in which the MEC server forwards the processing result obtained by processing the task to be processed, via the macro base station, to the other SCN to which the mobile device has moved, and the other SCN sends the processing result to the mobile device; calculate the difference between the sum of the time saving and the power consumption saving and the total cost, and take this difference as the first utility function, where the first utility function corresponds to the case in which the MEC server completes the task to be processed before the mobile device moves from its current micro cellular network SCN to another SCN; calculate the difference between the sum of the time saving and the power consumption saving and the sum of the total cost and the migration cost, and take this difference as the second utility function, where the second utility function corresponds to the case in which the mobile device moves from its current micro cellular network SCN to another SCN before the MEC server completes the task to be processed; determine a first probability representing the probability that the MEC server has completed the task to be processed before the mobile device moves from the current micro cellular network SCN to another SCN, and a second probability representing the probability that the mobile device moves from the current micro cellular network SCN to another SCN before the MEC server completes the task to be processed; and take the sum of the product of the first utility function and the first probability and the product of the second utility function and the second probability as the expected utility function for processing the task to be processed.
Optionally, the third determining module (not shown in the figure) is specifically configured to determine the time saving through the formula

Q_i^t = θ_t · (T_i^l − T_i^e);

wherein Q_i^t is the time saving, θ_t is the time-saving return coefficient, T_i^l is the time required for the mobile device to process the task to be processed itself, and T_i^e is the time consumed for offloading the task to be processed from the mobile device to the MEC server;

to determine the power consumption saving through the formula

Q_i^E = θ_E · E_i^l;

wherein Q_i^E is the power consumption saving, θ_E is the local power consumption return coefficient, and E_i^l is the energy consumed by the mobile device itself in processing the task to be processed;

to determine the total cost through the formula

C_i = E_i^s + θ_f · f_i;

wherein C_i is the total cost, E_i^s is the energy consumption of the MEC server processing, θ_f is the price per unit of execution resources of the allocated MEC server, and f_i is the edge computing resource allocated to ME_i;

to determine the first utility function through the formula

Q_i^1 = Q_i^t + Q_i^E − C_i;

and to determine the second utility function through the formula

Q_i^2 = Q_i^t + Q_i^E − C_i − C_i^m;

wherein C_i^m is the migration cost.
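Putting these component formulas together, the following sketch computes the two utilities and their probabilities for one ME. It is illustrative only: the exponential sojourn-time model used to derive P_i^1 and P_i^2 is an assumption (the embodiment only specifies that the mobility feature includes the stay time), as are all variable names and sample values:

```python
import math

def per_me_utilities(theta_t, theta_e, theta_f,
                     t_local, t_edge, e_local, e_server, f_alloc,
                     c_migration, mean_stay_time):
    """Compute (Q1, P1, Q2, P2) for one mobile device ME_i from the
    component formulas above."""
    q_time = theta_t * (t_local - t_edge)   # time saving
    q_energy = theta_e * e_local            # local power consumption saving
    cost = e_server + theta_f * f_alloc     # total cost at the MEC server
    q1 = q_time + q_energy - cost           # ME stays until the task is done
    q2 = q1 - c_migration                   # ME leaves before the task is done
    # Illustrative assumption: the stay time is exponentially distributed,
    # so P(stay > t_edge) = exp(-t_edge / mean_stay_time).
    p1 = math.exp(-t_edge / mean_stay_time)
    p2 = 1.0 - p1
    return q1, p1, q2, p2

# Sample values (purely illustrative):
print(per_me_utilities(theta_t=1.0, theta_e=0.5, theta_f=0.1,
                       t_local=2.0, t_edge=0.5, e_local=3.0,
                       e_server=0.8, f_alloc=4.0,
                       c_migration=1.2, mean_stay_time=5.0))
```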
The embodiment of the present invention further provides an electronic device, as shown in fig. 9, which includes a processor 901, a communication interface 902, a memory 903 and a communication bus 904, where the processor 901, the communication interface 902 and the memory 903 communicate with each other through the communication bus 904.
A memory 903 for storing computer programs;
the processor 901 is configured to implement the method steps of the task processing method in the mobile edge computing MEC network when executing the program stored in the memory 903.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In a further embodiment provided by the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, which, when being executed by a processor, implements the method steps of the task processing method in the mobile edge computing, MEC, network described above.
In a further embodiment provided by the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method steps of the above method for task handling in a mobile edge computing, MEC, network.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A task processing method in a Mobile Edge Computing (MEC) network is characterized by comprising the following steps:
determining current state information, wherein the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so that each mobile device processes the task to be processed according to the target action.
2. The method of claim 1, wherein the step of establishing the Q table comprises:
determining state information corresponding to a historical moment, wherein the state information corresponding to the historical moment comprises the number of mobile devices with tasks to be processed in the system, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed at the historical moment;
step 1, selecting an action corresponding to the state information according to a preset strategy aiming at the state information;
step 2, determining expected system benefits corresponding to the action executed under the state information;
step 3, updating the state information in the Q table to be established and the expected system income corresponding to the action based on the determined expected system income;
respectively determining state information of a plurality of historical moments, and repeatedly executing the steps 1 to 3 until expected system benefits for each state information and each action are converged to obtain an established Q table; and aiming at each piece of state information, the Q table comprises expected system benefits corresponding to the combination of each piece of state information and each action.
3. The method of claim 2, wherein updating the expected system revenue corresponding to the state information and the action in the Q-table to be built based on the determined expected system revenue and the reward value comprises:
by a preset formula

NewQ(s, a) = Q(s, a) + α · [r + γ · max_{a'} Q(s', a') − Q(s, a)],

updating the state information in the Q table to be established and the expected system benefit corresponding to the action;

wherein NewQ(s, a) is the updated expected system benefit, Q(s, a) is the expected system benefit obtained from the previous iteration, Q(s', a') is the expected system benefit determined from the current iteration, α and γ are preset parameters, α is greater than or equal to 0, γ is less than or equal to 1, and r is the reward value.
4. The method of claim 2, wherein the system is a dual-tier cellular network, the dual-tier cellular network comprising: a macro base station and a plurality of micro base stations, wherein each micro base station is located at the center of a micro cellular network SCN, an MEC server is deployed in the micro cellular network SCN where each micro base station is located, and a plurality of mobile devices are randomly distributed within the coverage of the plurality of SCNs.
5. The method of claim 4, wherein the task characteristic information comprises a data amount of the task to be processed, an amount of computing resources required to complete the task to be processed, and a maximum time delay required to be satisfied; the mobile characteristic information comprises the stay time of the mobile equipment in a service coverage area, wherein the service coverage area is the area covered by the micro cellular network SCN where the mobile equipment is located currently;
the determining an expected system benefit corresponding to the performing the action under the state information includes:
by an expected utility function

E[Q] = Σ_{i=1}^{N} (P_i^1 · Q_i^1 + P_i^2 · Q_i^2)

determining the expected system benefit;

wherein Q_i^1 is the first utility function, P_i^1 is the probability corresponding to the first utility function, Q_i^2 is the second utility function, P_i^2 is the probability corresponding to the second utility function, and N is the number of mobile devices for which tasks to be processed exist in the system.
6. The method of claim 5, wherein determining the expected utility function comprises:
determining a time savings, a power savings, and a total cost, wherein the time savings is a time saved in offloading the pending task from the mobile device to the MEC server as compared to the mobile device itself processing the pending task; the power savings is a local power savings in offloading the pending task from the mobile device to the MEC server as compared to the mobile device processing the pending task; the total cost is the power consumption and resources required by the MEC server to process the task to be processed;
determining a migration cost, wherein the migration cost represents a cost generated by a transmission process required by the mobile device to move from a current microcellular network (SCN) to another SCN before the MEC server processes and completes the task to be processed, and the transmission process represents a process that the MEC server forwards a processing result obtained by processing the task to be processed to the other SCN to which the mobile device moves through a macro base station and sends the processing result to the mobile device through the other SCN;
calculating a difference between the sum of the time saving and the power saving and the total cost, and taking the difference as a first utility function; wherein the first utility function represents a utility function corresponding to the fact that the MEC server has already processed and completed the task to be processed before the mobile device moves from a current microcellular network (SCN) to another SCN;
calculating a difference value between the sum of the time saving and the power saving and the sum of the total cost and the migration cost, and taking the difference value as a second utility function, wherein the second utility function represents a utility function corresponding to the mobile device moving from a current microcellular network (SCN) to another SCN before the MEC server completes the task to be processed;
determining a first probability representing a probability that the MEC server has processed the task to be processed before the mobile device moves from a current microcellular network SCN to another SCN, and a second probability representing a probability that the mobile device moves from the current microcellular network SCN to another SCN before the MEC server processes the task to be processed,
taking the sum of the product of the first utility function and the first probability and the product of the second utility function and the second probability as an expected utility function for processing the task to be processed.
7. The method of claim 6, wherein determining the time savings, the power consumption savings, and the total cost comprises:
by the formula

Q_i^t = θ_t · (T_i^l − T_i^e)

determining the time savings;

wherein Q_i^t is the time savings, θ_t is the time-saving return coefficient, T_i^l is the time required for the mobile device to process the task to be processed itself, and T_i^e is the time consumed to offload the task to be processed from the mobile device to the MEC server;

by the formula

Q_i^E = θ_E · E_i^l

determining the power consumption savings;

wherein Q_i^E is the power consumption savings, θ_E is the local power consumption return coefficient, and E_i^l is the energy consumption resulting from the mobile device itself processing the task to be processed;

by the formula

C_i = E_i^s + θ_f · f_i

determining the total cost;

wherein C_i is the total cost, E_i^s is the energy consumption of the MEC server processing, θ_f is the price per unit of execution resources of the allocated MEC server, and f_i is the edge computing resource allocated to ME_i;

said calculating a difference between the sum of the time savings and the power savings and the total cost, and using the difference as a first utility function, comprises:

by the formula

Q_i^1 = Q_i^t + Q_i^E − C_i

determining the first utility function Q_i^1;

said calculating a difference between the sum of said time savings and said power savings and the sum of said total cost and said migration cost, and using said difference as a second utility function, comprises:

by the formula

Q_i^2 = Q_i^t + Q_i^E − C_i − C_i^m

determining the second utility function Q_i^2;

wherein C_i^m is the migration cost.
8. A task processing apparatus in a mobile edge computing, MEC, network, comprising:
the system comprises a first determining module, a second determining module and a processing module, wherein the first determining module is used for determining current state information, and the current state information comprises the number of mobile devices with tasks to be processed in a system at the current moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
the searching module is used for searching actions and system benefits corresponding to the current state information from a pre-established Q table, wherein the actions comprise that each mobile device locally processes a task to be processed which exists in the mobile device or processes the task to be processed which exists in the mobile device through an edge server; the pre-established Q table comprises actions corresponding to a plurality of state information and system benefits; the pre-established Q table is obtained by performing different actions under a plurality of state information and performing reinforcement learning iteration;
and the selection module is used for selecting the action corresponding to the maximum system benefit as the target action corresponding to the current state information so as to enable each mobile device to process the task to be processed according to the target action.
9. The apparatus of claim 8, further comprising:
the second determining module is used for determining state information corresponding to a historical moment, wherein the state information corresponding to the historical moment comprises the number of mobile devices with tasks to be processed in the system at the moment, task characteristic information of the tasks to be processed and the mobile characteristic information of each mobile device with the tasks to be processed;
the selection module is used for selecting the action corresponding to the state information according to a preset strategy aiming at the state information;
a third determining module for determining an expected system revenue corresponding to the action being performed under the state information;
the updating module is used for updating the state information in the Q table to be established and the expected system income corresponding to the action based on the determined expected system income;
the repeating module is used for respectively determining the state information of a plurality of historical moments, and repeatedly executing the steps of the selecting module, the third determining module and the updating module until the expected system income for each state information and each action is converged to obtain an established Q table; and aiming at each piece of state information, the Q table comprises expected system benefits corresponding to the combination of each piece of state information and each action.
10. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.