Task cache-based computation offloading method in edge computing
Technical Field
The invention relates to the field of mobile edge computing systems, and in particular to a task cache-based computation offloading method in edge computing.
Background
With the rapid development of wireless communication, Internet of Things and 5G technologies, mobile devices have become widespread and data traffic has grown rapidly. Emerging applications such as online gaming, artificial intelligence and virtual reality have stringent latency requirements and demand significant computing resources. However, mobile terminals have limited computing power and battery capacity, so running these applications locally introduces high computation delay and increases the power consumption of the terminal. Meanwhile, the transmission delay caused by the long transmission distance in the cloud-computing-center service model often fails to meet the requirements of real-time applications. The Mobile Edge Computing (MEC) paradigm arises from this: by sinking services to the edge of the network, it reduces network transmission delay and satisfies the requirements of low-latency services.
Edge computing not only alleviates the shortage of computing and storage resources on mobile devices, but also addresses the excessive transmission delay and heavy server load of the traditional cloud-computing-center model. However, as the number of network users grows, many users tend to request the same content, so a large amount of content is repeatedly transmitted over the backbone network within a service period. If popular content is cached at the edge of the network, the pressure on the backbone network and the transmission delay can both be reduced, improving the user experience. Tan et al. [Z. Tan, F. R. Yu, X. Li, H. Ji, and V. C. M. Leung, "Virtual resource allocation for heterogeneous services in full duplex-enabled SCNs with mobile edge computing and caching," IEEE Transactions on Vehicular Technology, vol. 67, no. 2, pp. 1794-1808, 2017] propose an MEC offloading and caching framework in full-duplex small cell networks (SCNs) that improves system revenue by combining computation offloading and content caching. Chien et al. [W.-C. Chien, H.-Y. Weng, and C.-F. Lai, "Q-learning based collaborative cache allocation in mobile edge computing," Future Generation Computer Systems, vol. 102, pp. 603-610, 2020], in order to improve the content cache hit rate, propose an SDN-based MEC architecture that first models the caching problem, maximizes the cache hit rate under the limitation of MEC storage resources, and solves the caching strategy using Q-learning. In these works, however, content caching and task offloading are treated independently. At present there is little research on proactively caching task results. If tasks with high popularity are proactively cached on the edge server, then when such a task is requested again the computation result can be returned to the user directly from the MEC, which greatly reduces the task completion delay and the energy consumption of the user. In real scenarios there are many cases in which computation results can be reused: some game rendering scenes can be reused by multiple players; in AR scenarios, some AR services are requested repeatedly; in video tasks, some popular videos are decoded repeatedly. Caching the results of such popular computation tasks can relieve the computation burden of the edge server. Zhao et al. [H. Zhao, Y. Wang, and R. Sun, "Task proactive caching based computation offloading and resource allocation in mobile-edge computing systems," in 2018 14th International Wireless Communications & Mobile Computing Conference (IWCMC), 2018, pp. 232-237: IEEE] jointly optimize the task-cache-based computation offloading and resource allocation strategy in a single-cell scenario in order to reduce task completion delay. They first model the offloading, resource allocation and task caching problem, taking the minimization of the completion delay of all tasks as the optimization objective; since the resulting delay optimization problem is NP-hard and cannot be solved in polynomial time, the model is decomposed into two subproblems, and an improved greedy algorithm is then designed to solve the offloading strategy.
Most existing research on task caching designs the caching strategy from the perspective of task popularity alone. However, for compute-intensive applications such as virtual reality, AR services and large games, the amount of computation and the input data size of a task also affect its caching value, in addition to popularity. For example, a task may be highly popular yet require little computation to complete, whereas caching a less popular but computation-heavy task may yield a larger benefit. How to reasonably assign a weight to each factor is also a problem worth considering: if the weights are assigned unreasonably, the estimated cache value of a task will be distorted.
Disclosure of Invention
The invention aims to provide a task cache-based computation offloading method in edge computing, addressing the defects in the prior art. The method can effectively reduce the running time of the algorithm and improve the response speed of the server; the task caching mechanism reduces the system energy consumption caused by repeated computation, lowers the computation cost of the server, and reduces the weighted sum of the completion delay and energy consumption of all tasks.
The technical scheme for realizing the purpose of the invention is as follows:
a task cache-based computation offloading method in edge computing comprises, in a single-cell scenario, the following steps:
1) constructing a system model: an MEC server is configured at the base station and connected to a remote central cloud through optical fiber. A number of mobile devices are assumed to be randomly distributed in the cell, and each user generates only one task in one service period (time slot). The mobile users are indexed by m ∈ {1, 2, ..., M}, and the task T_i (i ∈ {1, 2, ..., N}) generated by a mobile device is represented as a quadruple T_i = (D_i, C_i, O_i, τ_i), where D_i denotes the input data volume of the task, in kbit; C_i denotes the number of CPU cycles required to complete the task, in cycles; O_i denotes the output data volume of the task, i.e., the size of the task computation result, in kbit; and τ_i denotes the maximum tolerable delay for completing the task, in ms. This completes the construction of the system model. The research work of the technical scheme focuses mainly on the mobile terminal and the edge server: the mobile terminal generates and runs the application program and, if the local device cannot satisfy the computing demand of the application, issues an offloading request to the edge server through the mobile network; the edge server receives the offloading request from the mobile terminal, allocates appropriate computing resources to ensure that the task can be completed within its tolerable delay, and decides whether to cache the task according to its cache value;
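For illustration only, the task quadruple and the edge server described above can be captured in a small data structure. The following Python sketch is a minimal, hypothetical representation; the field names and the MECServer container are assumptions introduced here, not part of the claimed method:

    from dataclasses import dataclass, field
    from typing import Dict

    @dataclass
    class Task:
        """Task quadruple T_i = (D_i, C_i, O_i, tau_i)."""
        input_kbit: float      # D_i: input data volume (kbit)
        cpu_cycles: float      # C_i: CPU cycles required to complete the task
        output_kbit: float     # O_i: size of the computation result (kbit)
        max_delay_ms: float    # tau_i: maximum tolerable delay (ms)

    @dataclass
    class MECServer:
        """Edge server attached to the base station (assumed container)."""
        total_cpu_hz: float    # total computing capacity of the MEC
        storage_kbit: float    # total cache storage space
        cache: Dict[int, Task] = field(default_factory=dict)  # cached task results, keyed by task id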
2) constructing a system communication model: the MEC server provides computing services to the mobile devices within the cell, and each user generates only one compute-intensive task per time slot. The user may choose whether to offload the task to the MEC for execution. Let A = {a_1, a_2, ..., a_N} denote the set of offloading decisions of all mobile devices, where a_i = 0 means that the task of M_i is executed on the local device and a_i = 1 means that the task is offloaded to the MEC for execution. Assuming there is no intra-cell interference, according to the Shannon formula the transmission rate at which M_i transmits its task to the MEC server is
r_i = B log2(1 + p_i g_i / N),
where B is the channel bandwidth, g_i denotes the channel gain between mobile device M_i and the MEC server, p_i denotes the transmit power of M_i, and N is the Gaussian noise power, in dBm. This completes the construction of the system communication model;
3) constructing a system computation model: during task processing, the user sends an offloading request to the edge server, and a computation task is offloaded only if the tolerable delay of the application is satisfied and the system overhead can be reduced;
3.1) constructing a local computation model: when task T_i is executed locally, the CPU frequency of mobile device M_i is f_i^l, in GHz, and the computation delay of executing T_i locally is
T_i^l = C_i / f_i^l.
The corresponding energy consumption E_i^l is calculated as
E_i^l = κ C_i (f_i^l)^2,
where κ denotes the CPU power consumption coefficient, a fixed constant determined by the chip process; here κ is set to 10^-26. When computation task T_i is executed locally, the total overhead of the system consists of the local computation delay and the terminal energy consumption, so the overhead of a single device is
Q_i^l = α T_i^l + β E_i^l,
where α and β are the weight coefficients of delay and energy consumption, respectively, and satisfy
α + β = 1, 0 ≤ α ≤ 1, 0 ≤ β ≤ 1;
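As a non-authoritative illustration of the local computation model above, the following Python sketch computes the local delay, energy and weighted overhead for one task; the function and variable names are assumptions, and for simplicity frequencies are taken in Hz and delay in seconds:

    KAPPA = 1e-26  # CPU power consumption coefficient kappa (chip-dependent constant)

    def local_overhead(cpu_cycles: float, f_local_hz: float,
                       alpha: float = 0.5, beta: float = 0.5) -> float:
        """Weighted local overhead Q_i^l = alpha * T_i^l + beta * E_i^l (sketch)."""
        assert abs(alpha + beta - 1.0) < 1e-9, "alpha and beta must sum to 1"
        t_local = cpu_cycles / f_local_hz               # T_i^l = C_i / f_i^l (seconds)
        e_local = KAPPA * cpu_cycles * f_local_hz ** 2  # E_i^l = kappa * C_i * (f_i^l)^2 (joules)
        return alpha * t_local + beta * e_local

    # Hypothetical example: a task of 1e9 cycles on a 1 GHz local CPU
    print(local_overhead(cpu_cycles=1e9, f_local_hz=1e9))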
3.2) constructing an MEC computation model: when computation task T_i is offloaded to the MEC for execution, the delay consists of the transmission delay from mobile device M_i to the MEC and the execution delay at the MEC. The computing capacity that the MEC allocates to the user is denoted f_i^c. The delay incurred by offloading to the MEC thus includes the transmission delay T_i^tran = D_i / r_i and the execution delay T_i^exec = C_i / f_i^c, so the total delay of offloading the task to the MEC is
T_i^c = T_i^tran + T_i^exec.
In this case, the energy consumed by the mobile device when offloading to the MEC is incurred by transmitting the task over the wireless link, and the transmission energy consumption is
E_i^tran = p_i T_i^tran.
The energy consumption of the MEC itself is independent of the user and is not counted as system overhead, so the total overhead of offloading the task to the MEC server is
Q_i^c = α T_i^c + β E_i^tran,
completing the construction of the system computation model;
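In the same spirit, a minimal sketch of the offloading overhead described in 3.2), assuming the standard forms T_i^tran = D_i / r_i and T_i^exec = C_i / f_i^c; the names and units are illustrative, not prescriptive:

    def offload_overhead(input_bits: float, cpu_cycles: float, f_mec_hz: float,
                         rate_bps: float, tx_power_w: float,
                         alpha: float = 0.5, beta: float = 0.5) -> float:
        """Weighted offloading overhead Q_i^c = alpha * T_i^c + beta * E_i^tran (sketch)."""
        t_tran = input_bits / rate_bps    # transmission delay T_i^tran = D_i / r_i
        t_exec = cpu_cycles / f_mec_hz    # execution delay  T_i^exec = C_i / f_i^c
        t_total = t_tran + t_exec         # total offloading delay T_i^c
        e_tran = tx_power_w * t_tran      # transmission energy E_i^tran = p_i * T_i^tran
        return alpha * t_total + beta * e_tran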
4) constructing a resource allocation model: under the tolerable-delay constraint of the tasks, the edge server allocates appropriate computing resources according to the task attributes. Let F = [f_1, f_2, ..., f_N] denote the allocation vector of computing resources, where f_i denotes the computing resources allocated to M_i, completing the construction of the resource allocation model;
5) constructing a task value model and a cache model: the cache vector of the MEC for the task contents requested by the users is expressed as H = {h_1, h_2, ..., h_N}, where h_i is a binary variable indicating whether the MEC has cached the task of user M_i and its related data content: h_i = 0 means the MEC has not cached the content, and h_i = 1 means the MEC has already cached it. If the content is already cached, no task offloading is needed when the task is requested, and the MEC directly returns the result to the mobile device after completing the task. Under the limitations of cost and storage space, tasks with higher cache value are stored in the MEC and tasks with low cache value are replaced. According to the Zipf distribution of content popularity, the popularity of a task is calculated as
θ_i = (1 / i^Z) / Σ_{j=1}^{N} (1 / j^Z),
where θ_i denotes the popularity of the i-th task and Z denotes the Zipf constant. The task set cached by the MEC server is denoted H_c, with maximum storage capacity C, and is initialized to be empty. Because the memory capacity of the MEC server is limited, it only caches the tasks with higher cache value, and finally outputs the task caching policy set H*. The cache value is defined as the weighted combination of the task popularity, the task input data size and the computation amount required by the task, where the corresponding weight coefficients w_1, w_2 and w_3 satisfy
w_1 + w_2 + w_3 = 1, 0 ≤ w_1 ≤ 1, 0 ≤ w_2 ≤ 1, 0 ≤ w_3 ≤ 1,
completing the construction of a system task value model and a cache model;
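A minimal Python sketch of the popularity and cache-value computation described in 5); the exact weighted form of the cache value (here a plain weighted sum of the popularity and the normalized input size and cycle count) is an assumption made only for illustration:

    def zipf_popularity(rank: int, num_tasks: int, z: float = 0.8) -> float:
        """Popularity theta_i of the task ranked `rank` under a Zipf distribution with exponent Z."""
        normaliser = sum(1.0 / (j ** z) for j in range(1, num_tasks + 1))
        return (1.0 / (rank ** z)) / normaliser

    def cache_value(popularity: float, input_kbit: float, cpu_cycles: float,
                    w1: float, w2: float, w3: float,
                    max_input_kbit: float, max_cycles: float) -> float:
        """Hypothetical cache value: weighted sum of popularity and normalized task attributes."""
        assert abs(w1 + w2 + w3 - 1.0) < 1e-9, "weights must sum to 1"
        return (w1 * popularity
                + w2 * input_kbit / max_input_kbit
                + w3 * cpu_cycles / max_cycles)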
6) constructing a system overhead model: task completion delay and mobile terminal energy consumption are the key indicators for evaluating the quality of a computation offloading strategy. Under the constraints of MEC computing capacity, storage resources and task tolerable delay, the system overhead is to be minimized, and the overhead minimization problem is modeled as follows:
the optimization objective of the technical scheme is the system benefit, i.e., to obtain the optimal offloading decision A*, computing resource allocation F* and caching strategy H* that minimize the system overhead, where
A = {a_1, a_2, ..., a_N},
F = {f_1, f_2, ..., f_N},
H = {h_1, h_2, ..., h_N},
subject to constraints C1 to C5, with C1: a_i ∈ {0, 1} and C5: T_i ≤ τ_i. Here A is the offloading strategy set, F is the computing resource allocation strategy set, and H is the task caching strategy set. Constraint C1 indicates that a computation task can only be executed locally or on the MEC server; C2 and C3 indicate that the sum of the computing resources allocated by the MEC server cannot exceed its computing capacity; C4 indicates that the total cached content cannot exceed the total storage space of the MEC server; and C5 is the delay constraint, i.e., the completion time of a task cannot exceed its maximum tolerable delay. This completes the construction of the system overhead model;
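Purely as an illustration, the constraints C1 to C5 described above can be checked as follows; the function and the way the per-task completion time is supplied are assumptions of this sketch:

    from typing import List

    def constraints_satisfied(a: List[int], f: List[float], h: List[int],
                              completion_time: List[float], max_delay: List[float],
                              output_kbit: List[float],
                              mec_capacity_hz: float, mec_storage_kbit: float) -> bool:
        """Check constraints C1-C5 of the overhead minimization problem (sketch)."""
        c1 = all(ai in (0, 1) for ai in a)                                    # C1: binary offloading decisions
        c23 = sum(fi for ai, fi in zip(a, f) if ai == 1) <= mec_capacity_hz   # C2/C3: MEC computing capacity
        c4 = sum(o for hi, o in zip(h, output_kbit) if hi == 1) <= mec_storage_kbit  # C4: cache storage
        c5 = all(t <= tau for t, tau in zip(completion_time, max_delay))      # C5: tolerable delay
        return c1 and c23 and c4 and c5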
7) solving the problem: the problem is solved by means of a Q-Learning algorithm and the analytic hierarchy process:
the task-cache-based computation offloading studied in the technical scheme is a typical multi-objective combinatorial optimization problem and is NP-hard. Many heuristic algorithms are currently available to solve such problems, for example game theory, genetic algorithms (GA) and particle swarm optimization (PSO). Although these heuristics perform well on their respective objectives, such as reducing the delay cost or the energy consumption cost of the user, they respond to tasks and make decisions periodically, whereas in real scenarios tasks arrive randomly within a service period and require a real-time response. In addition, these conventional algorithms need a large number of iterations to reach an optimal solution, resulting in a high running-time cost;
in order to solve this problem, the technical scheme adopts a reinforcement learning method for the task offloading and resource allocation problems. Reinforcement learning obtains an approximately optimal solution by continuous trial and error; it mainly comprises four basic elements, namely state, action, reward and agent, aims to maximize the long-term benefit, and is divided into model-based and model-free reinforcement learning algorithms;
the analytic hierarchy process (AHP) is a decision model combining qualitative and quantitative analysis. It divides the elements related to a decision into a target layer, a criterion layer and a scheme layer, quantitatively analyzes the importance of pairwise comparisons between elements of the same level, and obtains the weight coefficient of each element through calculation, which makes it well suited to the weight-assignment problem in task scheduling scenarios;
the technical scheme determines the task value from several angles, and tasks with higher value are processed with higher priority. The value of a task is mainly considered from three aspects: the maximum tolerable delay of the task, the computing resources required by the task, and the data volume of the task. Since the main goal of the technical scheme is to maximize the task completion rate, the tolerable delay of the task is given the largest weight; the required computing capacity and the data volume of the task also affect the completion rate, but their influence on the overall value is smaller than that of the tolerable delay;
the technical scheme first constructs a hierarchical structure model, then compares the factors of the criterion layer in pairs and constructs the judgment matrix of the criterion layer according to the objective comparison results, calculates the eigenvector, the principal eigenvalue and the weight values from the judgment matrix, and finally verifies the validity of the judgment matrix through a consistency check;
Q-Learning algorithm:
the system state: the system state s consists of the offloading decision vector A, the computing resource allocation vector F and the remaining computing resource vector G, i.e.:
s = {A, F, G},
the system action: in the system, the agent decides which tasks are offloaded and which are not, and how many computing resources are allocated to each task, so the system action is expressed as:
a = {a_i, f_i},
where a_i represents the offloading decision of task T_i and f_i represents the computing resources allocated to task T_i;
the system reward: in time slot t, after the agent executes a possible action in a certain state it obtains a reward R(s, a). The reward function should be related to the objective function; since the optimization problem here is to minimize the system overhead, the reward is defined in terms of the gap between the all-local cost and the current cost, where c_local denotes the total system cost of executing all tasks locally in time slot t, and c(s, a) denotes the total system cost of the JORC algorithm in the current state;
in the Q-Learning algorithm, the agent observes the current environment state s_t at time t, selects an action a_t according to the Q table, executes a_t, enters state s_{t+1} and obtains the reward r. The Q table is updated by
Q(s_t, a_t) ← Q(s_t, a_t) + δ [ r + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t) ],
and the iteration continues until the Q values converge, yielding the optimal policy π*, where δ is the learning rate and γ (0 < γ < 1) is the discount factor;
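The update rule above is the standard one-step Q-Learning update. Purely as an illustration, a minimal sketch in Python, in which the dictionary-based Q table and the state/action encodings are assumptions of this sketch:

    from collections import defaultdict

    Q = defaultdict(float)  # Q table: maps (state, action) pairs to Q values

    def q_update(state, action, reward, next_state, next_actions,
                 delta: float = 0.1, gamma: float = 0.9) -> None:
        """One Q-Learning update: Q(s,a) += delta * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        Q[(state, action)] += delta * (reward + gamma * best_next - Q[(state, action)])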
the Q-Learning algorithm is as follows:
Input: number of training episodes T, learning rate δ, discount factor γ, greedy coefficient ε, task set N, remaining MEC computing resources;
Output: offloading strategy A*, resource allocation strategy F*.
1. Initialize the Q matrix
2. Repeat:
3. In the initial state s, select an action a according to the ε-greedy strategy
4. Repeat:
5. Execute action a, obtain the reward r, and enter the next state s_{t+1}
6. At s_{t+1}, select the next action according to the ε-greedy strategy and update the Q table
7. s = s_{t+1}
8. Until s is the terminal state
9. Until the Q values converge;
analytic hierarchy process:
1. building a hierarchical model
The hierarchical model has 3 layers: the first layer is the task priority P; the second layer consists of the maximum tolerable delay τ of the task, the computation amount C of the task and the data volume D of the task; and the third layer is the task T;
2. Constructing a judgment matrix: within a level, a_ij denotes the relative importance of the i-th element with respect to the j-th element for a certain factor of the layer above. The criterion layer has 3 elements in total, so a judgment matrix A = (a_ij)_{3×3} is constructed, in which the three elements τ, C and D are all governed by P. From the judgment matrix, the maximum eigenvalue λ_max of the matrix and the corresponding eigenvector W = (w_1, w_2, w_3)^T are solved;
3. Consistency check: the inconsistency index CI = (λ_max - n)/(n - 1) of the judgment matrix A is calculated, and the random consistency ratio CR = CI/RI is obtained from the corresponding RI value. When CR < 0.1, the judgment matrix is considered to have satisfactory consistency; otherwise A is revised until satisfactory consistency is reached. Finally the weight vector W of the computation task is obtained, and the cache value of the task is expressed as follows:
where w_1, w_2 and w_3 are the weight coefficients of the task popularity, the number of CPU cycles required by the task and the data volume of the task, respectively, and satisfy
w_1 + w_2 + w_3 = 1, 0 ≤ w_1 ≤ 1, 0 ≤ w_2 ≤ 1, 0 ≤ w_3 ≤ 1.
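A minimal, hypothetical sketch of the AHP weight calculation for the three criteria (τ, C, D); the numeric pairwise comparisons in the example call are illustrative only and are not the judgment matrix of the invention:

    import numpy as np

    # Random index RI for judgment matrices of order 1..5 (standard AHP table)
    RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}

    def ahp_weights(judgment: np.ndarray):
        """Return the weight vector W and consistency ratio CR for a judgment matrix (sketch)."""
        n = judgment.shape[0]
        eigvals, eigvecs = np.linalg.eig(judgment)
        k = np.argmax(eigvals.real)              # index of the maximum eigenvalue lambda_max
        lam_max = eigvals[k].real
        w = np.abs(eigvecs[:, k].real)
        w = w / w.sum()                          # normalized weight vector W
        ci = (lam_max - n) / (n - 1)             # inconsistency index CI
        cr = ci / RI[n] if RI[n] > 0 else 0.0    # random consistency ratio CR
        return w, cr

    # Illustrative judgment matrix: tau judged more important than C and D (assumed values)
    A = np.array([[1.0, 3.0, 5.0],
                  [1/3, 1.0, 2.0],
                  [1/5, 1/2, 1.0]])
    w, cr = ahp_weights(A)
    print(w, cr)   # accept the weights when CR < 0.1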
the technical scheme minimizes the delay and energy consumption cost of the system through a joint optimization of computation offloading, resource allocation and task caching method (JORC for short). The MEC server first checks whether a task is cached: if the MEC has cached the task computation result, the result is returned directly, and the JORC method decides whether the MEC cached task set should be replaced; if the computation task result is not cached, the offloading strategy and the resource allocation strategy are determined through the Q-learning algorithm so as to minimize the system cost. The JORC method is described as follows:
Input: user request set N, server information G, cache status H;
Output: H*, A*, F*.
1. for i = 1 : N do
2. Mobile device i generates task T_i and issues an offloading request
3. if T_i is in the cache set then
4. Return the cached computation result
5. else
6. Add the task to the task set M
7. end if
8. Input M into the Q-learning algorithm to obtain A*, F*
9. if there is a task in H whose cache value is less than that of T_i then
10. Replace the original result with that of T_i
11. end if
12. Output the caching policy set H*
13. end for
This completes the solving of the problem.
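Purely for illustration, the per-request flow of the JORC method above might look like the following Python sketch; the helper function run_q_learning is a placeholder standing in for the Q-learning solver, and the returned plans are assumed to be simple dictionaries:

    def jorc_step(requests, cache, run_q_learning):
        """One service period of the JORC flow (sketch): cache hit -> return result,
        otherwise collect the task for joint offloading / resource allocation."""
        pending = []            # task set M handed to the Q-learning solver
        results = {}
        for task_id, task in requests.items():
            if task_id in cache:                    # cached result: return directly from the MEC
                results[task_id] = cache[task_id]
            else:
                pending.append((task_id, task))     # needs an offloading / allocation decision
        offload_plan, resource_plan = run_q_learning(pending)   # A*, F*
        # cache admission and replacement are then handled separately according to the cache value
        for task_id, task in pending:
            results[task_id] = ("computed", offload_plan.get(task_id), resource_plan.get(task_id))
        return results, offload_plan, resource_plan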
Compared with the existing research, the technical scheme has the following characteristics:
1. For the task caching problem, the technical scheme jointly considers task offloading, bandwidth and computing resource allocation, and task caching, and models the problem of minimizing task completion delay and energy consumption under the constraints of computing, wireless and MEC storage resources. The offloading process is modeled as an MDP, and a Q-Learning-based algorithm is designed to solve it.
2. The task cache value is determined based on multiple attributes of the task; specifically, the task popularity, the task input data size and the computation amount required by the task are combined and assigned different weights to determine the cache value. Caching the tasks with higher cache value first can better reduce the computation cost of the server and reduce the weighted sum of the completion delay and energy consumption of all tasks.
3. Multiple attributes of the tasks are considered comprehensively, and a weight assignment algorithm based on the analytic hierarchy process is proposed to assign the weights.
The method can effectively reduce the running time of the algorithm and improve the response speed of the server; the task caching mechanism reduces the system energy consumption caused by repeated computation, lowers the computation cost of the server, and reduces the weighted sum of the completion delay and energy consumption of all tasks.
Drawings
FIG. 1 is a framework diagram of the embodiment;
FIG. 2 is a schematic diagram of the hierarchical model of the analytic hierarchy process in the embodiment.
Detailed Description
The invention is described in further detail below with reference to the following figures and specific examples, but the invention is not limited thereto.
Example:
this example considers different types of virtual reality tasks in a single-cell scenario. A joint offloading and task caching strategy is proposed, which comprehensively considers task offloading, bandwidth and computing resource allocation. First, the offloading, resource allocation and task caching models are built; for task caching, the cache value is defined according to multiple attributes of the task. Second, the computation offloading and resource allocation problem is formulated as a Markov decision process, with the minimization of the weighted sum of task completion delay and energy consumption as the evaluation index.
Referring to FIG. 1, a task cache-based computation offloading method in edge computing comprises, in a single-cell scenario, the following steps:
1) constructing a system model: an MEC server is configured at the base station and connected to a remote central cloud through optical fiber. A number of mobile devices are assumed to be randomly distributed in the cell, and each user generates only one task in one service period (time slot). The mobile users are indexed by m ∈ {1, 2, ..., M}, and the task T_i (i ∈ {1, 2, ..., N}) generated by a mobile device is represented as a quadruple T_i = (D_i, C_i, O_i, τ_i), where D_i denotes the input data volume of the task, in kbit; C_i denotes the number of CPU cycles required to complete the task, in cycles; O_i denotes the output data volume of the task, i.e., the size of the task computation result, in kbit; and τ_i denotes the maximum tolerable delay for completing the task, in ms. This completes the construction of the system model;
2) constructing a system communication model: the MEC server provides computing services to the mobile devices within the cell, and each user generates only one compute-intensive task per time slot. The user may choose whether to offload the task to the MEC for execution. Let A = {a_1, a_2, ..., a_N} denote the set of offloading decisions of all mobile devices, where a_i = 0 means that the task of M_i is executed on the local device and a_i = 1 means that the task is offloaded to the MEC for execution. Assuming there is no intra-cell interference, according to the Shannon formula the transmission rate at which M_i transmits its task to the MEC server is
r_i = B log2(1 + p_i g_i / N),
where B is the channel bandwidth, g_i denotes the channel gain between mobile device M_i and the MEC server, p_i denotes the transmit power of M_i, and N is the Gaussian noise power, in dBm. This completes the construction of the system communication model;
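A one-line Python sketch of the uplink rate above, under the usual assumption that the noise power is expressed in linear units (watts) when the formula is evaluated; the numbers in the example call are purely hypothetical:

    import math

    def uplink_rate_bps(bandwidth_hz: float, tx_power_w: float,
                        channel_gain: float, noise_power_w: float) -> float:
        """Transmission rate r_i = B * log2(1 + p_i * g_i / N) in bit/s (sketch)."""
        return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_power_w)

    # Hypothetical numbers: 10 MHz bandwidth, 0.1 W transmit power, channel gain 1e-3, 1e-9 W noise
    print(uplink_rate_bps(10e6, 0.1, 1e-3, 1e-9))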
3) constructing a system computation model: in the method, the delay and the energy consumption of task processing are jointly taken into account as the system overhead. During task processing, the user sends an offloading request to the edge server, and a computation task is offloaded only if the tolerable delay of the application is satisfied and the system overhead can be reduced;
3.1) constructing a local computation model: when task T_i is executed locally, the CPU frequency of mobile device M_i is f_i^l, in GHz, and the computation delay of executing T_i locally is
T_i^l = C_i / f_i^l.
The corresponding energy consumption E_i^l is calculated as
E_i^l = κ C_i (f_i^l)^2,
where κ denotes the CPU power consumption coefficient, a fixed constant determined by the chip process; here κ is set to 10^-26. When computation task T_i is executed locally, the total overhead of the system consists of the local computation delay and the terminal energy consumption, so the overhead of a single device is
Q_i^l = α T_i^l + β E_i^l,
where α and β are the weight coefficients of delay and energy consumption, respectively, and satisfy
α + β = 1, 0 ≤ α ≤ 1, 0 ≤ β ≤ 1;
3.2) constructing an MEC computation model: when computation task T_i is offloaded to the MEC for execution, the delay consists of the transmission delay from mobile device M_i to the MEC and the execution delay at the MEC. The computing capacity that the MEC allocates to the user is denoted f_i^c. The delay incurred by offloading to the MEC thus includes the transmission delay T_i^tran = D_i / r_i and the execution delay T_i^exec = C_i / f_i^c, so the total delay of offloading the task to the MEC is
T_i^c = T_i^tran + T_i^exec.
In this case, the energy consumed by the mobile device when offloading to the MEC is incurred by transmitting the task over the wireless link, and the transmission energy consumption is
E_i^tran = p_i T_i^tran.
The energy consumption of the MEC itself is independent of the user and is not counted as system overhead, so the total overhead of offloading the task to the MEC server is
Q_i^c = α T_i^c + β E_i^tran,
completing the construction of the system computation model;
4) constructing a resource allocation model: under the tolerable-delay constraint of the tasks, the edge server allocates appropriate computing resources according to the task attributes. Let F = [f_1, f_2, ..., f_N] denote the allocation vector of computing resources, where f_i denotes the computing resources allocated to M_i, completing the construction of the resource allocation model;
5) constructing a task value model and a cache model: in this embodiment, three factors, namely the task popularity, the computation amount required by the task and the task data volume, are comprehensively considered to determine the caching strategy. The cache vector of the MEC for the task contents requested by the users is expressed as H = {h_1, h_2, ..., h_N}, where h_i is a binary variable indicating whether the MEC has cached the task of user M_i and its related data content: h_i = 0 means the MEC has not cached the content, and h_i = 1 means the MEC has already cached it. If the content is already cached, no task offloading is needed when the task is requested, and the MEC directly returns the result to the mobile device after completing the task. Under the limitations of cost and storage space, tasks with higher cache value are stored in the MEC and tasks with low cache value are replaced. According to the Zipf distribution of content popularity, the popularity of a task is calculated as
θ_i = (1 / i^Z) / Σ_{j=1}^{N} (1 / j^Z),
where θ_i denotes the popularity of the i-th task and Z denotes the Zipf constant. The task set cached by the MEC server is denoted H_c, with maximum storage capacity C, and is initialized to be empty. Because the memory capacity of the MEC server is limited, it only caches the tasks with higher cache value, and finally outputs the task caching policy set H*. The cache value is defined as the weighted combination of the task popularity, the task input data size and the computation amount required by the task, where the corresponding weight coefficients w_1, w_2 and w_3 satisfy
w_1 + w_2 + w_3 = 1, 0 ≤ w_1 ≤ 1, 0 ≤ w_2 ≤ 1, 0 ≤ w_3 ≤ 1,
completing the construction of a system task value model and a cache model;
6) constructing a system overhead model: task completion delay and mobile terminal energy consumption are the key indicators for evaluating the quality of a computation offloading strategy. In this embodiment the computation offloading, resource allocation and task caching strategies are jointly optimized in a single-MEC scenario; under the constraints of MEC computing capacity, storage resources and task tolerable delay, the system overhead is to be minimized, and the overhead minimization problem is modeled as follows:
the optimization objective of this embodiment is the system benefit, i.e., to obtain the optimal offloading decision A*, computing resource allocation F* and caching strategy H* that minimize the system overhead, where
A = {a_1, a_2, ..., a_N},
F = {f_1, f_2, ..., f_N},
H = {h_1, h_2, ..., h_N},
subject to constraints C1 to C5, with C1: a_i ∈ {0, 1} and C5: T_i ≤ τ_i. Here A is the offloading strategy set, F is the computing resource allocation strategy set, and H is the task caching strategy set. Constraint C1 indicates that a computation task can only be executed locally or on the MEC server; C2 and C3 indicate that the sum of the computing resources allocated by the MEC server cannot exceed its computing capacity; C4 indicates that the total cached content cannot exceed the total storage space of the MEC server; and C5 is the delay constraint, i.e., the completion time of a task cannot exceed its maximum tolerable delay. This completes the construction of the system overhead model;
7) solving the problem: the problem is solved by means of a Q-Learning algorithm and the analytic hierarchy process:
the task-cache-based computation offloading studied in this embodiment is a typical multi-objective combinatorial optimization problem and is NP-hard. Many heuristic algorithms are currently available to solve such problems, for example game theory, genetic algorithms (GA) and particle swarm optimization (PSO). Although these heuristics perform well on their respective objectives, such as reducing the delay cost or the energy consumption cost of the user, they respond to tasks and make decisions periodically, whereas in real scenarios tasks arrive randomly within a service period and require a real-time response. In addition, these conventional algorithms need a large number of iterations to reach an optimal solution, resulting in a high running-time cost;
in order to solve this problem, this embodiment adopts a reinforcement learning method for the task offloading and resource allocation problems. Reinforcement learning obtains an approximately optimal solution by continuous trial and error; it mainly comprises four basic elements, namely state, action, reward and agent, aims to maximize the long-term benefit, and is divided into model-based and model-free reinforcement learning algorithms. Because a real network scenario is complex and a model cannot be established, this embodiment uses the model-free Q-Learning algorithm;
the analytic hierarchy process is a decision model combining qualitative and quantitative analysis. It divides the elements related to a decision into a target layer, a criterion layer and a scheme layer, quantitatively analyzes the importance of pairwise comparisons between elements of the same level, and obtains the weight coefficient of each element through calculation, which makes it well suited to the weight-assignment problem in task scheduling scenarios;
the task value is determined from several angles, and tasks with higher value are processed with higher priority. The value of a task is mainly considered from three aspects: the maximum tolerable delay of the task, the computing resources required by the task, and the data volume of the task. Since the goal is to maximize the task completion rate, the tolerable delay of the task is given the largest weight; the required computing capacity and the data volume of the task also affect the completion rate, but their influence on the overall value is smaller than that of the tolerable delay;
a hierarchical structure model is first constructed; the factors of the criterion layer are then compared in pairs, the judgment matrix of the criterion layer is constructed according to the objective comparison results, the eigenvector, the principal eigenvalue and the weight values are calculated from the judgment matrix, and finally the validity of the judgment matrix is verified through a consistency check;
Q-Learning algorithm:
the system state: the system state s consists of the offloading decision vector A, the computing resource allocation vector F and the remaining computing resource vector G, i.e.:
s = {A, F, G},
the system action: in the system, the agent decides which tasks are offloaded and which are not, and how many computing resources are allocated to each task, so the system action is expressed as:
a = {a_i, f_i},
where a_i represents the offloading decision of task T_i and f_i represents the computing resources allocated to task T_i;
the system reward: in time slot t, after the agent executes a possible action in a certain state it obtains a reward R(s, a). The reward function should be related to the objective function; since the optimization problem here is to minimize the system overhead, the reward is defined in terms of the gap between the all-local cost and the current cost, where c_local denotes the total system cost of executing all tasks locally in time slot t, and c(s, a) denotes the total system cost of the JORC algorithm in the current state;
in the Q-Learning algorithm, the agent observes the current environment state s_t at time t, selects an action a_t according to the Q table, executes a_t, enters state s_{t+1} and obtains the reward r. The Q table is updated by
Q(s_t, a_t) ← Q(s_t, a_t) + δ [ r + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t) ],
and the iteration continues until the Q values converge, yielding the optimal policy π*, where δ is the learning rate and γ (0 < γ < 1) is the discount factor;
the Q-Learning algorithm is as follows:
Input: number of training episodes T, learning rate δ, discount factor γ, greedy coefficient ε, task set N, remaining MEC computing resources;
Output: offloading strategy A*, resource allocation strategy F*.
1. Initialize the Q matrix
2. Repeat:
3. In the initial state s, select an action a according to the ε-greedy strategy
4. Repeat:
5. Execute action a, obtain the reward r, and enter the next state s_{t+1}
6. At s_{t+1}, select the next action according to the ε-greedy strategy and update the Q table
7. s = s_{t+1}
8. Until s is the terminal state
9. Until the Q values converge;
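As a complement to the pseudocode above, the following is a minimal, hypothetical ε-greedy training loop in Python; the environment interface (reset/step) and the discrete action encoding are assumptions of this sketch, not part of the claimed method:

    import random
    from collections import defaultdict

    def train_q_learning(env, actions, episodes=500, delta=0.1, gamma=0.9, epsilon=0.1):
        """Tabular Q-Learning with an epsilon-greedy policy (sketch).
        `env.reset()` returns an initial state; `env.step(a)` returns (next_state, reward, done)."""
        Q = defaultdict(float)

        def greedy(state):
            return max(actions, key=lambda a: Q[(state, a)])

        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                a = random.choice(actions) if random.random() < epsilon else greedy(s)
                s_next, r, done = env.step(a)
                best_next = max(Q[(s_next, a2)] for a2 in actions)
                Q[(s, a)] += delta * (r + gamma * best_next - Q[(s, a)])   # Q-Learning update
                s = s_next
        # derive the learned policy: best action for every visited state
        policy = {}
        for (s, a) in Q:
            if s not in policy or Q[(s, a)] > Q[(s, policy[s])]:
                policy[s] = a
        return policy, Q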
analytic hierarchy process:
1. building a hierarchical model
The hierarchical model has 3 layers: the first layer is the task priority P; the second layer consists of the maximum tolerable delay τ of the task, the computation amount C of the task and the data volume D of the task; and the third layer is the task T, as shown in FIG. 2;
2. Constructing a judgment matrix: within a level, a_ij denotes the relative importance of the i-th element with respect to the j-th element for a certain factor of the layer above. The criterion layer has 3 elements in total, so a judgment matrix A = (a_ij)_{3×3} is constructed, in which the three elements τ, C and D are all governed by P. From the judgment matrix, the maximum eigenvalue λ_max of the matrix and the corresponding eigenvector W = (w_1, w_2, w_3)^T are solved;
3. Consistency check: the inconsistency index CI = (λ_max - n)/(n - 1) of the judgment matrix A is calculated, and the random consistency ratio CR = CI/RI is obtained from the corresponding RI value. When CR < 0.1, the judgment matrix is considered to have satisfactory consistency; otherwise A is revised until satisfactory consistency is reached. Finally the weight vector W of the computation task is obtained, and the cache value of the task is expressed as follows:
where w_1, w_2 and w_3 are the weight coefficients of the task popularity, the number of CPU cycles required by the task and the data volume of the task, respectively, and satisfy
w_1 + w_2 + w_3 = 1, 0 ≤ w_1 ≤ 1, 0 ≤ w_2 ≤ 1, 0 ≤ w_3 ≤ 1.
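As a worked illustration of the consistency check (the maximum eigenvalue here is an assumed number, not a result of the invention): with n = 3 criteria and an assumed λ_max = 3.04, CI = (3.04 - 3)/(3 - 1) = 0.02; for a 3 × 3 matrix the random index is RI = 0.58, so CR = 0.02/0.58 ≈ 0.034 < 0.1, and the judgment matrix would be accepted as satisfactorily consistent.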
this embodiment minimizes the delay and energy consumption cost of the system through the joint optimization of computation offloading, resource allocation and task caching (JORC) method. The MEC server first checks whether a task is cached: if the MEC has cached the task computation result, the result is returned directly, and the JORC method decides whether the MEC cached task set should be replaced; if the computation task result is not cached, the offloading strategy and the resource allocation strategy are determined through the Q-learning algorithm so as to minimize the system cost. The specific JORC method is described as follows:
Input: user request set N, server information G, cache status H;
Output: H*, A*, F*.
1. for i = 1 : N do
2. Mobile device i generates task T_i and issues an offloading request
3. if T_i is in the cache set then
4. Return the cached computation result
5. else
6. Add the task to the task set M
7. end if
8. Input M into the Q-learning algorithm to obtain A*, F*
9. if there is a task in H whose cache value is less than that of T_i then
10. Replace the original result with that of T_i
11. end if
12. Output the caching policy set H*
13. end for
This completes the solving of the problem.
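Finally, a minimal sketch of the cache replacement step referred to in steps 9-11 of the JORC pseudocode above; the capacity accounting by result size and the particular eviction order are assumptions of this sketch:

    def try_cache(cache: dict, values: dict, sizes: dict,
                  new_id, new_value: float, new_size_kbit: float,
                  capacity_kbit: float) -> bool:
        """Admit a new task result into the cache, evicting lower-value entries if needed (sketch)."""
        # evict the least valuable cached results while there is not enough space
        while sum(sizes[t] for t in cache) + new_size_kbit > capacity_kbit:
            if not cache:
                return False                  # result larger than the whole cache
            worst = min(cache, key=lambda t: values[t])
            if values[worst] >= new_value:
                return False                  # every cached result is at least as valuable
            cache.pop(worst)
        cache[new_id] = "cached result"       # store the computation result of the new task
        values[new_id] = new_value
        sizes[new_id] = new_size_kbit
        return True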