CN115220818A - Real-time dependency task unloading method based on deep reinforcement learning - Google Patents

Real-time dependency task unloading method based on deep reinforcement learning

Info

Publication number
CN115220818A
CN115220818A
Authority
CN
China
Prior art keywords
subtask
task
unloading
time
dep
Prior art date
Legal status
Pending
Application number
CN202210937248.5A
Other languages
Chinese (zh)
Inventor
陈星�
胡晟熙
姚泽玮
林潮伟
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202210937248.5A priority Critical patent/CN115220818A/en
Publication of CN115220818A publication Critical patent/CN115220818A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44594Unloading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a real-time dependent task unloading method based on deep reinforcement learning, which comprises the following steps: S1, training an unloading operation Q value prediction model by using the DQN algorithm in the runtime environment, based on a task unloading system model; S2, the unloading operation Q value prediction model predicts the Q values of different unloading operations according to the computing capacity of the computing nodes, the transmission rate among the computing nodes, and the applied unloading scheme, and then a proper unloading operation is selected by comparing the corresponding Q values; and S3, repeating step S2 and gradually determining an execution position for each task through feedback iteration. The method adapts well to different cloud-edge environments and generates unloading schemes efficiently.

Description

Real-time dependency task unloading method based on deep reinforcement learning
Technical Field
The invention belongs to the field of cloud-edge cooperative computing, and particularly relates to a real-time dependent task unloading method based on deep reinforcement learning.
Background
With the rise of intelligent technologies, more and more computation-intensive applications (such as automatic driving, face recognition, augmented reality, etc.) are developed to meet the needs of people, and computing platforms of the applications are not limited to smart phones and notebook computers, and have gradually been extended to smart devices such as wearable devices, vehicles, unmanned aerial vehicles, etc. Although these mobile devices are becoming more powerful, they are still limited in terms of processing power, memory capacity, and battery capacity due to size and weight constraints, and most mobile devices are still unable to handle the emerging variety of computationally intensive tasks in a short amount of time.
Computation offloading is an effective way to address the resource limitations of mobile devices: computation-intensive tasks in software are sent from the local device to remote devices for execution, using remote resources to extend local resources. Computing resources are dispersed across the mobile device, the edge server, and the cloud server, and the amount of resources owned by each computing platform differs markedly; for an application, the computing resources it can acquire are often spread across multiple different computing platforms and change dynamically as its location changes. The offloading scheme of an application determines which of its computing tasks should run on which computing platform, but no offloading scheme is final: when the running environment of the application changes, the offloading scheme must also be adjusted accordingly to provide better performance. Therefore, it is necessary to decide at runtime whether, and to which computing platforms, a computing task is offloaded. Most existing research adopts heuristic algorithms or search algorithms to find suitable offloading schemes, which can take tens or even hundreds of seconds. Considering the mobility of mobile devices, changes in the application running environment occur frequently, and selecting a suitable offloading scheme faces a combinatorial explosion, so finding an efficient offloading-scheme generation method that meets the real-time requirement of computation offloading remains a very challenging research problem.
Disclosure of Invention
In view of this, the present invention provides a real-time dependent task offloading method based on deep reinforcement learning, which efficiently generates an offloading scheme to meet the real-time requirement of computation offloading.
In order to realize the purpose, the invention adopts the following technical scheme:
a real-time dependent task unloading method based on deep reinforcement learning comprises the following steps:
s1, training an unloading operation Q value prediction model by using a DQN algorithm in a runtime environment based on a task unloading system model;
s2, an unloading operation Q value prediction model predicts Q values of different unloading operations according to the computing capacity of the computing nodes, the transmission rate among the computing nodes and an applied unloading scheme, and then selects proper unloading operation by comparing the corresponding Q values;
and S3, repeating the step S2, and gradually determining an execution position for each task through feedback iteration.
Further, the task unloading system model includes a system model and a task model, and specifically includes:
the system model comprises a mobile device MD, an edge server ES and a cloud server CS; the set of computing nodes is represented by V = {MD, ES, CS}, the computing capacity of each computing node is represented by f_k (k ∈ V), and the data transmission rate between different computing nodes is represented by v_{k,l} (k, l ∈ V);
the task model is specifically as follows: an application is represented by a directed acyclic graph G = (N, E), where N = {1, 2, ..., n} represents the set of subtasks, n is the number of subtasks, and the computation amount of each subtask is represented by c_i (i ∈ N); E = {e_{i,j} | i, j ∈ N, i < j} represents the set of dependency directed edges between subtasks, and for a directed edge e_{i,j} ∈ E, subtask i is called a direct predecessor task of subtask j and subtask j is a direct successor task of subtask i; in addition, each directed edge e_{i,j} ∈ E is associated with a weight d_{i,j}, where d_{i,j} represents the amount of data transferred from subtask i to subtask j; the direct predecessor task set and direct successor task set of subtask i are represented by pre(i) and suc(i), and a subtask can only start to execute after receiving the processing results of all its predecessor tasks.
Further, a binary variable x_{i,k} is defined to indicate the offloading scheme: x_{i,k} = 1 denotes that subtask i is assigned to computing node k, and x_{i,k} = 0 otherwise; since each subtask can only be assigned to one computing node in the network, the following constraint holds:
Σ_{k∈V} x_{i,k} = 1,  ∀i ∈ N
In addition, any subtask j ∈ N can start execution only when two conditions are met. First, the assigned computing node must be available, i.e., no other subtask is currently executing on that node; the available time T_j^avail of the node assigned to subtask j should satisfy the following constraint:
T_j^avail ≥ x_{i,k} · x_{j,k} · T_i^end,  ∀i ∈ N, i ≠ j, ∀k ∈ V
wherein T_i^end is the completion time of subtask i;
Second, subtask j should be ready, i.e., it has received the processing results of all its predecessor subtasks; the ready time T_j^ready of subtask j is defined as:
T_j^ready = max_{i∈pre(j)} ( T_i^end + d_{i,j} / v_{k,l} )
where, if subtask j and one of its predecessor subtasks i ∈ pre(j) are assigned to different computing nodes k and l respectively, the communication delay d_{i,j}/v_{k,l} needs to be taken into account; otherwise data is transferred between them through shared memory without communication delay, and in this case the second term on the right-hand side of the constraint is zero;
Considering the above two conditions together, the start time of subtask j is defined as:
T_j^start = max( T_j^avail , T_j^ready )
and the end time of subtask j is defined as:
T_j^end = T_j^start + Σ_{k∈V} x_{j,k} · c_j / f_k
D_{1:t} denotes the set of all subtasks successfully completed by the t-th time step; the cumulative execution delay T_{1:t} of the application at the t-th time step is defined as:
T_{1:t} = max_{i∈D_{1:t}} T_i^end
An application is considered complete if and only if all of its n subtasks are successfully completed; when all n tasks of the application are successfully completed, D_{1:n} = {1, 2, ..., n}; thus, the total execution delay T_{1:n} of the application is calculated by the following formula:
T_{1:n} = max_{i∈N} T_i^end
For an application with n tasks, the offloading scheme of the application is represented by DEP = (dep(1), dep(2), ..., dep(n)), where dep(i) ∈ {1, 2, 3} represents the execution position of task i ∈ N, i.e., the terminal device, the edge server, or the cloud server, respectively;
the objective function is defined as:
Minimize T_{1:n}
further, the DQN algorithm obtains a state s in a cloud edge environment during operation, then selects an action a through an epsilon-greedy strategy, and then receives an award value r and a next state s' obtained after the environment executes the action a; then the DQN algorithm stores (s, a, r, s') obtained in each step into an experience storage pool; generally, the capacity of the empirical storage pool in the DQN algorithm is preset, and when a storage threshold is reached, the neural network parameters are updated, and the loss function of the neural network is as follows:
Loss = (r + γ max Q(s′, a′; ω′) − Q(s, a; ω))²
wherein γ is the discount factor; Q(s, a; ω) is the output of "EvalNet", which calculates the Q value of the current state-action pair, and ω is the DNN weight of "EvalNet"; max Q(s′, a′; ω′) is the output of "TargetNet", which calculates the maximum Q value when action a′ is performed in the next state s′, and ω′ is the DNN weight of "TargetNet".
Further, the step S2 specifically includes:
first, the weight ω of the EvalNet neural network and the weight ω′ = ω of the TargetNet neural network are randomly initialized (line 3);
for each training period, the current offloading scheme DEP_cur, the current state s and the current response time T are initialized;
in the algorithm training process, the execution position of each subtask is sequentially determined through an epsilon-greedy strategy, one of all unloading schemes is randomly selected according to the probability of epsilon, and the unloading scheme with the maximum Q value in EvalNet is selected according to the probability of 1-epsilon;
next, performing action a and obtaining a new response time T ', calculating the reward r and updating the current response time T, observing the next state s';
then, putting (s, a, r, s') into an experience storage pool, randomly extracting m samples from the experience storage pool, and calculating a target value;
next, the Loss is obtained according to the mean square error loss function and the weight ω of the EvalNet neural network is updated by the Adam optimizer, and the weight ω′ = ω of the TargetNet neural network is updated after the set number C of iterations is reached;
and finally, updating the current state.
Compared with the prior art, the invention has the following beneficial effects:
the method can be well adapted to different cloud edge environments, and the unloading scheme can be generated efficiently.
Drawings
FIG. 1 is a system model for task offloading in one embodiment of the invention;
FIG. 2 is an example of a task offloading process in one embodiment of the invention;
FIG. 3 is a DQN algorithm architecture diagram in an embodiment of the invention;
FIG. 4 is a diagram of DAGs with different task sizes in an embodiment of the invention;
FIG. 5 is a comparison of DODQ and ideal performance under different scenarios in an embodiment of the invention;
FIG. 6 illustrates the accuracy of the unloading operation at different task sizes for DODQ according to an embodiment of the invention;
fig. 7 is a comparison of the performance of the DODQ in different scenarios with other classical methods in an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
In this embodiment, referring to FIG. 1, the system model for task offloading is composed of a Mobile Device (MD), an Edge Server (ES), and a Cloud Server (CS). The set of computing nodes is denoted by V = {MD, ES, CS}, and the computing power of each computing node by f_k (k ∈ V). Typically, the mobile device has the weakest computing power due to its size and weight constraints, and the edge server is more powerful than the terminal device but less powerful than the remote cloud server. The data transmission rate between different computing nodes is denoted by v_{k,l} (k, l ∈ V); because the edge server is deployed close to the mobile device while the remote cloud server is far away, the data transmission rate between the mobile device and the edge server is higher than that between the remote cloud server and the mobile device or the edge server.
As shown in FIG. 1, an application can be represented by a directed acyclic graph G = (N, E), where N = {1, 2, ..., n} represents the set of subtasks, n is the number of subtasks, and the computation amount of each subtask is represented by c_i (i ∈ N). E = {e_{i,j} | i, j ∈ N, i < j} represents the set of dependency directed edges between subtasks; for a directed edge e_{i,j} ∈ E, subtask i is called a direct predecessor task of subtask j, and subtask j is a direct successor task of subtask i. In addition, each directed edge e_{i,j} ∈ E is associated with a weight d_{i,j}, which represents the amount of data transferred from subtask i to subtask j. We denote the direct predecessor task set and the direct successor task set of subtask i by pre(i) and suc(i); a subtask can only start executing after receiving the processing results of all its predecessor tasks. For example, the direct predecessor set and direct successor set of subtask 9 are pre(9) = {2, 4, 5} and suc(9) = {10}, respectively, so subtask 9 must receive the processing results of subtasks 2, 4, and 5 before starting execution. Furthermore, since subtask 10 has no direct successor, we call subtask 10 the end task.
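For illustration only, the following Python sketch shows one possible in-memory representation of this task model; the TaskGraph class, the edge list, and the placeholder computation amounts are assumptions introduced here (only the edges into and out of subtask 9 follow the example above), not part of the disclosed embodiment.

```python
# Minimal sketch of the task model G = (N, E): a hypothetical 10-task DAG whose
# pre(9) = {2, 4, 5} and suc(9) = {10} match the example in the description.
from collections import defaultdict

class TaskGraph:
    def __init__(self, n, comp, edges):
        self.n = n                      # number of subtasks
        self.comp = comp                # c_i: computation amount of each subtask (Mcycles)
        self.data = {}                  # d_{i,j}: data transferred from i to j (KB)
        self.pre = defaultdict(set)     # pre(i): direct predecessor tasks
        self.suc = defaultdict(set)     # suc(i): direct successor tasks
        for i, j, d in edges:
            self.data[(i, j)] = d
            self.pre[j].add(i)
            self.suc[i].add(j)

# Hypothetical edge list (i, j, d_ij); only the edges around task 9 follow the text above.
edges = [(1, 2, 300), (1, 3, 200), (1, 4, 250), (1, 5, 150), (1, 7, 180),
         (2, 9, 400), (4, 9, 350), (5, 9, 100),
         (3, 6, 120), (6, 10, 90), (7, 8, 220), (8, 10, 140), (9, 10, 500)]
comp = {i: 100 for i in range(1, 11)}   # c_i, placeholder values
G = TaskGraph(10, comp, edges)

assert G.pre[9] == {2, 4, 5} and G.suc[9] == {10}
assert not G.suc[10]                    # task 10 has no successor: the end task
```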
In this embodiment, subtasks can be processed locally or offloaded to an edge server or a remote cloud server for execution. Thus, we define a binary variable x_{i,k} to indicate the offloading scheme: x_{i,k} = 1 indicates that subtask i is assigned to computing node k, and x_{i,k} = 0 otherwise. Since each subtask can only be assigned to one computing node in the network, the following constraint holds:
Σ_{k∈V} x_{i,k} = 1,  ∀i ∈ N
In addition, two conditions need to be satisfied for any subtask j ∈ N to start execution. First, the allocated computing node must be available, i.e., no other subtask is currently executing on it; the available time T_j^avail of the node assigned to subtask j should satisfy the following constraint:
T_j^avail ≥ x_{i,k} · x_{j,k} · T_i^end,  ∀i ∈ N, i ≠ j, ∀k ∈ V
where T_i^end is the completion time of subtask i.
Second, subtask j should be ready, i.e., it has received the processing results of all its predecessor subtasks. The ready time T_j^ready of subtask j is defined as:
T_j^ready = max_{i∈pre(j)} ( T_i^end + d_{i,j} / v_{k,l} )
If subtask j and one of its predecessor subtasks i ∈ pre(j) are assigned to different computing nodes k and l, respectively, we need to take the communication delay d_{i,j}/v_{k,l} into account; otherwise, data transfer between them is realized through shared memory without communication delay, and in this case the second term on the right-hand side of the constraint is zero.
Considering the above two conditions together, the start time of subtask j is defined as:
T_j^start = max( T_j^avail , T_j^ready )
Further, the end time of subtask j is defined as:
T_j^end = T_j^start + Σ_{k∈V} x_{j,k} · c_j / f_k
notably, the tasks in the directed acyclic task graph (DAG) are executed in parallel, and thus the delay of the application will be equal to the longest time it takes to complete the tasks in the task dependency chain. By D 1:t Representing the set of all subtasks that completed successfully at the t-th time step. Further, the cumulative execution delay T of the application at the T-th time step 1:t Can be defined as:
Figure BDA0003784072130000091
an application is considered complete if and only if all its n subtasks are successfully completed, when all n tasks of an application are successfully completed, D is then performed 1:n =1, 2,. Eta, n. Thus, the total execution delay T of the application 1:n Can be calculated by the following formula:
Figure BDA0003784072130000092
for an application with n tasks, the offload scheme of the application is represented by DEP = (DEP (1), DEP (2),.. DEP (n)). Where dep (i) e {1,2,3} represents the execution location of task i e N, i.e. end device, edge server, and cloud server, respectively. Since the mobile application is deployed on the terminal device, it is assumed that the 1 st task of the application is executed on the mobile device, i.e., dep (1) =1.
Different offloading schemes DEP lead to different values of T_{1:n}; in order to achieve a better computation offloading effect in the cloud-edge collaborative environment, the optimization goal is to minimize T_{1:n}. Thus, the objective function is defined as:
Minimize T_{1:n}
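To make the timing model concrete, the following Python sketch evaluates T_{1:n} for a complete offloading scheme under the model above; it builds on the hypothetical TaskGraph sketch given earlier, and the function name, the dictionaries F and V, and all numeric values are illustrative assumptions rather than the patent's implementation.

```python
# Minimal sketch: evaluate T_{1:n} for a full offloading scheme dep on the TaskGraph G
# defined earlier. F[k] is the computing power f_k, V[(k, l)] the transfer rate v_{k,l}.
def total_delay(G, dep, F, V):
    end = {}                                  # T_i^end of each finished subtask
    available = {k: 0.0 for k in F}           # earliest time each node is free
    for j in range(1, G.n + 1):               # tasks are numbered in topological order (i < j)
        k = dep[j]
        ready = 0.0                           # T_j^ready: all predecessor results received
        for i in G.pre[j]:
            comm = 0.0 if dep[i] == k else G.data[(i, j)] / V[(dep[i], k)]
            ready = max(ready, end[i] + comm)
        start = max(available[k], ready)      # both availability and readiness must hold
        end[j] = start + G.comp[j] / F[k]     # execution time c_j / f_k
        available[k] = end[j]                 # the node is busy until task j ends
    return max(end.values())                  # T_{1:n} = latest completion time

# Example usage with hypothetical parameters (units: Mcycles, Mcycles/s, KB, KB/s).
F = {1: 1000.0, 2: 4000.0, 3: 10000.0}        # MD, ES, CS computing power
V = {(k, l): 5000.0 for k in F for l in F if k != l}
dep = {i: 1 for i in range(1, 11)}            # run every task locally on the mobile device
print(total_delay(G, dep, F, V))
```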
based on the above definition, the MEC runtime environment is composed of 2-tuples<F,V>And (4) forming. As shown in table 1.1, wherein F = (F) MD ,f ES ,f CS ) Representing the computing power of different computing nodes. V = (V) MD,ES ,v MD,CS ,v ES,CS ) Representing the data transfer rate between the compute nodes. DEP = (DEP (1), DEP (2),.. DEP (n)) represents an unloading scheme of an application, where DEP (i) represents an execution position of an ith task. T is a unit of 1:n Represents the application response time, T, under the corresponding offload scenario 1:n Smaller means better computational offload.
TABLE 1.1 runtime Environment and application offload schemes including Performance indicators
Figure BDA0003784072130000093
In this embodiment, a method for offloading real-time dependent tasks with Deep Q-Networks (DODQ) based on deep reinforcement learning is provided, including the following steps:
step S1: training an unloading operation Q value prediction model by using a DQN algorithm in a runtime environment; the training data includes the computational power F of the compute nodes, the data transfer rate V between the compute nodes, the offload scheme of the application and the corresponding application delay, as shown in table 3. The unloading operation Q value prediction model can evaluate the Q values of unloading operations in different operating environments, so that when the current system state (the computing capacity of the computing nodes, the data transmission rate among the computing nodes and the unloading scheme of the application) is input, the model can accurately predict the Q values of different unloading operations;
step S2: the unloading operation Q value prediction model predicts the Q values of different unloading operations according to the computing capacity of the computing nodes, the transmission rate among the computing nodes and the applied unloading scheme, and then selects proper unloading operation by comparing the corresponding Q values;
and step S3: and repeating the step S2, and gradually determining the execution position for each task through feedback iteration.
In this embodiment, DEP_t = (dep_t(1), dep_t(2), ..., dep_t(n)) represents the offloading scheme of the application at time step t, where dep_t(i) ∈ {0, 1, 2, 3} (i ∈ N) corresponds to the execution position of task i at that time. In particular, dep_t(i) = 0 means that task i has not been executed by time step t. For example, as shown in FIG. 2, DEP_3 = (1, 2, 2, 0, ..., 0) indicates that at the 3rd time step task 1 is executed on the terminal device, tasks 2 and 3 are executed on the edge server, and the subsequent tasks have not been executed yet.
Table 2.1 shows the offloading process of an application (n = 10) in a certain scenario. First, the 1st task of the application is executed on the mobile device; at this time the corresponding DEP_1 = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0) and T_{1:1} = 0.069 s. Next, task 2 is offloaded to the edge server for execution; at this time the corresponding DEP_2 = (1, 2, 0, 0, 0, 0, 0, 0, 0, 0) and T_{1:2} = 0.183 s. Similarly, the execution position of each task is determined in turn. Finally, once the execution positions of all ten tasks have been determined, all tasks of the application have been executed, and the response time of the application T_{1:n} is obtained as 0.54 s.
TABLE 2.1 application uninstallation procedure under certain scenarios
[Table 2.1 is provided as an image in the original publication.]
Reinforcement learning is an algorithmic model that can make optimal decisions by self-learning in a particular scenario, and it models all real-world problems by abstracting them into the interactive process between agents and the environment. At each time step of the interactive process, the agent receives the state of the environment and selects a corresponding response action. Then, at the next time step, the agent obtains a reward value and a new state based on the feedback from the environment. Since the goal of reinforcement learning is to maximize the cumulative reward, it is typically modeled using a Markov Decision Process (MDP). More specifically, the MDP may be defined as a 4-tuple, denoted as < S, A, T, R >, where S is the state space, A is the action space, T is the state transition function, and R is the reward function.
Based on the problem definition, in the proposed computational offload problem, the corresponding state space S, action space a, state transition function T and reward function R are defined as follows:
state space: the state space is denoted S, where S ∈ S denotes a potential state. To comprehensively consider the characteristics of the operating environment and the unloading scheme, s is defined as a triple<F,V,DEP cur >. Thus, s represents the current system state of the runtime environment, i.e., the current system state consists of the computing power of the different computing nodes, the data transfer rate between the computing nodes, and the currently applied offload scheme.
Action space: the action space is represented as A = {a_1, a_2, a_3}, and an action a (a ∈ A) determines the execution position of the task currently to be decided. Specifically, a ∈ {a_1, a_2, a_3} corresponds to executing the current task on the terminal device, offloading it to the edge server for execution, or offloading it to the cloud server for execution, respectively.
State transition function: the state transition function is denoted T(s, a), and its return value is the next state reached by performing action a in state s. For example, as shown in Table 2.1, when action a_2 is performed in state s = <F, V, (1, 0, 0, ..., 0)>, the next state is s′ = <F, V, (1, 2, 0, ..., 0)>; it can be observed that task 2 is offloaded to the edge server for execution.
The reward function: to guide the RL agent to minimize the response time of the application, the reward function is defined as:
R(s, a) = T − T′
where R(s, a) represents the reward received by the RL agent after performing action a in the current state s, T corresponds to the application delay under the current offloading scheme, and T′ represents the application delay after performing action a. For example, as shown in Table 2.1, offloading task 2 to the edge server yields the reward R = R(s, a) = T − T′ = T_{1:1} − T_{1:2} = −0.114.
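The state, action, and reward definitions above can be wrapped into a small environment sketch; the OffloadEnv class below, its state encoding (concatenating F, V, and DEP_cur into one vector), and its method names are illustrative assumptions building on the earlier sketches, not the patent's implementation.

```python
# Minimal sketch of the MDP <S, A, T, R>: state s = <F, V, DEP_cur>, action index a
# (0/1/2 -> terminal device / edge server / cloud server) places the next undecided
# task, and the reward is R(s, a) = T - T'. Reuses the hypothetical G, F, V from above.
import numpy as np

class OffloadEnv:
    def __init__(self, G, F, V):
        self.G, self.F, self.V = G, F, V

    def _state(self):
        # current system state: computing power, transfer rates, current offloading scheme
        return np.array(list(self.F.values()) +
                        [self.V[p] for p in sorted(self.V)] +
                        self.dep[1:], dtype=np.float32)

    def _partial_delay(self):
        # cumulative delay T_{1:t} over the tasks decided so far (same timing model as above)
        end, avail = {}, {k: 0.0 for k in self.F}
        for j in range(1, self.next_task):
            k = self.dep[j]
            ready = 0.0
            for i in self.G.pre[j]:
                comm = 0.0 if self.dep[i] == k else self.G.data[(i, j)] / self.V[(self.dep[i], k)]
                ready = max(ready, end[i] + comm)
            start = max(avail[k], ready)
            end[j] = start + self.G.comp[j] / self.F[k]
            avail[k] = end[j]
        return max(end.values()) if end else 0.0

    def reset(self):
        self.dep = [0] * (self.G.n + 1)   # index 0 unused; dep_t(i) = 0: not yet executed
        self.dep[1] = 1                   # the 1st task always runs on the mobile device
        self.next_task = 2
        self.T = self._partial_delay()    # T_{1:1}
        return self._state()

    def step(self, action):
        self.dep[self.next_task] = action + 1   # record the execution position of the task
        self.next_task += 1
        T_new = self._partial_delay()
        reward = self.T - T_new                 # R(s, a) = T - T'
        self.T = T_new
        done = self.next_task > self.G.n        # all tasks placed -> episode ends
        return self._state(), reward, done
```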
Referring to FIG. 3, which shows the DQN algorithm architecture, the DQN agent obtains a state s (s ∈ S) in the runtime cloud-edge environment, selects an action a (a ∈ A) through an ε-greedy policy, and then receives the reward value r and the next state s′ obtained after the environment executes action a. The DQN algorithm then deposits the (s, a, r, s′) from each step into an experience storage pool. Generally, the capacity of the experience storage pool in the DQN algorithm is preset; when the storage threshold is reached, the neural network parameters are updated, with the following loss function:
Loss = (r + γ max Q(s′, a′; ω′) − Q(s, a; ω))²
wherein γ is the discount factor; Q(s, a; ω) is the output of "EvalNet", which calculates the Q value of the current state-action pair, and ω is the DNN weight of "EvalNet"; max Q(s′, a′; ω′) is the output of "TargetNet", which calculates the maximum Q value when action a′ is performed in the next state s′, and ω′ is the DNN weight of "TargetNet".
[Algorithm 1 is provided as an image in the original publication.]
Based on the above definitions, the DQN algorithm is used to evaluate the Q values of different offloading operations. The key steps are given in Algorithm 1. First, the weight ω of the EvalNet neural network and the weight ω′ = ω of the TargetNet neural network are randomly initialized (line 3). For each training episode, the current offloading scheme DEP_cur, the current state s, and the current response time T are initialized (lines 5-6). During training, the execution position of each subtask is determined in turn through an ε-greedy strategy: with probability ε one of all offloading operations is selected at random, and with probability 1−ε the offloading operation with the maximum Q value in EvalNet is selected (line 8). Next, action a is performed and a new response time T′ is obtained, the reward r is calculated, the current response time T is updated, and the next state s′ is observed (line 9). Then, (s, a, r, s′) is put into the experience storage pool, m samples are randomly drawn from the pool, and the target values are calculated; these samples may come from different cloud-edge runtime environments to ensure the sufficiency of learning (lines 10-11). Next, the Loss is obtained according to the mean square error loss function, the weight ω of the EvalNet neural network is updated using the Adam optimizer, and the weight ω′ = ω of the TargetNet neural network is updated after every C iterations (lines 12-14). Finally, the current state is updated (line 15).
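The training procedure described above can be sketched as follows with TensorFlow 2.x / Keras (the embodiment reports TensorFlow 2.3.0). The memory-pool capacity, batch size, discount factor, learning rate, network sizes, and episode count follow the values quoted later in this description; the ε value and the target-update interval C = 200 are placeholders since the text does not state them, and the environment is assumed to expose the OffloadEnv interface sketched above. This is an illustrative sketch, not the patent's actual code.

```python
# Minimal sketch of the DQN training loop (Algorithm 1): EvalNet / TargetNet, an experience
# storage pool, epsilon-greedy action selection, and a mean-squared-error loss.
import random
from collections import deque

import numpy as np
import tensorflow as tf

def build_net(state_dim, n_actions=3):
    # fully connected DNN: input layer, two hidden layers with 128 neurons, output layer
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_actions),            # one Q value per offloading operation
    ])

def train_dqn(env, state_dim, episodes=10000, n_actions=3, memory_size=15000,
              batch_size=64, gamma=0.9, lr=0.001, target_update=200, eps=0.1):
    eval_net = build_net(state_dim, n_actions)       # EvalNet, weights omega
    target_net = build_net(state_dim, n_actions)     # TargetNet, weights omega'
    target_net.set_weights(eval_net.get_weights())   # omega' = omega (line 3)
    optimizer = tf.keras.optimizers.Adam(lr)
    memory = deque(maxlen=memory_size)               # experience storage pool
    step = 0
    for _ in range(episodes):
        s = env.reset()                              # DEP_cur, state and response time reset
        done = False
        while not done:
            # epsilon-greedy selection of the offloading operation (line 8)
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = int(np.argmax(eval_net(s[None, :])[0]))
            s2, r, done = env.step(a)                # reward r = T - T' (line 9)
            memory.append((s, a, r, s2, float(done)))   # store the transition (line 10)
            s = s2
            if len(memory) < batch_size:
                continue
            bs, ba, br, bs2, bd = map(np.array, zip(*random.sample(memory, batch_size)))
            # target value: r + gamma * max_a' Q(s', a'; omega'), with terminal states masked
            target = br.astype(np.float32) + gamma * (1.0 - bd.astype(np.float32)) \
                     * tf.reduce_max(target_net(bs2), axis=1)
            with tf.GradientTape() as tape:
                q = tf.reduce_sum(eval_net(bs) * tf.one_hot(ba, n_actions), axis=1)
                loss = tf.reduce_mean(tf.square(target - q))   # mean squared error loss
            grads = tape.gradient(loss, eval_net.trainable_variables)
            optimizer.apply_gradients(zip(grads, eval_net.trainable_variables))
            step += 1
            if step % target_update == 0:            # update omega' = omega every C iterations
                target_net.set_weights(eval_net.get_weights())
    return eval_net
```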
2.2 runtime decisions for offload operations
[Algorithm 2 is provided as an image in the original publication.]
The decision process for the offloading operation is performed at run-time, with the main steps given in algorithm 2. For each task of the mobile application, the current offload scenario is first initialized (lines 9-10). Next, the Q value of each unload operation is evaluated by calling a Q value prediction model (lines 12-14). Finally, the unload operation with the largest Q value is selected and the unload scheme is updated (lines 15-17). Thus, in the decision making process, the execution position of each task is determined in turn.
By iterating this feedback control process, an optimal offloading scheme can be generated step by step in the runtime cloud-edge environment. The feedback control operation continues until all tasks of the mobile application are completed.
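As an illustration of this runtime feedback loop, the sketch below greedily applies the offloading operation with the highest predicted Q value for each task in turn; it assumes the trained EvalNet and the OffloadEnv interface from the sketches above, not the patent's actual code.

```python
# Minimal sketch of the runtime decision process (Algorithm 2): for each task in turn,
# predict the Q value of every offloading operation and execute the one with the largest Q.
import numpy as np

def decide_offloading(env, eval_net):
    s = env.reset()                                  # dep(1) = 1: the app starts on the device
    done = False
    while not done:
        q_values = eval_net(s[None, :]).numpy()[0]   # predicted Q value of each operation
        a = int(np.argmax(q_values))                 # 0/1/2 -> device / edge server / cloud
        s, _, done = env.step(a)                     # update DEP_cur and move to the next task
    return list(env.dep[1:]), env.T                  # final offloading scheme and T_{1:n}
```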
Example 1:
in this embodiment, in order to simulate the diversity of the application programs, four task graphs with the task scale n = {10,15,20,25} are constructed, and the structure of the constructed task graph is shown in fig. 4.
For each application G = (N, E), the computation amount c_i (i ∈ N) of each subtask follows a uniform distribution within [50, 500] Mcycles, and for each e_{i,j} ∈ E the amount of data transferred from subtask i to subtask j follows a uniform distribution within [0, 1000] KB. Furthermore, the computing power F = (f_MD, f_ES, f_CS) of the different computing nodes and the data transmission rates V = (v_{MD,ES}, v_{MD,CS}, v_{ES,CS}) between the computing nodes also follow uniform distributions. Table 3.1 lists the detailed settings of the simulation parameters.
TABLE 3.1 setting of simulation parameters
[Table 3.1 is provided as an image in the original publication.]
The proposed DODQ is implemented based on TensorFlow 2.3.0. In DODQ, a fully connected DNN is used that consists of one input layer, two hidden layers, and one output layer, where both hidden layers have 128 hidden neurons. The memory pool capacity M, the training batch size m, the discount factor γ, and the learning rate of the Adam optimizer are set to 15000, 64, 0.9, and 0.001, respectively. Further, different training periods are set for task graphs of different sizes: the training periods for task sizes n = 10, 15, 20, 25 are set to 10000, 15000, 20000, and 25000, respectively.
Based on the above settings, 10 different cloud-edge environment scenarios (F, V) were simulated, as shown in Table 3.2, and the runtime decision algorithm was executed for each scenario to realize adaptive task offloading in different cloud-edge environments.
TABLE 3.2 different scenarios for cloud edge Environment
[Table 3.2 is provided as an image in the original publication.]
In this embodiment, the effectiveness of DODQ for adaptive task offloading is evaluated under the different scenarios described in Table 3.2. In particular, we compare the performance of DODQ with the ideal scheme under these scenarios. By combining management experience and local verification, the ideal offloading scheme with the shortest response time can be obtained for each scenario. However, finding the ideal scheme is not feasible in practice, since it requires exhausting all possibilities and therefore has unacceptable complexity. For example, for an application with n tasks, after the terminal device starts the application (i.e., the 1st task is executed on the terminal device), we need to determine the execution positions of the remaining n−1 tasks (i.e., execute each task on the terminal device, or offload it to the edge server or cloud server), so under given F and V we would have to test the performance of up to 3^(n−1) offloading schemes. As shown in FIG. 5, in the different scenarios, DODQ achieves a response time similar to that of the ideal scheme. When n = 10, DODQ achieves the best performance in scenarios 1, 5, 6, and 8. In the other cases, the performance gap between DODQ and the ideal scheme always remains below 4%. The results verify that DODQ achieves optimal/near-optimal task offloading performance in different cloud-edge environments.
Taking scenario 1 (n = 10) as an example, the task offloading process using DODQ is explained. As shown in Table 3.3, when the offloading scheme is initialized to DEP_cur = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0), the predicted Q value of offloading operation a = 2 is higher than those of the other offloading operations. Thus, the task currently to be decided (i.e., the second subtask) is offloaded to the edge server for execution, and the offloading scheme DEP_cur is updated to (1, 2, 0, 0, 0, 0, 0, 0, 0, 0). Next, in each step the offloading operation with the highest predicted Q value is selected and executed. Finally, once the execution positions of all tasks in DEP_cur have been determined, no further offloading operation needs to be performed, and the decision process of task offloading is complete.
Table 3.3 calculation offload procedure in scenario 1 (n = 10)
[Table 3.3 is provided as an image in the original publication.]
In this embodiment, the Action Accuracy Rate (AAR) is used to measure the correctness of the offloading operations in the decision process, and is defined as:
AAR = A / O
where O and A are the number of all offloading operations that need to be performed and the number of correct offloading operations, respectively. An offloading operation is considered correct when it brings the current offloading scheme closer to the ideal scheme.
As shown in fig. 6, the DODQ can make an unloading operation decision with high accuracy under a Q value prediction model based on DQN. Specifically, the average AAR can reach 94.8% at different task scales.
In the present embodiment, the performance of DODQ is compared to Rule-based, machine-learning (ML-based), and Q-learning methods to further evaluate the advantage of DODQ in task offloading. Rule-based selects the best DAG partition point and offloads the remaining tasks from the mobile device to the cloud. And the ML-based method adopts particle swarm optimization and genetic algorithm (PSO-GA) to search the unloading scheme according to the predicted response time of the application program under different environments. Q-learning stores each state action pair and its corresponding Q value in a Q table to maximize the cumulative return of offloading schemes, which can handle offloading problems with small scale state space well. When the runtime environment changes, Q-learning needs to retrain the task-offloaded decision model to better adapt to the new environment.
As shown in FIG. 7, the DODQ response time is 13-30% better than Rule-based under different scenarios. Rule-based involves segmentation rules set by experts, but these predefined rules do not apply well to different scenarios. Therefore, the rule-based method cannot effectively adapt to the task unloading problem in the dynamic cloud edge environment. Especially when the environment is complex (e.g., n = 25), the advantage of DODQ compared to Rule-based becomes very significant. In addition, compared with ML-based, the performance of DODQ is improved by 6% -8%. In particular, we evaluated the prediction accuracy of response times in ML-based, with only 72.5% accuracy, allowing 15% model error. This is because ML-based methods require enough training data to develop an accurate prediction model. However, in the absence of training data support, task offloading is inefficient due to inaccurate predictions. Furthermore, Q-learning achieves a response time comparable to DODQ in different scenarios without considering training time. However, when the runtime environment changes, Q-learning needs to retrain the decision model for task offloading for the new environment, which results in too long training time.
Furthermore, we evaluated the convergence times of the DODQ, Rule-based, ML-based, and Q-learning methods at different task scales; the results are shown in Table 3.4. In particular, Rule-based involves no training time, since its rules are predefined; however, as shown in FIG. 7, Rule-based has the worst performance in reducing response time. Among these methods, ML-based, which searches for the offloading scheme using DNN-predicted response times, requires a large number of offloading-scheme performance predictions and therefore has the longest convergence time. In contrast, DODQ adapts well to dynamic cloud-edge environments and generates optimal/near-optimal offloading schemes within milliseconds. In addition, Q-learning suffers from a high-dimensional state-space problem, especially when the task scale becomes large, because it records all state-action pairs and their corresponding Q values in a Q-table. Meanwhile, when the runtime environment changes, Q-learning needs to retrain the decision model for task offloading to obtain better adaptability. These factors make the convergence time of Q-learning too long; for example, when the number of tasks is 25, the convergence time of Q-learning exceeds 1000 seconds, which is much longer than that of DODQ. The above results demonstrate the advantage of DODQ in achieving low response time and high training efficiency.
TABLE 3.4 Convergence time of different methods at different task Scale
[Table 3.4 is provided as an image in the original publication.]
The above description is only a preferred embodiment of the present invention, and all the equivalent changes and modifications made according to the claims of the present invention should be covered by the present invention.

Claims (5)

1. A real-time dependent task unloading method based on deep reinforcement learning is characterized by comprising the following steps:
step S1: based on a system model of task unloading, training an unloading operation Q value prediction model by using a DQN algorithm in a runtime environment;
step S2: the unloading operation Q value prediction model predicts Q values of different unloading operations according to the computing capacity of the computing nodes, the transmission rate among the computing nodes and the applied unloading scheme, and then selects proper unloading operation by comparing the corresponding Q values;
and step S3: and repeating the step S2, and gradually determining the execution position for each task through feedback iteration.
2. The deep reinforcement learning-based real-time dependent task offloading method according to claim 1, wherein the task offloading system model includes a system model and a task model, and specifically includes:
the system model comprises a mobile device MD, an edge server ES and a cloud server CS; the set of computing nodes is represented by V = {MD, ES, CS}, the computing capacity of each computing node is represented by f_k (k ∈ V), and the data transmission rate between different computing nodes is represented by v_{k,l} (k, l ∈ V);
the task model is specifically as follows: an application is represented by a directed acyclic graph G = (N, E), where N = {1, 2, ..., n} represents the set of subtasks, n is the number of subtasks, and the computation amount of each subtask is represented by c_i (i ∈ N); E = {e_{i,j} | i, j ∈ N, i < j} represents the set of dependency directed edges between subtasks, and for a directed edge e_{i,j} ∈ E, subtask i is called a direct predecessor task of subtask j and subtask j is a direct successor task of subtask i; in addition, each directed edge e_{i,j} ∈ E is associated with a weight d_{i,j}, where d_{i,j} represents the amount of data transferred from subtask i to subtask j; the direct predecessor task set and direct successor task set of subtask i are represented by pre(i) and suc(i), and a subtask can only start to execute after receiving the processing results of all its predecessor tasks.
3. The deep reinforcement learning-based real-time dependent task offloading method of claim 2, wherein a binary variable x_{i,k} is defined to indicate the offloading scheme: x_{i,k} = 1 denotes that subtask i is assigned to computing node k, and x_{i,k} = 0 otherwise; since each subtask can only be assigned to one computing node in the network, the following constraint holds:
Σ_{k∈V} x_{i,k} = 1,  ∀i ∈ N
in addition, any subtask j ∈ N can start execution only when two conditions are met; first, the assigned computing node must be available, i.e., no other subtask is currently executing on that node; the available time T_j^avail of the node assigned to subtask j should satisfy the following constraint:
T_j^avail ≥ x_{i,k} · x_{j,k} · T_i^end,  ∀i ∈ N, i ≠ j, ∀k ∈ V
wherein T_i^end is the completion time of subtask i;
second, subtask j should be ready, i.e., it has received the processing results of all its predecessor subtasks; the ready time T_j^ready of subtask j is defined as:
T_j^ready = max_{i∈pre(j)} ( T_i^end + d_{i,j} / v_{k,l} )
where, if subtask j and one of its predecessor subtasks i ∈ pre(j) are assigned to different computing nodes k and l respectively, the communication delay d_{i,j}/v_{k,l} needs to be taken into account; otherwise data is transferred between them through shared memory without communication delay, and in this case the second term on the right-hand side of the constraint is zero;
considering the above two conditions together, the start time of subtask j is defined as:
T_j^start = max( T_j^avail , T_j^ready )
and the end time of subtask j is defined as:
T_j^end = T_j^start + Σ_{k∈V} x_{j,k} · c_j / f_k
D_{1:t} denotes the set of all subtasks successfully completed by the t-th time step; the cumulative execution delay T_{1:t} of the application at the t-th time step is defined as:
T_{1:t} = max_{i∈D_{1:t}} T_i^end
an application is considered complete if and only if all of its n subtasks are successfully completed; when all n tasks of the application are successfully completed, D_{1:n} = {1, 2, ..., n}; thus, the total execution delay T_{1:n} of the application is calculated by the following formula:
T_{1:n} = max_{i∈N} T_i^end
for an application with n tasks, the offloading scheme of the application is represented by DEP = (dep(1), dep(2), ..., dep(n)), where dep(i) ∈ {1, 2, 3} represents the execution position of task i ∈ N, i.e., the terminal device, the edge server, or the cloud server, respectively;
the objective function is defined as:
Minimize T_{1:n}
4. The deep reinforcement learning-based real-time dependent task offloading method according to claim 1, wherein the DQN algorithm obtains a state s in the runtime cloud-edge environment, then selects an action a through an ε-greedy policy, and then receives the reward value r and the next state s′ obtained after the environment executes action a; the DQN algorithm then stores the (s, a, r, s′) obtained in each step into an experience storage pool; generally, the capacity of the experience storage pool in the DQN algorithm is preset, and when the storage threshold is reached, the neural network parameters are updated, with the loss function of the neural network as follows:
Loss = (r + γ max Q(s′, a′; ω′) − Q(s, a; ω))²
wherein γ is the discount factor; Q(s, a; ω) is the output of "EvalNet", which calculates the Q value of the current state-action pair, and ω is the DNN weight of "EvalNet"; max Q(s′, a′; ω′) is the output of "TargetNet", which calculates the maximum Q value when action a′ is performed in the next state s′, and ω′ is the DNN weight of "TargetNet".
5. The deep reinforcement learning-based real-time dependent task offloading method according to claim 1, wherein the step S2 specifically includes:
first, the weight ω of the EvalNet neural network and the weight ω′ = ω of the TargetNet neural network are randomly initialized (line 3);
for each training period, the current offloading scheme DEP_cur, the current state s and the current response time T are initialized;
in the algorithm training process, the execution position of each subtask is sequentially determined through an epsilon-greedy strategy, one of all unloading schemes is randomly selected according to the probability of epsilon, and the unloading scheme with the maximum Q value in EvalNet is selected according to the probability of 1-epsilon;
next, performing action a and obtaining a new response time T ', calculating a reward r and updating the current response time T, observing the next state s';
then, putting (s, a, r, s') into an experience storage pool, randomly extracting m samples from the experience storage pool, and calculating a target value;
next, the Loss is obtained according to the mean square error loss function, the weight ω of the EvalNet neural network is updated using the Adam optimizer, and the weight ω′ = ω of the TargetNet neural network is updated after the set number C of iterations is reached;
and finally, updating the current state.
CN202210937248.5A 2022-08-05 2022-08-05 Real-time dependency task unloading method based on deep reinforcement learning Pending CN115220818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210937248.5A CN115220818A (en) 2022-08-05 2022-08-05 Real-time dependency task unloading method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210937248.5A CN115220818A (en) 2022-08-05 2022-08-05 Real-time dependency task unloading method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115220818A true CN115220818A (en) 2022-10-21

Family

ID=83615053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210937248.5A Pending CN115220818A (en) 2022-08-05 2022-08-05 Real-time dependency task unloading method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115220818A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115865564A (en) * 2022-11-17 2023-03-28 广州鲁邦通物联网科技股份有限公司 Edge computing gateway device supporting high-speed power line carrier communication

Similar Documents

Publication Publication Date Title
CN108958916B (en) Workflow unloading optimization method under mobile edge environment
CN112882815B (en) Multi-user edge calculation optimization scheduling method based on deep reinforcement learning
Zhu et al. A deep-reinforcement-learning-based optimization approach for real-time scheduling in cloud manufacturing
CN111813506A (en) Resource sensing calculation migration method, device and medium based on particle swarm algorithm
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN113568727A (en) Mobile edge calculation task allocation method based on deep reinforcement learning
CN114661466A (en) Task unloading method for intelligent workflow application in edge computing environment
Badri et al. A sample average approximation-based parallel algorithm for application placement in edge computing systems
CN115220818A (en) Real-time dependency task unloading method based on deep reinforcement learning
CN114090108B (en) Method and device for executing computing task, electronic equipment and storage medium
CN112445617B (en) Load strategy selection method and system based on mobile edge calculation
CN117436485A (en) Multi-exit point end-edge-cloud cooperative system and method based on trade-off time delay and precision
CN116954866A (en) Edge cloud task scheduling method and system based on deep reinforcement learning
CN117331693A (en) Task unloading method, device and equipment for edge cloud based on DAG
CN116828541A (en) Edge computing dependent task dynamic unloading method and system based on multi-agent reinforcement learning
CN110233763B (en) Virtual network embedding algorithm based on time sequence difference learning
CN114650321A (en) Task scheduling method for edge computing and edge computing terminal
CN114942799B (en) Workflow scheduling method based on reinforcement learning in cloud edge environment
CN111488208A (en) Edge cloud cooperative computing node scheduling optimization method based on variable step length bat algorithm
US11513866B1 (en) Method and system for managing resource utilization based on reinforcement learning
CN114138493A (en) Edge computing power resource scheduling method based on energy consumption perception
CN115150335A (en) Optimal flow segmentation method and system based on deep reinforcement learning
Li et al. Efficient data offloading using markovian decision on state reward action in edge computing
CN114302456A (en) Calculation unloading method for mobile edge calculation network considering task priority
CN116932164B (en) Multi-task scheduling method and system based on cloud platform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination