CN112905312A - Workflow scheduling method based on deep Q neural network in edge computing environment - Google Patents
Workflow scheduling method based on deep Q neural network in edge computing environment
- Publication number
- CN112905312A (application number CN202110074556.5A)
- Authority
- CN
- China
- Prior art keywords
- task
- workflow
- time
- edge
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a workflow scheduling method, SAWS, based on deep reinforcement learning in an edge computing environment. The method first formulates the problem as a Markov decision process, defining the reward, state and action corresponding to the workflow scheduling problem in the edge computing environment; it then computes a weight for each task node from its execution time, transmission time and dependency relationships in the workflow and sorts the nodes accordingly; finally, it makes scheduling decisions for the task nodes with a deep Q network. The objective of the SAWS strategy is to find a task scheduling policy that minimizes the long-term execution delay of the workflow while guaranteeing the security of user information. Through the learning and decisions of the Q neural network, the invention greatly improves workflow execution efficiency in the mobile edge network environment and safeguards the user's information security.
Description
Technical Field
The invention mainly relates to the field of deep reinforcement learning and edge computing, in particular to a workflow scheduling method based on a deep Q neural network in an edge computing environment.
Background
Mobile Edge Computing (MEC) based mobile edge networks can provide low latency and substantial computing capacity for popular mobile applications (e.g., virtual/augmented reality, mobile games, in-vehicle network applications, etc.). In a mobile edge network, edge cloud servers with computing and storage capabilities are deployed near mobile users; for a mobile device, offloading services onto an edge server can provide the best quality of service for the mobile user, i.e., minimal response delay.
A mobile application can be defined as the execution of a series of tasks, whose execution order follows from the dependencies between tasks on result data. A workflow generated by a mobile device can therefore be represented by a Directed Acyclic Graph (DAG): the task nodes of the workflow correspond to the nodes of the graph, and the data dependencies between task nodes are represented as directed edges. Beyond the inherent difficulty of scheduling workflow tasks, there are two further challenges: (1) the dynamics of the mobile edge computing scenario are not known in advance; (2) the information exchanged between the user and the edge server may be leaked or tampered with, causing losses to the user.
Therefore, how to guarantee the service quality and information security of the mobile user workflow scheduling in the mobile edge computing environment is an important issue in the mobile edge computing research.
Disclosure of Invention
In order to solve the above problems, the present invention provides a workflow scheduling method based on a deep Q neural network in an edge computing environment.
The invention comprises the following steps:
s1, constructing an edge computing environment model:
the mobile device is denoted by U, and the set eNB = {eNB_1, …, eNB_i, …, eNB_n} represents the n edge servers;
the computing power of the mobile user is denoted C_u, and the computing power of each edge server is denoted C_i;
the transmission rate between the mobile device U and the n edge servers varies over time; R_i(t) denotes the transmission rate between eNB_i and U in the t-th time slice.
S2, generating a workflow:
setting the number of task nodes contained in the workflow generated by the mobile device to K, a random permutation of 1–K is taken as a topological ordering, from which a corresponding directed acyclic graph G = &lt;V, E&gt; is generated as the workflow of the mobile device U; the node set of the directed acyclic graph, V = {v_1, …, v_k, …, v_K}, serves as the set of task nodes in the workflow, and its directed-edge set, E = {e_kl | v_k ∈ V, v_l ∈ V}, serves as the set of dependency relationships among task nodes;
a directed edge e_kl indicates that task v_l can execute only after the edge server executing task v_k has sent v_k's computation result to the edge server that is ready to execute v_l; the workflow contains exactly one start node v_start, the first task node to be executed, and exactly one end node v_end, whose completion time is the completion time of the whole workflow.
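The generation procedure of step S2 can be sketched as follows. This is an illustrative reconstruction, not the patent's exact algorithm: a random permutation of 1..K is treated as the topological order, and edges only point forward in that order, which guarantees acyclicity. The 10% dependency probability default follows claim 3.

```python
import random

def generate_workflow(num_tasks, edge_prob=0.1, seed=None):
    """Generate a random workflow DAG (step S2).

    A random permutation of the task indices 1..K doubles as the DAG's
    topological order; a dependency edge e_kl may only point from a task
    earlier in the permutation to a later one, so the graph is acyclic.
    """
    rng = random.Random(seed)
    order = list(range(1, num_tasks + 1))
    rng.shuffle(order)  # the permutation is the topological order
    edges = {(order[i], order[j])
             for i in range(num_tasks)
             for j in range(i + 1, num_tasks)
             if rng.random() < edge_prob}
    return order, edges
```

Because every edge goes from an earlier to a later position in the permutation, no cycle can form; checking `pos[k] < pos[l]` for every edge verifies this invariant.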
S3, task node priority ordering:
each task node v_k is assigned a weight Pr(v_k) to establish the task-node priority ordering; the weight is calculated as
Pr(v_k) = T_avg(v_k) + max_{v_l ∈ succ(v_k)} ( d_k^out / R_c + Pr(v_l) ),
where T_avg(v_k) denotes the average time for v_k to execute over all edge servers, R_c is a fixed constant representing the rate of data transmission between edge servers, succ(v_k) denotes all successor nodes of v_k, and d_k^out denotes the data size of v_k's computation result;
after the weights of all task nodes in the workflow are obtained by computing step by step from the end node back through its predecessors, the nodes are sorted in descending order of weight, and this order is used as the execution order of the task nodes in the workflow.
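The backward weight computation of step S3 can be sketched as a memoized recursion. The exact weight formula is an assumption reconstructed from the surrounding description (the patent's formula image is not reproduced in this text); it matches the familiar HEFT-style upward rank.

```python
def priority_weights(succ, avg_exec, out_size, rc):
    """Upward-rank weights Pr(v_k) (step S3), computed from the end node
    backward: a task's weight is its average execution time over all edge
    servers plus the costliest successor path (result transfer at rate R_c,
    then the successor's own weight)."""
    memo = {}

    def pr(k):
        if k not in memo:
            tails = [out_size[k] / rc + pr(l) for l in succ.get(k, [])]
            memo[k] = avg_exec[k] + (max(tails) if tails else 0.0)
        return memo[k]

    return {k: pr(k) for k in avg_exec}

def execution_order(weights):
    """Descending-weight order, used as the task execution order."""
    return sorted(weights, key=weights.get, reverse=True)
```

On a three-task chain 1 → 2 → 3, the end node's weight is just its average execution time, and each predecessor adds its own execution time plus the transfer-and-tail cost of its successor.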
S4, risk constraint:
when the mobile device transmits the data of task node v_k to the i-th edge server, the probability that the task data is leaked or tampered with in transit is p_i^type, type ∈ {cf, ig};
the resulting risk probability of the task data being attacked is P(v_k), and P(v_k) must be less than or equal to the risk probability threshold P_max set for the scenario.
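The constraint of step S4 can be sketched as below. The patent's risk formula is not legible in this text, so the combination rule is an assumption: treating the leakage (cf) and tampering (ig) attack events as independent gives a combined risk of 1 − (1 − p_cf)(1 − p_ig).

```python
def risk_probability(p_cf, p_ig):
    """Probability that task data is attacked in transit (step S4): leaked
    (confidentiality, cf) or tampered with (integrity, ig), assuming the
    two attack events are independent."""
    return 1.0 - (1.0 - p_cf) * (1.0 - p_ig)

def feasible(p_cf, p_ig, p_max):
    """The scheduling constraint P(v_k) <= P_max."""
    return risk_probability(p_cf, p_ig) <= p_max
```

A scheduling action whose selected security levels yield per-type attack probabilities violating the threshold would be rejected before the offloading decision is committed.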
S5, constructing a Markov decision process model in the mobile edge environment, which comprises the following steps:
s51, defining the system state s(τ) = {W_c(τ), R(τ)}, where W_c(τ) = {W_c,1(τ), …, W_c,i(τ), …, W_c,n(τ)} represents the sets of task nodes offloaded to the corresponding edge servers, and R(τ) = {R_1(τ), …, R_n(τ)} represents the transmission rates between the mobile device and the corresponding edge servers;
s52, defining the system action a(τ) = {a_c(τ), a_cf(τ), a_ig(τ)}, where a_c(τ) = {a_c,1(τ), …, a_c,i(τ), …, a_c,n(τ)} indicates the edge server to which the task node in the workflow is offloaded, a_cf(τ) indicates the security level of the encryption (confidentiality) service selected by the mobile device when scheduling the task node, and a_ig(τ) indicates the security level of the data-integrity service selected by the mobile device when scheduling the task node;
s53, defining the system reward R(τ) = −T_end(v_k), where T_end(v_k) denotes the latest completion time of node v_k; the latest completion time T_end(v_k) = T_start(v_k) + T_enc(v_k) + T_tx(v_k) + T_wait(v_k) + T_dec(v_k) + T_exec(v_k) comprises task node v_k's start time, encryption time, transmission time, waiting time, decryption time and execution time.
S6, building a depth Q network:
the deep Q neural network comprises an estimation Q neural network, a target neural network and an experience pool;
the estimation Q network and the target Q network have the same network structure, and the estimation Q network periodically copies its network parameters to the target network;
the experience pool stores the state-transition sample obtained by interacting with the environment in each time slice; at each learning step, a fixed-size batch of quadruples (s, a, r, s′) is drawn uniformly at random from the experience pool to train the estimation Q network.
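The two non-network components of step S6 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the experience pool is a bounded buffer of (s, a, r, s′) quadruples with uniform mini-batch sampling, and the parameter copy stands in for the periodic estimation-to-target synchronization (parameters are plain dicts here purely for illustration).

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool (step S6): stores one (s, a, r, s') transition per
    time slice and serves uniformly sampled fixed-size mini-batches."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

def sync_target(online_params, target_params):
    """Periodic copy of the estimation network's parameters into the
    target network."""
    target_params.clear()
    target_params.update(online_params)
```

Sampling uniformly from a bounded buffer breaks the temporal correlation between consecutive transitions, which is the standard motivation for experience replay in deep Q learning.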
S7, algorithm implementation:
a constant ε is given as the number of learning episodes; scheduling all task nodes in the workflow once is regarded as one learning episode. At the start of workflow scheduling, the current time slice is set to 0; at the beginning of the τ-th time slice, the current state s(τ) of the mobile edge environment is observed, an action a(τ) is selected and executed, the reward R(s(τ), a(τ)) is computed, the resulting state s(τ+1) is observed, and the transition is stored in the experience pool; once the experience pool holds enough data, sampling-based learning begins.
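The per-time-slice loop of step S7 can be sketched as an episode driver. `env` and `q_func` are hypothetical stand-ins for the patent's edge environment and estimation Q network (the `reset`/`step` protocol is an assumption); the action rule shown is the common epsilon-greedy policy, which the patent does not spell out.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def run_episode(env, q_func, experience, epsilon=0.1, max_slices=1000):
    """One learning episode (step S7): per time slice tau, observe s(tau),
    select and execute a(tau), observe the reward and s(tau+1), and store
    the transition in the experience pool."""
    s = env.reset()
    for _ in range(max_slices):
        a = epsilon_greedy(q_func(s), epsilon)
        s_next, r, done = env.step(a)
        experience.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return experience
```

Once the experience pool is full enough, a training step (mini-batch sampling plus a gradient update on the estimation Q network) would be interleaved with this loop.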
The invention greatly improves the execution efficiency of the workflow in the mobile edge network environment and ensures the information safety of the user through the learning and decision of the Q neural network.
Drawings
FIG. 1 is a flowchart of a workflow task scheduling method based on deep reinforcement learning according to the present invention;
FIG. 2 is an architecture diagram of workflow scheduling in a mobile edge computing environment;
FIG. 3 is a diagram of a workflow scheduling strategy based on deep reinforcement learning security awareness;
FIG. 4 is a workflow convergence diagram with a task node number of 100;
FIG. 5 is a comparison of the inventive algorithm and the AWM algorithm for risk probability changes;
FIG. 6 is a comparison of the algorithm of the present invention and the AWM algorithm for varying server computing power;
fig. 7 is a comparison of the inventive algorithm and the AWM algorithm with a change in the number of servers.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
On the contrary, the invention is intended to cover alternatives, modifications, equivalents and alternatives which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
As shown in FIG. 1, the workflow scheduling method based on the deep Q neural network in the edge computing environment of the invention includes the following steps:
s1, constructing the edge computing environment model: in a practical scenario, a mobile device is served by multiple surrounding edge servers. In the present invention, the mobile device is denoted by U and the set eNB = {eNB_1, …, eNB_i, …, eNB_n} represents the n edge servers. Only one group of workflows generated by the mobile device is learned and scheduled. The mobile device only needs to encrypt the task node data and transmit it to an edge server over the wireless network; decrypting the task data and executing the task are carried out on the edge server. The computing power of the mobile user is denoted C_u and the computing power of each edge server C_i. Because the device is mobile, the transmission rate between it and the n edge servers changes over time, so R_i(t) denotes the transmission rate between eNB_i and U in the t-th time slice.
S2, generating a workflow: setting the number of task nodes contained in the workflow generated by the mobile equipment as K, randomly arranging 1-K, and arranging the task nodes according to the arrangementThe column order is used as the topological sorting result of the directed acyclic graph to generate a corresponding directed acyclic graph G ═ G-<V,E>As a workflow generated by the mobile device U, the set of nodes in the graph, V ═ V1,…,vk,…,vKAs the collection of task nodes in the workflow, the directed edge collection E ═ E in the graphkl|vk∈V,vlE.v as a set of dependencies between task nodes in the workflow. Any task node v in the workflow is different in the executed task contentkThe working load W thereofkData sizeAnd the data size of the calculation resultOr may be different from other task nodes in the workflow. Directed edge eklIndicating that only task v is being executedkThe edge server of (2) will task vkIs sent to the ready to execute task vlOn the edge server, task vlCan it be executed. And there is only one start node v in the workflowstartAs the task node executed first in the workflow, there is also only one end node vendAnd the time for completing the execution of the end node is the time for completing the execution of the workflow.
S3, task node priority ordering: in the invention, each task node v is divided into a plurality of task nodeskAssigning a weight Pr (v)k) To achieve the ordering of task node priorities. Weight Pr (v)k) Can be calculated by Thus obtaining the product. Since the task nodes have not been scheduled in the sequencing phase, the task node vkIt is not known to which edge server it will be offloaded to execute, so v is taken into account in the weight calculationkIn all edgesAverage time of execution on a serverRcThe rate of data transmission between all edge servers is represented by a fixed constant value and calculatedThe time at which the mobile device delivers the task data to the edge server is available. Since the execution sequence of the task nodes in the workflow has a dependency relationship, the task nodes v are calculatedkThe weight of (c) is taken into account for all its predecessor nodes succ (v)k) The weight of (c). After the weights of the task nodes in all the workflows are obtained by calculating step by step from the end node to the precursor nodes, the weights are sorted in a descending order according to the weight, and the sorted weights are used as the execution order of the task nodes in the workflows.
S4, risk constraint: task node v at mobile devicekThe probability of the task data being leaked or modified during the data transmission to the ith edge server is So the risk probability of the task data being attacked is And its risk probability P (v)k) Probability of risk P that must be less than or equal to the scene settingmax。
S5, constructing a Markov decision process model in the mobile edge environment, and specifically comprising the following steps:
s51, defining the system state s(τ) = {W_c(τ), R(τ)}, where W_c(τ) = {W_c,1(τ), …, W_c,i(τ), …, W_c,n(τ)} represents the sets of task nodes offloaded to the corresponding edge servers and R(τ) = {R_1(τ), …, R_n(τ)} represents the transmission rates between the mobile device and the corresponding edge servers.
s52, defining the system action a(τ) = {a_c(τ), a_cf(τ), a_ig(τ)}, where a_c(τ) = {a_c,1(τ), …, a_c,i(τ), …, a_c,n(τ)} indicates to which edge server the task node in the workflow is offloaded, a_cf(τ) indicates the security level of the encryption (confidentiality) service selected by the mobile device when scheduling the task node, and a_ig(τ) indicates the security level of the data-integrity service selected when scheduling the task node.
s53, defining the system reward R(τ) = −T_end(v_k), where T_end(v_k) denotes the latest completion time of node v_k. Computing the latest completion time T_end(v_k) = T_start(v_k) + T_enc(v_k) + T_tx(v_k) + T_wait(v_k) + T_dec(v_k) + T_exec(v_k) requires first computing task node v_k's start time T_start(v_k), encryption time T_enc(v_k), transmission time T_tx(v_k), waiting time T_wait(v_k), decryption time T_dec(v_k) and execution time T_exec(v_k). The start time is T_start(v_k) = max_{v_h ∈ pre(v_k)} ( T_end(v_h) + T_result(v_h, v_k) ), where pre(v_k) is the set of v_k's predecessor nodes, T_end(v_h) is the latest completion time of any predecessor v_h, and T_result(v_h, v_k) is the time for the edge server executing predecessor v_h to transmit v_h's computation result to the edge server executing the currently scheduled node v_k. The encryption and decryption times follow from the selected security levels; the waiting time T_wait(v_k) is the time to execute all task nodes already queued on the i-th edge server; and the transmission time T_tx(v_k) is obtained from the task's input data size and the rate R_i(τ).
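The completion-time recursion of step S53 can be sketched as below. Argument names and the transfer model d_out / R_c are illustrative reconstructions of the patent's elided formulas, not its exact notation.

```python
def latest_completion_time(pred_finish, pred_out_size, rc,
                           t_enc, t_tx, t_wait, t_dec, t_exec):
    """Latest completion time T_end(v_k) (step S53). The start time is when
    the last predecessor's result has reached v_k's edge server; completion
    then adds encryption, transmission, queueing, decryption and execution
    delays in sequence."""
    t_start = max((tf + d / rc for tf, d in zip(pred_finish, pred_out_size)),
                  default=0.0)
    return t_start + t_enc + t_tx + t_wait + t_dec + t_exec
```

Since the reward is defined as −T_end(v_k), minimizing the cumulative negative reward over an episode is equivalent to minimizing the workflow's overall makespan.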
S6, building a depth Q network: the deep Q neural network is mainly constructed by three functional components. The method comprises the following steps: an estimated Q neural network, a target neural network, and an experience pool. The Q neural network has the same network structure as the target Q neural network, and the Q neural network can periodically transmit the network parameters to the target neural network. The experience pool is used for storing state transition samples obtained by interaction with the environment in each time slice, and the quaternion group with fixed batch is randomly extracted from the experience pool to train the estimation Q neural network in each learning.
S7, algorithm implementation: and giving a constant value epsilon as learning time, and finishing learning and scheduling all task nodes in the workflow to be regarded as finishing learning once. In the workflow scheduling process, the current time slice is set to 0. At the beginning of the τ -th time slice, action a (τ) is selected and executed by observing the current state s (τ) of the moving edge environment, the reward R (s (τ), a (τ)) after the action is executed and the state s (τ' +1) of the system after the action is executed are calculated and stored in the experience pool. When enough data is stored in the experience pool, sampling learning is started.
For comparison with the SAWS strategy, the invention also implements an AWM baseline algorithm, which schedules the task nodes of the workflow to the edge server with the smallest load. The influence of changes in risk probability, server computing power and server count on workflow execution time is evaluated in turn. In each of these comparisons, the SAWS strategy can be observed to outperform the AWM strategy.
Taking a workflow with 100 task nodes as an example, the learning behavior of the SAWS policy is shown in FIG. 4: workflow execution time decreases as the number of learning episodes grows and gradually stabilizes in the later stages of learning, showing that the invention effectively reduces workflow execution delay in the mobile edge environment.
Risk probability: experiments were performed with the risk probability P_max for tasks in the workflow set in turn to 0.2, 0.4, 0.6, 0.8 and 1.0. The workflow execution of the two strategies at different risk probabilities is shown in FIG. 5. It can be observed that the higher the allowed risk probability, the shorter the time to complete the workflow.
Edge server computing power: experiments were performed with the computing power of all edge servers set to 15GHz/s, 17.5GHz/s, 20GHz/s, 22.5GHz/s and 25GHz/s in turn. The workflow execution of the two strategies under different edge-server computing power is shown in FIG. 6. The greater the computing power of the edge servers, the shorter the time to complete the workflow, mainly because a stronger edge server decrypts and executes a task faster, and tasks offloaded to it spend less time waiting behind other tasks.
Number of edge servers: experiments were performed with the number of edge servers in the environment set in turn to 2, 4, 6, 8 and 10. FIG. 7 shows the workflow execution of the two strategies for different numbers of edge servers. As the number of edge servers increases, the time to complete the workflow becomes shorter, mainly because with more edge servers in the environment, fewer task nodes are offloaded to each server and the waiting time before task execution decreases.
Claims (5)
1. The workflow scheduling method based on the deep Q neural network in the edge computing environment is characterized by comprising the following steps:
s1, constructing an edge computing environment model:
the mobile device is denoted by U, and the set eNB = {eNB_1, ..., eNB_i, ..., eNB_n} represents the n edge servers;
the computing power of the mobile user is denoted C_u, and the computing power of each edge server is denoted C_i;
the transmission rate between the mobile device U and the n edge servers varies over time; R_i(t) denotes the transmission rate between eNB_i and U in the t-th time slice;
s2, generating a workflow:
setting the number of task nodes contained in the workflow generated by the mobile device to K, a random permutation of 1-K is taken as a topological ordering, from which a corresponding directed acyclic graph G = &lt;V, E&gt; is generated as the workflow of the mobile device U; the node set of the directed acyclic graph, V = {v_1, ..., v_k, ..., v_K}, serves as the set of task nodes in the workflow, and its directed-edge set, E = {e_kl | v_k ∈ V, v_l ∈ V}, serves as the set of dependency relationships among task nodes;
a directed edge e_kl indicates that task v_l can execute only after the edge server executing task v_k has sent v_k's computation result to the edge server that is ready to execute v_l; the workflow contains exactly one start node v_start, the first task node to be executed, and exactly one end node v_end, whose completion time is the completion time of the whole workflow;
s3, task node priority ordering:
each task node v_k is assigned a weight Pr(v_k) to establish the task-node priority ordering; the weight is calculated as Pr(v_k) = T_avg(v_k) + max_{v_l ∈ succ(v_k)} ( d_k^out / R_c + Pr(v_l) ), where T_avg(v_k) denotes the average time for v_k to execute over all edge servers, R_c is a fixed constant representing the rate of data transmission between edge servers, succ(v_k) denotes all successor nodes of v_k, and d_k^out denotes the data size of v_k's computation result;
after the weights of all task nodes in the workflow are obtained by computing step by step from the end node back through its predecessors, the nodes are sorted in descending order of weight, and this order is used as the execution order of the task nodes in the workflow;
s4, risk constraint:
when the mobile device transmits the data of task node v_k to the i-th edge server, the probability that the task data is leaked or tampered with in transit is p_i^type, type ∈ {cf, ig};
the resulting risk probability of the task data being attacked is P(v_k), and P(v_k) must be less than or equal to the risk probability threshold P_max set for the scenario;
S5, constructing a Markov decision process model in the mobile edge environment, which comprises the following steps:
s51, defining the system state s(τ) = {W_c(τ), R(τ)}, where W_c(τ) = {W_c,1(τ), ..., W_c,i(τ), ..., W_c,n(τ)} represents the sets of task nodes offloaded to the corresponding edge servers and R(τ) = {R_1(τ), ..., R_n(τ)} represents the transmission rates between the mobile device and the corresponding edge servers;
s52, defining the system action a(τ) = {a_c(τ), a_cf(τ), a_ig(τ)}, where a_c(τ) = {a_c,1(τ), ..., a_c,i(τ), ..., a_c,n(τ)} indicates to which edge server the task node in the workflow is offloaded, a_cf(τ) indicates the security level of the encryption (confidentiality) service selected by the mobile device when scheduling the task node, and a_ig(τ) indicates the security level of the data-integrity service selected by the mobile device when scheduling the task node;
s53, defining the system reward R(τ) = −T_end(v_k), where T_end(v_k) denotes the latest completion time of node v_k; the latest completion time T_end(v_k) = T_start(v_k) + T_enc(v_k) + T_tx(v_k) + T_wait(v_k) + T_dec(v_k) + T_exec(v_k) comprises task node v_k's start time, encryption time, transmission time, waiting time, decryption time and execution time;
S6, building a depth Q network:
the deep Q neural network comprises an estimation Q neural network, a target neural network and an experience pool;
the estimation Q network and the target Q network have the same network structure, and the estimation Q network periodically copies its network parameters to the target network;
the experience pool stores the state-transition sample obtained by interacting with the environment in each time slice; at each learning step, a fixed-size batch of quadruples (s, a, r, s′) is drawn uniformly at random from the experience pool to train the estimation Q network;
s7, algorithm implementation:
a constant ε is given as the number of learning episodes; scheduling all task nodes in the workflow once is regarded as one learning episode; at the start of workflow scheduling, the current time slice is set to 0; at the beginning of the τ-th time slice, the current state s(τ) of the mobile edge environment is observed, an action a(τ) is selected and executed, the reward R(s(τ), a(τ)) is computed, the resulting state s(τ+1) is observed, and the transition is stored in the experience pool; once the experience pool holds enough data, sampling-based learning begins.
2. The method for workflow scheduling based on deep Q neural network in edge computing environment according to claim 1, wherein: the computing power of the mobile device in S1 is set to 10GHz/s, and 5 edge servers are provided with computing powers of 15GHz/s, 17.5GHz/s, 20GHz/s, 22.5GHz/s and 25GHz/s, respectively.
3. The method for workflow scheduling based on deep Q neural network in edge computing environment according to claim 1, wherein: the number of workflow nodes in S2 is set to 100; for every task node other than the start node and the end node, the sum of its in-degree and out-degree is less than 5, and the probability that a dependency is generated between two task nodes is set to 10%; the workload of each task node follows a uniform distribution over 1-10 GHz·s, and its data size follows a uniform distribution over 10-100 MB.
4. The method for workflow scheduling based on deep Q neural network in edge computing environment according to claim 1, wherein: in S4, a fixed risk coefficient of task data leakage is set for each security level of the encryption service used when the mobile device transmits task data to the edge server, and a fixed risk coefficient of task data tampering is set for each security level of the integrity service.
5. The method for workflow scheduling based on deep Q neural network in edge computing environment according to claim 1, wherein: the start time in S53 is T_start(v_k) = max_{v_h ∈ pre(v_k)} ( T_end(v_h) + T_result(v_h, v_k) ), where pre(v_k) is the set of v_k's predecessor nodes, T_end(v_h) is the latest completion time of any predecessor node v_h, and T_result(v_h, v_k) is the time for the edge server executing any predecessor node v_h to transmit v_h's computation result to the edge server executing the currently scheduled node v_k.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110074556.5A CN112905312A (en) | 2021-01-20 | 2021-01-20 | Workflow scheduling method based on deep Q neural network in edge computing environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110074556.5A CN112905312A (en) | 2021-01-20 | 2021-01-20 | Workflow scheduling method based on deep Q neural network in edge computing environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112905312A true CN112905312A (en) | 2021-06-04 |
Family
ID=76116522
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110074556.5A Withdrawn CN112905312A (en) | 2021-01-20 | 2021-01-20 | Workflow scheduling method based on deep Q neural network in edge computing environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112905312A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113419830A (en) * | 2021-06-23 | 2021-09-21 | 鹤壁国立光电科技股份有限公司 | Multi-dimensional scheduling method and system based on neural network |
CN113419830B (en) * | 2021-06-23 | 2023-02-03 | 鹤壁国立光电科技股份有限公司 | Multi-dimensional scheduling method and system based on neural network |
CN113568675A (en) * | 2021-07-08 | 2021-10-29 | 广东利通科技投资有限公司 | Internet of vehicles edge calculation task unloading method based on layered reinforcement learning |
CN113568675B (en) * | 2021-07-08 | 2024-04-12 | 广东利通科技投资有限公司 | Internet of vehicles edge computing task unloading method based on hierarchical reinforcement learning |
CN115114030A (en) * | 2022-07-20 | 2022-09-27 | 杭州电子科技大学 | Online multi-workflow scheduling method based on reinforcement learning |
CN115114030B (en) * | 2022-07-20 | 2023-06-16 | 杭州电子科技大学 | On-line multi-workflow scheduling method based on reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210604