CN113590229A - Industrial Internet of things graph task offloading method and system based on deep reinforcement learning - Google Patents

Industrial Internet of things graph task offloading method and system based on deep reinforcement learning Download PDF

Info

Publication number
CN113590229A
Authority
CN
China
Prior art keywords
task
graph
network
offloading
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110923267.8A
Other languages
Chinese (zh)
Other versions
CN113590229B (en)
Inventor
韩瑜
李锦铭
古博
秦臻
张旭
姜善成
唐兆家
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110923267.8A priority Critical patent/CN113590229B/en
Publication of CN113590229A publication Critical patent/CN113590229A/en
Application granted granted Critical
Publication of CN113590229B publication Critical patent/CN113590229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44594Unloading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing

Abstract

The invention discloses an industrial Internet of things graph task offloading method and system based on deep reinforcement learning. The method comprises the following steps: constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things; setting an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption; and, according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network and learning to offload computation-intensive graph tasks in a dynamic time-varying environment, thereby obtaining the optimal action. The system comprises: a system model building module, an optimization target setting module and a reinforcement learning module. By using the method and the system, an offloading strategy for graph tasks under time-varying conditions and limited resources can be formulated. The invention can be widely applied to the field of the industrial Internet of things.

Description

Industrial Internet of things graph task offloading method and system based on deep reinforcement learning
Technical Field
The invention relates to the field of the industrial Internet of things, in particular to an industrial Internet of things graph task offloading method and system based on deep reinforcement learning.
Background
Against the background of intelligent manufacturing in China, every link of the factory is gradually becoming intelligent, and unmanned forklifts, as a main means of logistics in smart factories, are widely used in all kinds of plants. They can transport material automatically and efficiently in the factory, solving the problem of excessive labor intensity in manual handling. Unmanned Forklifts (UFs) are equipped with computing processors and various sensing devices (e.g., unmanned aerial vehicle cameras and high-quality sensors) and can carry perception-related applications (e.g., personnel identification, obstacle identification, anomaly identification and early warning) that are innovative and computationally intensive.
Some perception-related applications may involve a large number of complex tasks, and processing some of these complex tasks locally on a UF is impractical due to the limitations of a single UF's computing power and power consumption; the tasks therefore need to be offloaded to nearby devices or base stations for processing. Because complex tasks are typically composed of interdependent sub-tasks, this complicates task offloading and makes reasonable task offloading challenging. Graph tasks are used to represent the dependencies between the various compute-intensive tasks, where tasks and data flows are represented by graph vertices and edges, respectively, so offloading tasks from a task graph is an efficient approach.
For an industrial Internet of things with complex communication conditions, the existing graph task offloading methods cannot perform well, owing to frequent changes in wireless channel conditions and fluctuations in the computing resources of each device. In a highly dynamic environment, the optimization problem must be solved frequently, which may waste computing resources.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide an industrial Internet of things graph task offloading method and system based on deep reinforcement learning, which formulate an offloading strategy for graph tasks under time-varying conditions and limited resources by taking the complexity of the communication environment into account and minimizing the weighted sum of task completion time and data exchange consumption (WETC).
The first technical scheme adopted by the invention is as follows: an industrial Internet of things graph task offloading method based on deep reinforcement learning, comprising the following steps:
S1, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things;
S2, setting an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
S3, according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
the pre-constructed deep Q network comprises a train Q-network and a target Q-network that have the same structure.
Further, the step of constructing the mobile edge computing system based on an offloading task scene with one task initiator and a plurality of task performers in the industrial Internet of things specifically includes:
S11, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things, wherein the offloading task scene comprises one task initiator and a plurality of task performers;
S12, representing the dependencies between tasks by an undirected acyclic graph G = {V, E}, which includes a set of tasks V = {v_i | i ∈ W} and a set of edges E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks, and each edge e_ij in G is a binary indicator variable indicating whether there is data exchange between v_i and v_j;
graph task offloading is performed in the mobile edge computing system and incurs transmission time consumption, execution time consumption and data exchange consumption.
Further, the reinforcement learning elements of the pre-constructed deep Q network are as follows:
State space: the state at time t is represented as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_{t,i} | i ∈ m} represents the channel gain of task performer i at time t, f_t = {f_{t,i} | i ∈ m} represents the CPU frequency of task performer i at time t, u_t = {u_{t,i} | i ∈ m} represents the number of idle slots of task performer i at time t, G_t represents the topology of the task graph, and d_t = {d_{t,i} | i ∈ m} represents the distance between task performer i and the task initiator at time t;
Action space: the offloading action of the current task v_i is represented as a_i = {a_{i,1}, a_{i,2}, ..., a_{i,m}}, where a_{i,j} is a binary indicator, i.e. a_{i,j} = 1 denotes that device n_j is selected to offload task v_i, and a_{i,j} = 0 denotes that device n_j is not selected to offload task v_i;
Reward function: the reward of the system is set as a decreasing function of the weighted sum αT(u) + (1-α)E(b), where T(u) represents time consumption, E(b) represents data exchange consumption, and α and (1-α) represent the weights of time consumption and data exchange consumption, respectively.
Further, the step of performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action specifically includes:
S31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization target of graph task offloading;
S32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and updating the parameters of the train Q-network by back-propagating the gradient;
S33, repeating step S32 until the reward r is judged to have converged close to its maximum, and taking the current action as the optimal action.
Further, the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and then updating the parameters of the train Q-network by back-propagating the gradient specifically includes:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network and the system environment, and setting an empty queue Q;
the task initiator randomly selecting a task v_i as the head node of the queue and enqueuing it;
the task initiator sequentially dequeuing the task v_i to be offloaded and, according to the edge set E in the graph task G, enqueuing in turn all tasks associated with v_i that have not yet been offloaded;
inputting the environment state currently observed by the task initiator into the target Q-network, outputting Q{s, a | θ}_{a∈A}, selecting action a_i according to the ε-greedy algorithm, and then offloading the task;
the task initiator observing the state s_{i+1} and the reward r_i from the environment, and storing (s_i, a_i, r_i, s_{i+1}) as an experience in the experience replay pool;
when the experience replay pool is judged to be full, randomly selecting K experiences from the experience replay pool;
calculating the target value and, combining the sampled experiences, updating the parameter θ_train of the train Q-network based on the gradient descent method;
after every F time steps, updating the parameter θ_target of the target Q-network with the current parameter θ_train of the train Q-network.
The second technical scheme adopted by the invention is as follows: an industrial Internet of things graph task offloading system based on deep reinforcement learning, comprising:
a system model building module, configured to construct a mobile edge computing system based on an offloading task scene with one task initiator and a plurality of task performers in the industrial Internet of things;
an optimization target setting module, configured to set an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
and a reinforcement learning module, configured to perform reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and to learn to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
The method and the system have the following beneficial effects: the invention takes the complexity of the communication environment into account, minimizes the weighted sum WETC, and formulates an offloading strategy for graph tasks under time-varying conditions and limited resources; by continuously accumulating experience over multiple iterations with the deep Q network and learning from feedback signals, a near-optimal strategy can be made to reasonably offload complex graph tasks in a dynamic environment.
Drawings
FIG. 1 is a flowchart of the steps of the industrial Internet of things graph task offloading method based on deep reinforcement learning according to the invention;
FIG. 2 is a structural block diagram of the industrial Internet of things graph task offloading system based on deep reinforcement learning;
FIG. 3 is a schematic diagram of an industrial Internet of things scenario in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of the graph task framework in accordance with an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
Referring to fig. 1, the invention provides an industrial Internet of things graph task offloading method based on deep reinforcement learning, comprising the following steps:
S1, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things;
Specifically, referring to fig. 3, due to the limitations of a single UF's computing power and power consumption, the device needs to offload locally generated compute-intensive tasks through wireless communication to nearby devices and servers for processing, thereby reducing the capacity requirement and power consumption of a single device. We therefore construct a Mobile Edge Computing (MEC) system whose scenario consists of one task initiator and multiple task performers offloading tasks. Herein, each MEC system provides an information technology service environment and cloud computing capability at the edge of the mobile network and serves as a unit for executing graph tasks in parallel, making it possible to process tasks quickly; no interference between MEC systems is assumed.
Assume that there is one task initiator and M task performers in the MEC system, where N = {n_i | i ∈ M} denotes the set of task performers. The computing resources of each task performer are divided into different numbers of idle slots, which can be expressed as z = {z_i | i ∈ M}. We consider that each idle slot can provide computing service for one of the graph tasks.
S2, setting an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
S3, according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
the pre-constructed deep Q network comprises a train Q-network and a target Q-network that have the same structure.
Specifically, the Deep Q Network (DQN) is composed of two neural networks with the same structure but different roles: a train Q-network and a target Q-network. The two networks hold different parameters θ_train and θ_target: θ_train is used to evaluate the Q value (expected reward) of the optimal action, while θ_target is used to select the action corresponding to the maximum Q value (via an ε-greedy algorithm). The two sets of parameters separate action selection from policy evaluation and reduce the risk of overfitting when estimating the Q value. The experience pool stores the experiences generated by the agent; experiences randomly sampled from the pool are used as the input of the train Q-network to update its parameters, which greatly reduces the memory and computing resources required for training and weakens the coupling between data.
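For concreteness, the two-network structure described above can be sketched as follows in Python with PyTorch. This is only an illustrative sketch: the fully connected architecture, the layer sizes and the ReplayPool helper are assumptions of this example, not details taken from the patent.

import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an environment state vector to one Q value per candidate device."""
    def __init__(self, state_dim: int, num_devices: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_devices),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class ReplayPool:
    """Fixed-size experience replay pool of (s, a, r, s') tuples."""
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, k: int):
        return random.sample(self.buffer, k)

    def __len__(self):
        return len(self.buffer)

# The train and target Q-networks share the same structure but hold separate parameters.
state_dim, num_devices = 20, 5                      # illustrative sizes
train_q = QNetwork(state_dim, num_devices)
target_q = QNetwork(state_dim, num_devices)
target_q.load_state_dict(train_q.state_dict())      # start from identical weights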
Further, as a preferred embodiment of the method, the step of constructing the mobile edge computing system based on an offloading task scene with one task initiator and a plurality of task performers in the industrial Internet of things specifically includes:
S11, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things, wherein the offloading task scene comprises one task initiator and a plurality of task performers;
S12, representing the dependencies between tasks by an undirected acyclic graph G = {V, E}, which includes a set of tasks V = {v_i | i ∈ W} and a set of edges E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks, and each edge e_ij in G is a binary indicator variable indicating whether there is data exchange between v_i and v_j;
Specifically, to conveniently reflect the different topological relations of graph tasks, we represent the dependencies between tasks as an undirected acyclic graph G = {V, E}, containing a set of tasks V = {v_i | i ∈ W} and a set of edges E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks and each edge e_ij in G is a binary indicator variable indicating whether there is data exchange between v_i and v_j. In addition, the parameter L = {l_i | i ∈ W} represents the data size of each task, and the computation workload is determined by the parameter K = {k_i | i ∈ W}, where k_i denotes the number of CPU cycles required to execute task v_i. The graph task framework is shown in FIG. 4.
In addition, we assume that each task performer is allocated a dedicated spectrum resource block during transmission to support concurrent transmission for task offloading and downloading. Considering that the uplink transmission time is much longer than the downlink transmission time, only the consumption of uplink transmission is discussed herein; and given that the task initiator suffers from a long-term shortage of resources, it is assumed that the computation-intensive tasks tend to be offloaded entirely to the task performers.
Graph task offloading is performed in the mobile edge computing system and incurs transmission time consumption, execution time consumption and data exchange consumption.
Transmission time consumption: T^{trans}_{i,j} denotes the time for the task initiator to offload task i to edge device j in parallel; its value depends on the channel condition, the transmission power, the bandwidth and so on. We use P^{TI} to denote the fixed uplink transmission power of the task initiator, and h_{i,j} to denote the channel gain for offloading task i to device j. Moreover, we assume additive white Gaussian noise with zero mean and equal variance σ^2 at the receiving end of all tasks. According to Shannon's theorem, the uplink transmission rate for offloading task i from the task initiator to device j is
r_{i,j} = B log2(1 + P^{TI} h_{i,j} / σ^2)
where B denotes the fixed bandwidth of the orthogonal channel allocated to the edge device. Therefore, the uplink transmission time of task i is
T^{trans}_{i,j} = l_i / r_{i,j}
We use the binary indicator variable a_{i,j} to indicate whether task i is offloaded to device j: a_{i,j} = 1 if task i is offloaded to device j, and a_{i,j} = 0 otherwise. Thus, the total time consumed for task transmission, T^{trans}(a), is obtained by aggregating the uplink transmission times T^{trans}_{i,j} of all offloaded tasks according to the indicators a_{i,j}.
Execution time consumption: after the tasks are transmitted to the idle slots, the edge devices begin to execute the sub-tasks in parallel. Let f = {f_i | i ≤ M} denote the CPU frequency used for executing tasks on each edge device, with the idle slots carrying computation tasks on device i all using the same f_i. The total time consumed for task execution, T^{exec}(a), is obtained by aggregating the execution times of the sub-tasks on their assigned devices. Therefore, the total time consumed for task transmission and execution is
T(a) = T^{trans}(a) + T^{exec}(a)
Data exchange cost: we assume that when sub-tasks with a connecting relationship are offloaded to different task performers, a data exchange cost c_{jj′} (j ∈ m, j′ ∈ m, j ≠ j′) is incurred, representing the cost generated by traffic exchange between different task performers in the MEC system; if they are offloaded to the same task performer, no data exchange cost is generated. We use the binary indicator variable b_{jj′} to indicate whether there is data exchange between different task performers: b_{jj′} = 1 if data is exchanged between task performers j and j′, and b_{jj′} = 0 otherwise. Thus, the total data exchange consumption is
E(b) = Σ_{j ∈ m} Σ_{j′ ∈ m, j′ ≠ j} b_{jj′} c_{jj′}
Modeling the optimization target: to obtain the offloading strategy for computation-intensive graph tasks in the considered MEC system, we formulate the following optimization problem, whose main objective is to minimize the weighted sum of task completion time and data exchange consumption (WETC):
γ = αT(u) + (1-α)E(b)
Thus, the following optimization model is constructed: minimize γ over the offloading decisions a, subject to the six constraints (1)-(6) described below.
the following six constraints are simultaneously satisfied:
(1)
Figure BDA0003208226570000078
representing a task viWhether or not to assign to device nj
(2)
Figure BDA0003208226570000079
Indicating that all tasks in the task graph need to be distributed to the idle gaps of the relevant task performers for performing;
(3)
Figure BDA00032082265700000710
ensuring that tasks assigned to the same task performer cannot exceed maximum computational resources
(4)
Figure BDA00032082265700000711
The CPU frequency of each task executor is limited to a certain interval;
(5)
Figure BDA00032082265700000712
the method comprises the following steps of representing that the distance between a task initiator and a task performer is limited;
(6)
Figure BDA00032082265700000713
the computational resources representing each task performer fluctuate randomly over an interval.
Further, as a preferred embodiment of the method, an algorithm based on the combination of a Deep Q Network (DQN) and breadth-first traversal (BFS) is adopted to learn to offload computation-intensive graph tasks in a dynamic time-varying environment. The reinforcement learning elements of the pre-constructed deep Q network are as follows:
State space: as mentioned in the proposed DRL framework, the agent monitors the environment and records the system state at fixed time intervals. At time t, the state is represented as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_{t,i} | i ∈ m} represents the channel gain of task performer i at time t, f_t = {f_{t,i} | i ∈ m} represents the CPU frequency of task performer i at time t, u_t = {u_{t,i} | i ∈ m} represents the number of idle slots of task performer i at time t, G_t represents the topology of the task graph, and d_t = {d_{t,i} | i ∈ m} represents the distance between task performer i and the task initiator at time t.
Action space: when the environment feedback state is obtained, the agent selects the most suitable offloading strategy for the current task v_i according to the observed situation and the output of the Q network. Thus, the offloading action of the current task v_i can be represented as a_i = {a_{i,1}, a_{i,2}, ..., a_{i,m}}, where a_{i,j} is a binary indicator, i.e. a_{i,j} = 1 denotes that device n_j is selected to offload task v_i, and a_{i,j} = 0 denotes that device n_j is not selected.
Reward function: the optimization objective considered above is to minimize the WETC of graph task offloading in large-scale scenarios facing random variation, so the reward of the system is set as a decreasing function of the weighted sum αT(u) + (1-α)E(b), where T(u) represents time consumption, E(b) represents data exchange consumption, and α and (1-α) represent the weights of time consumption and data exchange consumption, respectively. The goal of minimum WETC is achieved by maximizing r, and the task initiator finally finds the optimal offloading scheme.
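Purely as an illustration of the three definitions above, the following Python sketch assembles a state vector, encodes the device-selection action, and computes a reward. Taking the reward as the negative weighted sum, r = -(αT(u) + (1-α)E(b)), is an assumption of this sketch that is merely consistent with "maximizing r minimizes WETC"; the exact reward formula is not reproduced in this text.

import numpy as np

def build_state(gain, cpu_freq, idle_slots, graph_vec, dist):
    """s_t = {h_t, f_t, u_t, G_t, d_t} flattened into one vector for the Q-network."""
    return np.concatenate([gain, cpu_freq, idle_slots, graph_vec, dist]).astype(np.float32)

def encode_action(device_index, num_devices):
    """a_i = {a_i1, ..., a_im}: one-hot selection of the device that offloads task v_i."""
    action = np.zeros(num_devices, dtype=np.int8)
    action[device_index] = 1
    return action

def reward(time_cost, exchange_cost, alpha=0.5):
    """Assumed reward: negative WETC, so that maximizing r minimizes the weighted sum."""
    return -(alpha * time_cost + (1.0 - alpha) * exchange_cost)

# Example with 3 task performers (illustrative numbers only).
s = build_state(gain=[0.8, 0.5, 0.9], cpu_freq=[2.0, 1.5, 2.5],
                idle_slots=[3, 1, 2], graph_vec=[0, 1, 1, 0, 1, 0], dist=[10, 25, 8])
a = encode_action(device_index=2, num_devices=3)
r = reward(time_cost=1.2, exchange_cost=0.4)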
Further, as a preferred embodiment of the method, the step of performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action specifically includes:
S31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization target of graph task offloading;
S32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and updating the parameters of the train Q-network by back-propagating the gradient;
S33, repeating step S32 until the reward r is judged to have converged close to its maximum, and taking the current action as the optimal action.
Further, as a preferred embodiment of the method, the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and then updating the parameters of the train Q-network by back-propagating the gradient specifically includes:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network and the system environment, and setting an empty queue Q;
the task initiator randomly selects a task v_i as the head node of the queue and enqueues it;
the task initiator sequentially dequeues the task v_i to be offloaded and, according to the edge set E in the graph task G, enqueues in turn all tasks associated with v_i that have not yet been offloaded;
the environment state currently observed by the task initiator is input into the target Q-network, which outputs Q{s, a | θ}_{a∈A}; action a_i is selected according to the ε-greedy algorithm, and the task is then offloaded;
specifically, the ε-greedy algorithm selects the action with the largest Q value, argmax_a Q(s, a | θ), with probability 1-ε, and a random action with probability ε;
the task initiator observes the state s_{i+1} and the reward r_i from the environment, and stores (s_i, a_i, r_i, s_{i+1}) as an experience in the experience replay pool;
when the experience replay pool is judged to be full, K experiences are randomly selected from the experience replay pool;
the target value is calculated and, combined with the sampled experiences, the parameter θ_train of the train Q-network is updated based on the gradient descent method;
after every F time steps, the parameter θ_target of the target Q-network is updated with the current parameter θ_train of the train Q-network.
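The procedure above can be summarized in code. The following sketch (Python with PyTorch, reusing the QNetwork, ReplayPool and GraphTask sketches given earlier) is only an illustration of a DQN-plus-breadth-first-traversal loop under this example's assumptions: the environment interface (env.observe, env.offload), the Bellman target with a discount factor, and all hyperparameter values are placeholders rather than details taken from the patent.

import random
from collections import deque

import torch
import torch.nn.functional as F

def train_offloading_policy(env, graph, train_q, target_q, replay,
                            episodes=500, batch_k=64, sync_f=100,
                            epsilon=0.1, discount=0.95, lr=1e-3):
    optimizer = torch.optim.Adam(train_q.parameters(), lr=lr)
    step = 0
    for _ in range(episodes):
        # Breadth-first traversal of the task graph, starting from a random task.
        start = random.choice(list(graph.cpu_cycles))
        queue, visited = deque([start]), {start}
        while queue:
            task = queue.popleft()
            for nb in graph.neighbors(task):        # enqueue not-yet-offloaded neighbours
                if nb not in visited:
                    visited.add(nb)
                    queue.append(nb)

            state = torch.as_tensor(env.observe(), dtype=torch.float32)
            q_values = target_q(state)              # Q{s, a | θ} for every device
            if random.random() < epsilon:           # ε-greedy exploration
                action = random.randrange(q_values.numel())
            else:
                action = int(q_values.argmax())

            reward, next_state = env.offload(task, action)   # offload and observe
            replay.push(state, action, reward,
                        torch.as_tensor(next_state, dtype=torch.float32))

            if len(replay) >= batch_k:              # replay pool holds enough experiences
                batch = replay.sample(batch_k)
                s = torch.stack([b[0] for b in batch])
                a = torch.tensor([b[1] for b in batch])
                r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
                s_next = torch.stack([b[3] for b in batch])

                with torch.no_grad():               # target value from the target network
                    target = r + discount * target_q(s_next).max(dim=1).values
                pred = train_q(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = F.mse_loss(pred, target)

                optimizer.zero_grad()
                loss.backward()                     # update θ_train by gradient descent
                optimizer.step()

            step += 1
            if step % sync_f == 0:                  # every F steps copy θ_train into θ_target
                target_q.load_state_dict(train_q.state_dict())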
As shown in fig. 2, an industrial Internet of things graph task offloading system based on deep reinforcement learning comprises:
a system model building module, configured to construct a mobile edge computing system based on an offloading task scene with one task initiator and a plurality of task performers in the industrial Internet of things;
an optimization target setting module, configured to set an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
and a reinforcement learning module, configured to perform reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and to learn to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An industrial Internet of things graph task offloading method based on deep reinforcement learning, characterized by comprising the following steps:
S1, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things;
S2, setting an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
S3, according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
wherein the pre-constructed deep Q network comprises a train Q-network and a target Q-network that have the same structure.
2. The deep reinforcement learning-based industrial Internet of things graph task offloading method according to claim 1, wherein the step of constructing the mobile edge computing system based on an offloading task scene in the industrial Internet of things specifically comprises:
S11, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things, wherein the offloading task scene comprises one task initiator and a plurality of task performers;
S12, representing the dependencies between tasks by an undirected acyclic graph G = {V, E}, which includes a set of tasks V = {v_i | i ∈ W} and a set of edges E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks, and each edge e_ij in G is a binary indicator variable indicating whether there is data exchange between v_i and v_j;
wherein graph task offloading is performed in the mobile edge computing system and incurs transmission time consumption, execution time consumption and data exchange consumption.
3. The deep reinforcement learning-based industrial Internet of things graph task offloading method according to claim 2, wherein the reinforcement learning based on the pre-constructed deep Q network specifically comprises:
a state space, wherein the state at time t is represented as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_{t,i} | i ∈ m} represents the channel gain of task performer i at time t, f_t = {f_{t,i} | i ∈ m} represents the CPU frequency of task performer i at time t, u_t = {u_{t,i} | i ∈ m} represents the number of idle slots of task performer i at time t, G_t represents the topology of the task graph, and d_t = {d_{t,i} | i ∈ m} represents the distance between task performer i and the task initiator at time t;
an action space, wherein the offloading action of the current task v_i is represented as a_i = {a_{i,1}, a_{i,2}, ..., a_{i,m}}, where a_{i,j} is set as a binary indicator;
a reward function, wherein the reward of the system is set as a decreasing function of the weighted sum αT(u) + (1-α)E(b), where T(u) represents time consumption, E(b) represents data exchange consumption, and α and (1-α) represent the weights of time consumption and data exchange consumption, respectively.
4. The deep reinforcement learning-based industrial Internet of things graph task offloading method according to claim 3, wherein the step of performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, learning to offload computation-intensive graph tasks in a dynamic time-varying environment, and obtaining the optimal action specifically comprises:
S31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization target of graph task offloading;
S32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and updating the parameters of the train Q-network by back-propagating the gradient;
S33, repeating step S32 until the reward r is judged to have converged close to its maximum, and taking the current action as the optimal action.
5. The deep reinforcement learning-based industrial Internet of things graph task offloading method according to claim 4, wherein the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and then updating the parameters of the train Q-network by back-propagating the gradient specifically comprises:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network and the system environment, and setting an empty queue Q;
the task initiator randomly selecting a task v_i as the head node of the queue and enqueuing it;
the task initiator sequentially dequeuing the task v_i to be offloaded and, according to the edge set E in the graph task G, enqueuing in turn all tasks associated with v_i that have not yet been offloaded;
inputting the environment state currently observed by the task initiator into the target Q-network, outputting Q{s, a | θ}_{a∈A}, selecting action a_i according to the ε-greedy algorithm, and then offloading the task;
the task initiator observing the state s_{i+1} and the reward r_i from the environment, and storing (s_i, a_i, r_i, s_{i+1}) as an experience in the experience replay pool;
when the experience replay pool is judged to be full, randomly selecting K experiences from the experience replay pool;
calculating the target value and, combining the sampled experiences, updating the parameter θ_train of the train Q-network based on the gradient descent method;
after every F time steps, updating the parameter θ_target of the target Q-network with the current parameter θ_train of the train Q-network.
6. An industrial Internet of things graph task offloading system based on deep reinforcement learning, characterized by comprising:
a system model building module, configured to construct a mobile edge computing system based on an offloading task scene in the industrial Internet of things;
an optimization target setting module, configured to set an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
and a reinforcement learning module, configured to perform reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and to learn to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
CN202110923267.8A 2021-08-12 2021-08-12 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning Active CN113590229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110923267.8A CN113590229B (en) 2021-08-12 2021-08-12 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110923267.8A CN113590229B (en) 2021-08-12 2021-08-12 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113590229A true CN113590229A (en) 2021-11-02
CN113590229B CN113590229B (en) 2023-11-10

Family

ID=78257430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110923267.8A Active CN113590229B (en) 2021-08-12 2021-08-12 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113590229B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050210473A1 (en) * 2004-03-08 2005-09-22 Frank Inchingolo Controlling task execution
US20190394096A1 (en) * 2019-04-30 2019-12-26 Intel Corporation Technologies for batching requests in an edge infrastructure
WO2020119648A1 (en) * 2018-12-14 2020-06-18 深圳先进技术研究院 Computing task unloading algorithm based on cost optimization
CN111726826A (en) * 2020-05-25 2020-09-29 上海大学 Online task unloading method in base station intensive edge computing network
CN111835827A (en) * 2020-06-11 2020-10-27 北京邮电大学 Internet of things edge computing task unloading method and system
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
WO2021139537A1 (en) * 2020-01-08 2021-07-15 上海交通大学 Power control and resource allocation based task offloading method in industrial internet of things
CN113157344A (en) * 2021-04-30 2021-07-23 杭州电子科技大学 DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN113225377A (en) * 2021-03-30 2021-08-06 北京中电飞华通信有限公司 Internet of things edge task unloading method and device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050210473A1 (en) * 2004-03-08 2005-09-22 Frank Inchingolo Controlling task execution
WO2020119648A1 (en) * 2018-12-14 2020-06-18 深圳先进技术研究院 Computing task unloading algorithm based on cost optimization
US20190394096A1 (en) * 2019-04-30 2019-12-26 Intel Corporation Technologies for batching requests in an edge infrastructure
WO2021139537A1 (en) * 2020-01-08 2021-07-15 上海交通大学 Power control and resource allocation based task offloading method in industrial internet of things
CN111726826A (en) * 2020-05-25 2020-09-29 上海大学 Online task unloading method in base station intensive edge computing network
CN111835827A (en) * 2020-06-11 2020-10-27 北京邮电大学 Internet of things edge computing task unloading method and system
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113225377A (en) * 2021-03-30 2021-08-06 北京中电飞华通信有限公司 Internet of things edge task unloading method and device
CN113157344A (en) * 2021-04-30 2021-07-23 杭州电子科技大学 DRL-based energy consumption perception task unloading method in mobile edge computing environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yu Bowen, et al.: "Research on collaborative decision of task offloading and base station association in mobile edge computing" (in Chinese), Journal of Computer Research and Development, vol. 55, no. 03, pages 537-550 *
Lu Haifeng; Gu Chunhua; Luo Fei; Ding Weichao; Yang Ting; Zheng Shuai: "Research on task offloading in mobile edge computing based on deep reinforcement learning" (in Chinese), Journal of Computer Research and Development, no. 07, pages 195-210 *

Also Published As

Publication number Publication date
CN113590229B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN111756812B (en) Energy consumption perception edge cloud cooperation dynamic unloading scheduling method
Asheralieva et al. Hierarchical game-theoretic and reinforcement learning framework for computational offloading in UAV-enabled mobile edge computing networks with multiple service providers
CN111226238B (en) Prediction method, terminal and server
CN108958916B (en) Workflow unloading optimization method under mobile edge environment
CN112188442A (en) Vehicle networking data-driven task unloading system and method based on mobile edge calculation
CN111093203A (en) Service function chain low-cost intelligent deployment method based on environment perception
CN112148492B (en) Service deployment and resource allocation method considering multi-user mobility
CN114340016A (en) Power grid edge calculation unloading distribution method and system
CN113660325B (en) Industrial Internet task unloading strategy based on edge calculation
CN116541106B (en) Computing task unloading method, computing device and storage medium
CN113867843A (en) Mobile edge computing task unloading method based on deep reinforcement learning
Gupta et al. Toward intelligent resource management in dynamic Fog Computing‐based Internet of Things environment with Deep Reinforcement Learning: A survey
CN111158893B (en) Task unloading method, system, equipment and medium applied to fog computing network
CN113821346A (en) Computation uninstalling and resource management method in edge computation based on deep reinforcement learning
Murti et al. Learning-based orchestration for dynamic functional split and resource allocation in vRANs
CN116954866A (en) Edge cloud task scheduling method and system based on deep reinforcement learning
Li et al. Efficient data offloading using markovian decision on state reward action in edge computing
CN113590229B (en) Industrial Internet of things graph task unloading method and system based on deep reinforcement learning
CN114693141B (en) Transformer substation inspection method based on end edge cooperation
Zhang et al. EFFECT-DNN: Energy-efficient Edge Framework for Real-time DNN Inference
CN115220818A (en) Real-time dependency task unloading method based on deep reinforcement learning
CN116069498A (en) Distributed computing power scheduling method and device, electronic equipment and storage medium
CN113220369B (en) Intelligent computing unloading optimization method based on distributed machine learning
CN114968402A (en) Edge calculation task processing method and device and electronic equipment
CN113900739A (en) Calculation unloading method and system under many-to-many edge calculation scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant