CN113590229A - Industrial Internet of things graph task offloading method and system based on deep reinforcement learning - Google Patents

Industrial Internet of things graph task offloading method and system based on deep reinforcement learning Download PDF

Info

Publication number
CN113590229A
Authority
CN
China
Prior art keywords
task
graph
network
offloading
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110923267.8A
Other languages
Chinese (zh)
Other versions
CN113590229B (en)
Inventor
韩瑜
李锦铭
古博
秦臻
张旭
姜善成
唐兆家
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110923267.8A priority Critical patent/CN113590229B/en
Publication of CN113590229A publication Critical patent/CN113590229A/en
Application granted granted Critical
Publication of CN113590229B publication Critical patent/CN113590229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44594Unloading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing

Abstract

The invention discloses an industrial Internet of things graph task offloading method and system based on deep reinforcement learning. The method comprises the following steps: constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things; setting an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption; and, according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network and learning to offload computation-intensive graph tasks in a dynamic time-varying environment, thereby obtaining the optimal action. The system comprises: a system model building module, an optimization target setting module and a reinforcement learning module. By using the method and the system, an offloading strategy for graph tasks under time-varying conditions and limited resources can be formulated. The invention can be widely applied to the field of the industrial Internet of things.

Description

Industrial Internet of things graph task offloading method and system based on deep reinforcement learning
Technical Field
The invention relates to the field of the industrial Internet of things, in particular to an industrial Internet of things graph task offloading method and system based on deep reinforcement learning.
Background
Against the background of intelligent manufacturing in China, every link of the factory is gradually becoming intelligent, and unmanned forklifts, as a main means of logistics in smart factories, are widely used in all kinds of plants. They can transport material automatically and efficiently in the factory, solving the problem of excessive labor intensity in manual handling. Unmanned Forklifts (UFs) are equipped with computing processors and various sensing devices (e.g., unmanned aerial vehicle cameras and high-quality sensors) and can carry perception-related applications (e.g., personnel identification, obstacle identification, anomaly identification and early warning) that are innovative and computationally intensive.
Some perception-related applications may involve a large number of complex tasks, and processing some of these complex tasks locally on a UF is impractical due to the limitations of a single UF's computing power and power consumption; the tasks therefore need to be offloaded to nearby devices or base stations for processing. Because complex tasks are typically composed of interdependent sub-tasks, this complicates task offloading and makes reasonable task offloading challenging. Graph tasks are used to represent the dependencies between the various compute-intensive tasks, where tasks and data flows are represented by graph vertices and edges, respectively, so offloading tasks from a task graph is an efficient approach.
For an industrial Internet of things with complex communication conditions, the existing graph task offloading methods cannot perform well, owing to frequent changes in wireless channel conditions and fluctuations in the computing resources of each device. In a highly dynamic environment, the optimization problem must be solved frequently, which may waste computing resources.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide an industrial Internet of things graph task offloading method and system based on deep reinforcement learning, which formulate an offloading strategy for graph tasks under time-varying conditions and limited resources by taking the complexity of the communication environment into account and minimizing the weighted sum of task completion time and data exchange consumption (WETC).
The first technical scheme adopted by the invention is as follows: an industrial Internet of things graph task offloading method based on deep reinforcement learning, comprising the following steps:
S1, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things;
S2, setting an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
S3, according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
the pre-constructed deep Q network comprises a train Q-network and a target Q-network that have the same structure.
Further, the step of constructing the mobile edge computing system based on an offloading task scene with one task initiator and a plurality of task performers in the industrial Internet of things specifically includes:
S11, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things, wherein the offloading task scene comprises one task initiator and a plurality of task performers;
S12, representing the dependencies between tasks by an undirected acyclic graph G = {V, E}, which includes a set of tasks V = {v_i | i ∈ W} and a set of edges E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks, and each edge e_ij in G is a binary indicator variable indicating whether there is data exchange between v_i and v_j;
graph task offloading is performed in the mobile edge computing system and incurs transmission time consumption, execution time consumption and data exchange consumption.
Further, the reinforcement learning elements of the pre-constructed deep Q network are as follows:
State space: the state at time t is represented as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_{t,i} | i ∈ m} represents the channel gain of task performer i at time t, f_t = {f_{t,i} | i ∈ m} represents the CPU frequency of task performer i at time t, u_t = {u_{t,i} | i ∈ m} represents the number of idle slots of task performer i at time t, G_t represents the topology of the task graph, and d_t = {d_{t,i} | i ∈ m} represents the distance between task performer i and the task initiator at time t;
Action space: the offloading action of the current task v_i is represented as a_i = {a_{i,1}, a_{i,2}, ..., a_{i,m}}, where a_{i,j} is a binary indicator, i.e. a_{i,j} = 1 denotes that device n_j is selected to offload task v_i, and a_{i,j} = 0 denotes that device n_j is not selected to offload task v_i;
Reward function: the reward of the system is set as a decreasing function of the weighted sum αT(u) + (1-α)E(b), where T(u) represents time consumption, E(b) represents data exchange consumption, and α and (1-α) represent the weights of time consumption and data exchange consumption, respectively.
Further, the step of performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action specifically includes:
S31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization target of graph task offloading;
S32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and updating the parameters of the train Q-network by back-propagating the gradient;
S33, repeating step S32 until the reward r is judged to have converged close to its maximum, and taking the current action as the optimal action.
Further, the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and then updating the parameters of the train Q-network by back-propagating the gradient specifically includes:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network and the system environment, and setting an empty queue Q;
the task initiator randomly selecting a task v_i as the head node of the queue and enqueuing it;
the task initiator sequentially dequeuing the task v_i to be offloaded and, according to the edge set E in the graph task G, enqueuing in turn all tasks associated with v_i that have not yet been offloaded;
inputting the environment state currently observed by the task initiator into the target Q-network, outputting Q{s, a | θ}_{a∈A}, selecting action a_i according to the ε-greedy algorithm, and then offloading the task;
the task initiator observing the state s_{i+1} and the reward r_i from the environment, and storing (s_i, a_i, r_i, s_{i+1}) as an experience in the experience replay pool;
when the experience replay pool is judged to be full, randomly selecting K experiences from the experience replay pool;
calculating the target value and, combining the sampled experiences, updating the parameter θ_train of the train Q-network based on the gradient descent method;
after every F time steps, updating the parameter θ_target of the target Q-network with the current parameter θ_train of the train Q-network.
The second technical scheme adopted by the invention is as follows: an industrial Internet of things graph task offloading system based on deep reinforcement learning, comprising:
a system model building module, configured to construct a mobile edge computing system based on an offloading task scene with one task initiator and a plurality of task performers in the industrial Internet of things;
an optimization target setting module, configured to set an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
and a reinforcement learning module, configured to perform reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and to learn to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
The method and the system have the following beneficial effects: the invention takes the complexity of the communication environment into account, minimizes the weighted sum WETC, and formulates an offloading strategy for graph tasks under time-varying conditions and limited resources; by continuously accumulating experience over multiple iterations with the deep Q network and learning from feedback signals, a near-optimal strategy can be made to reasonably offload complex graph tasks in a dynamic environment.
Drawings
FIG. 1 is a flowchart of the steps of the industrial Internet of things graph task offloading method based on deep reinforcement learning according to the invention;
FIG. 2 is a structural block diagram of the industrial Internet of things graph task offloading system based on deep reinforcement learning;
FIG. 3 is a schematic diagram of an industrial Internet of things scenario in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of the graph task framework in accordance with an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
Referring to fig. 1, the invention provides an industrial Internet of things graph task offloading method based on deep reinforcement learning, comprising the following steps:
S1, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things;
Specifically, referring to fig. 3, due to the limitations of a single UF's computing power and power consumption, the device needs to offload locally generated compute-intensive tasks through wireless communication to nearby devices and servers for processing, thereby reducing the capacity requirement and power consumption of a single device. We therefore construct a Mobile Edge Computing (MEC) system whose scenario consists of one task initiator and multiple task performers offloading tasks. Herein, each MEC system provides an information technology service environment and cloud computing capability at the edge of the mobile network and serves as a unit for executing graph tasks in parallel, making it possible to process tasks quickly; no interference between MEC systems is assumed.
Assume that there is one task initiator and M task performers in the MEC system, where N = {n_i | i ∈ M} denotes the set of task performers. The computing resources of each task performer are divided into different numbers of idle slots, which can be expressed as z = {z_i | i ∈ M}. We consider that each idle slot can provide computing service for one of the graph tasks.
S2, setting an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
S3, according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
the pre-constructed deep Q network comprises a train Q-network and a target Q-network that have the same structure.
Specifically, the Deep Q Network (DQN) is composed of two neural networks with the same structure but different roles: a train Q-network and a target Q-network. The two networks hold different parameters θ_train and θ_target: θ_train is used to evaluate the Q value (expected reward) of the optimal action, while θ_target is used to select the action corresponding to the maximum Q value (via an ε-greedy algorithm). The two sets of parameters separate action selection from policy evaluation and reduce the risk of overfitting when estimating the Q value. The experience pool stores the experiences generated by the agent; experiences randomly sampled from the pool are used as the input of the train Q-network to update its parameters, which greatly reduces the memory and computing resources required for training and weakens the coupling between data.
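For concreteness, the two-network structure described above can be sketched as follows in Python with PyTorch. This is only an illustrative sketch: the fully connected architecture, the layer sizes and the ReplayPool helper are assumptions of this example, not details taken from the patent.

import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an environment state vector to one Q value per candidate device."""
    def __init__(self, state_dim: int, num_devices: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_devices),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class ReplayPool:
    """Fixed-size experience replay pool of (s, a, r, s') tuples."""
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, k: int):
        return random.sample(self.buffer, k)

    def __len__(self):
        return len(self.buffer)

# The train and target Q-networks share the same structure but hold separate parameters.
state_dim, num_devices = 20, 5                      # illustrative sizes
train_q = QNetwork(state_dim, num_devices)
target_q = QNetwork(state_dim, num_devices)
target_q.load_state_dict(train_q.state_dict())      # start from identical weights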
Further, as a preferred embodiment of the method, the step of constructing the mobile edge computing system based on an offloading task scene with one task initiator and a plurality of task performers in the industrial Internet of things specifically includes:
S11, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things, wherein the offloading task scene comprises one task initiator and a plurality of task performers;
S12, representing the dependencies between tasks by an undirected acyclic graph G = {V, E}, which includes a set of tasks V = {v_i | i ∈ W} and a set of edges E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks, and each edge e_ij in G is a binary indicator variable indicating whether there is data exchange between v_i and v_j;
Specifically, to conveniently reflect the different topological relations of graph tasks, we represent the dependencies between tasks as an undirected acyclic graph G = {V, E}, containing a set of tasks V = {v_i | i ∈ W} and a set of edges E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks and each edge e_ij in G is a binary indicator variable indicating whether there is data exchange between v_i and v_j. In addition, the parameter L = {l_i | i ∈ W} represents the data size of each task, and the computation workload is determined by the parameter K = {k_i | i ∈ W}, where k_i denotes the number of CPU cycles required to execute task v_i. The graph task framework is shown in FIG. 4.
In addition, we assume that each task performer is allocated a dedicated spectrum resource block during transmission to support concurrent transmission for task offloading and downloading. Considering that the uplink transmission time is much longer than the downlink transmission time, only the consumption of uplink transmission is discussed herein; and given that the task initiator suffers from a long-term shortage of resources, it is assumed that the computation-intensive tasks tend to be offloaded entirely to the task performers.
Graph task offloading is performed in the mobile edge computing system and incurs transmission time consumption, execution time consumption and data exchange consumption.
Transmission time consumption: T^{trans}_{i,j} denotes the time for the task initiator to offload task i to edge device j in parallel; its value depends on the channel condition, the transmission power, the bandwidth and so on. We use P^{TI} to denote the fixed uplink transmission power of the task initiator, and h_{i,j} to denote the channel gain for offloading task i to device j. Moreover, we assume additive white Gaussian noise with zero mean and equal variance σ^2 at the receiving end of all tasks. According to Shannon's theorem, the uplink transmission rate for offloading task i from the task initiator to device j is
r_{i,j} = B log2(1 + P^{TI} h_{i,j} / σ^2)
where B denotes the fixed bandwidth of the orthogonal channel allocated to the edge device. Therefore, the uplink transmission time of task i is
T^{trans}_{i,j} = l_i / r_{i,j}
We use the binary indicator variable a_{i,j} to indicate whether task i is offloaded to device j: a_{i,j} = 1 if task i is offloaded to device j, and a_{i,j} = 0 otherwise. Thus, the total time consumed for task transmission, T^{trans}(a), is obtained by aggregating the uplink transmission times T^{trans}_{i,j} of all offloaded tasks according to the indicators a_{i,j}.
Execution time consumption: after the tasks are transmitted to the idle slots, the edge devices begin to execute the sub-tasks in parallel. Let f = {f_i | i ≤ M} denote the CPU frequency used for executing tasks on each edge device, with the idle slots carrying computation tasks on device i all using the same f_i. The total time consumed for task execution, T^{exec}(a), is obtained by aggregating the execution times of the sub-tasks on their assigned devices. Therefore, the total time consumed for task transmission and execution is
T(a) = T^{trans}(a) + T^{exec}(a)
Data exchange cost: we assume that when sub-tasks with a connecting relationship are offloaded to different task performers, a data exchange cost c_{jj′} (j ∈ m, j′ ∈ m, j ≠ j′) is incurred, representing the cost generated by traffic exchange between different task performers in the MEC system; if they are offloaded to the same task performer, no data exchange cost is generated. We use the binary indicator variable b_{jj′} to indicate whether there is data exchange between different task performers: b_{jj′} = 1 if data is exchanged between task performers j and j′, and b_{jj′} = 0 otherwise. Thus, the total data exchange consumption is
E(b) = Σ_{j ∈ m} Σ_{j′ ∈ m, j′ ≠ j} b_{jj′} c_{jj′}
Modeling the optimization target: to obtain the offloading strategy for computation-intensive graph tasks in the considered MEC system, we formulate the following optimization problem, whose main objective is to minimize the weighted sum of task completion time and data exchange consumption (WETC):
γ = αT(u) + (1-α)E(b)
Thus, the following optimization model is constructed: minimize γ over the offloading decisions a, subject to the six constraints (1)-(6) described below.
the following six constraints are simultaneously satisfied:
(1)
Figure BDA0003208226570000078
representing a task viWhether or not to assign to device nj
(2)
Figure BDA0003208226570000079
Indicating that all tasks in the task graph need to be distributed to the idle gaps of the relevant task performers for performing;
(3)
Figure BDA00032082265700000710
ensuring that tasks assigned to the same task performer cannot exceed maximum computational resources
(4)
Figure BDA00032082265700000711
The CPU frequency of each task executor is limited to a certain interval;
(5)
Figure BDA00032082265700000712
the method comprises the following steps of representing that the distance between a task initiator and a task performer is limited;
(6)
Figure BDA00032082265700000713
the computational resources representing each task performer fluctuate randomly over an interval.
Further, as a preferred embodiment of the method, an algorithm based on the combination of a Deep Q Network (DQN) and breadth-first traversal (BFS) is adopted to learn to offload computation-intensive graph tasks in a dynamic time-varying environment. The reinforcement learning elements of the pre-constructed deep Q network are as follows:
State space: as mentioned in the proposed DRL framework, the agent monitors the environment and records the system state at fixed time intervals. At time t, the state is represented as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_{t,i} | i ∈ m} represents the channel gain of task performer i at time t, f_t = {f_{t,i} | i ∈ m} represents the CPU frequency of task performer i at time t, u_t = {u_{t,i} | i ∈ m} represents the number of idle slots of task performer i at time t, G_t represents the topology of the task graph, and d_t = {d_{t,i} | i ∈ m} represents the distance between task performer i and the task initiator at time t.
Action space: when the environment feedback state is obtained, the agent selects the most suitable offloading strategy for the current task v_i according to the observed situation and the output of the Q network. Thus, the offloading action of the current task v_i can be represented as a_i = {a_{i,1}, a_{i,2}, ..., a_{i,m}}, where a_{i,j} is a binary indicator, i.e. a_{i,j} = 1 denotes that device n_j is selected to offload task v_i, and a_{i,j} = 0 denotes that device n_j is not selected.
Reward function: the optimization objective considered above is to minimize the WETC of graph task offloading in large-scale scenarios facing random variation, so the reward of the system is set as a decreasing function of the weighted sum αT(u) + (1-α)E(b), where T(u) represents time consumption, E(b) represents data exchange consumption, and α and (1-α) represent the weights of time consumption and data exchange consumption, respectively. The goal of minimum WETC is achieved by maximizing r, and the task initiator finally finds the optimal offloading scheme.
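Purely as an illustration of the three definitions above, the following Python sketch assembles a state vector, encodes the device-selection action, and computes a reward. Taking the reward as the negative weighted sum, r = -(αT(u) + (1-α)E(b)), is an assumption of this sketch that is merely consistent with "maximizing r minimizes WETC"; the exact reward formula is not reproduced in this text.

import numpy as np

def build_state(gain, cpu_freq, idle_slots, graph_vec, dist):
    """s_t = {h_t, f_t, u_t, G_t, d_t} flattened into one vector for the Q-network."""
    return np.concatenate([gain, cpu_freq, idle_slots, graph_vec, dist]).astype(np.float32)

def encode_action(device_index, num_devices):
    """a_i = {a_i1, ..., a_im}: one-hot selection of the device that offloads task v_i."""
    action = np.zeros(num_devices, dtype=np.int8)
    action[device_index] = 1
    return action

def reward(time_cost, exchange_cost, alpha=0.5):
    """Assumed reward: negative WETC, so that maximizing r minimizes the weighted sum."""
    return -(alpha * time_cost + (1.0 - alpha) * exchange_cost)

# Example with 3 task performers (illustrative numbers only).
s = build_state(gain=[0.8, 0.5, 0.9], cpu_freq=[2.0, 1.5, 2.5],
                idle_slots=[3, 1, 2], graph_vec=[0, 1, 1, 0, 1, 0], dist=[10, 25, 8])
a = encode_action(device_index=2, num_devices=3)
r = reward(time_cost=1.2, exchange_cost=0.4)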
Further, as a preferred embodiment of the method, the step of performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action specifically includes:
S31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization target of graph task offloading;
S32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and updating the parameters of the train Q-network by back-propagating the gradient;
S33, repeating step S32 until the reward r is judged to have converged close to its maximum, and taking the current action as the optimal action.
Further, as a preferred embodiment of the method, the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and then updating the parameters of the train Q-network by back-propagating the gradient specifically includes:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network and the system environment, and setting an empty queue Q;
the task initiator randomly selects a task v_i as the head node of the queue and enqueues it;
the task initiator sequentially dequeues the task v_i to be offloaded and, according to the edge set E in the graph task G, enqueues in turn all tasks associated with v_i that have not yet been offloaded;
the environment state currently observed by the task initiator is input into the target Q-network, which outputs Q{s, a | θ}_{a∈A}; action a_i is selected according to the ε-greedy algorithm, and the task is then offloaded;
specifically, the ε-greedy algorithm selects the action with the largest Q value, argmax_a Q(s, a | θ), with probability 1-ε, and a random action with probability ε;
the task initiator observes the state s_{i+1} and the reward r_i from the environment, and stores (s_i, a_i, r_i, s_{i+1}) as an experience in the experience replay pool;
when the experience replay pool is judged to be full, K experiences are randomly selected from the experience replay pool;
the target value is calculated and, combined with the sampled experiences, the parameter θ_train of the train Q-network is updated based on the gradient descent method;
after every F time steps, the parameter θ_target of the target Q-network is updated with the current parameter θ_train of the train Q-network.
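The procedure above can be summarized in code. The following sketch (Python with PyTorch, reusing the QNetwork, ReplayPool and GraphTask sketches given earlier) is only an illustration of a DQN-plus-breadth-first-traversal loop under this example's assumptions: the environment interface (env.observe, env.offload), the Bellman target with a discount factor, and all hyperparameter values are placeholders rather than details taken from the patent.

import random
from collections import deque

import torch
import torch.nn.functional as F

def train_offloading_policy(env, graph, train_q, target_q, replay,
                            episodes=500, batch_k=64, sync_f=100,
                            epsilon=0.1, discount=0.95, lr=1e-3):
    optimizer = torch.optim.Adam(train_q.parameters(), lr=lr)
    step = 0
    for _ in range(episodes):
        # Breadth-first traversal of the task graph, starting from a random task.
        start = random.choice(list(graph.cpu_cycles))
        queue, visited = deque([start]), {start}
        while queue:
            task = queue.popleft()
            for nb in graph.neighbors(task):        # enqueue not-yet-offloaded neighbours
                if nb not in visited:
                    visited.add(nb)
                    queue.append(nb)

            state = torch.as_tensor(env.observe(), dtype=torch.float32)
            q_values = target_q(state)              # Q{s, a | θ} for every device
            if random.random() < epsilon:           # ε-greedy exploration
                action = random.randrange(q_values.numel())
            else:
                action = int(q_values.argmax())

            reward, next_state = env.offload(task, action)   # offload and observe
            replay.push(state, action, reward,
                        torch.as_tensor(next_state, dtype=torch.float32))

            if len(replay) >= batch_k:              # replay pool holds enough experiences
                batch = replay.sample(batch_k)
                s = torch.stack([b[0] for b in batch])
                a = torch.tensor([b[1] for b in batch])
                r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
                s_next = torch.stack([b[3] for b in batch])

                with torch.no_grad():               # target value from the target network
                    target = r + discount * target_q(s_next).max(dim=1).values
                pred = train_q(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = F.mse_loss(pred, target)

                optimizer.zero_grad()
                loss.backward()                     # update θ_train by gradient descent
                optimizer.step()

            step += 1
            if step % sync_f == 0:                  # every F steps copy θ_train into θ_target
                target_q.load_state_dict(train_q.state_dict())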
As shown in fig. 2, an industrial Internet of things graph task offloading system based on deep reinforcement learning comprises:
a system model building module, configured to construct a mobile edge computing system based on an offloading task scene with one task initiator and a plurality of task performers in the industrial Internet of things;
an optimization target setting module, configured to set an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
and a reinforcement learning module, configured to perform reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and to learn to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An industrial Internet of things graph task offloading method based on deep reinforcement learning, characterized by comprising the following steps:
S1, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things;
S2, setting an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
S3, according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
wherein the pre-constructed deep Q network comprises a train Q-network and a target Q-network that have the same structure.
2. The deep reinforcement learning-based industrial Internet of things graph task offloading method according to claim 1, wherein the step of constructing the mobile edge computing system based on an offloading task scene in the industrial Internet of things specifically comprises:
S11, constructing a mobile edge computing system based on an offloading task scene in the industrial Internet of things, wherein the offloading task scene comprises one task initiator and a plurality of task performers;
S12, representing the dependencies between tasks by an undirected acyclic graph G = {V, E}, which includes a set of tasks V = {v_i | i ∈ W} and a set of edges E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks, and each edge e_ij in G is a binary indicator variable indicating whether there is data exchange between v_i and v_j;
wherein graph task offloading is performed in the mobile edge computing system and incurs transmission time consumption, execution time consumption and data exchange consumption.
3. The deep reinforcement learning-based industrial Internet of things graph task offloading method according to claim 2, wherein the reinforcement learning based on the pre-constructed deep Q network specifically comprises:
a state space, wherein the state at time t is represented as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_{t,i} | i ∈ m} represents the channel gain of task performer i at time t, f_t = {f_{t,i} | i ∈ m} represents the CPU frequency of task performer i at time t, u_t = {u_{t,i} | i ∈ m} represents the number of idle slots of task performer i at time t, G_t represents the topology of the task graph, and d_t = {d_{t,i} | i ∈ m} represents the distance between task performer i and the task initiator at time t;
an action space, wherein the offloading action of the current task v_i is represented as a_i = {a_{i,1}, a_{i,2}, ..., a_{i,m}}, where a_{i,j} is set as a binary indicator;
a reward function, wherein the reward of the system is set as a decreasing function of the weighted sum αT(u) + (1-α)E(b), where T(u) represents time consumption, E(b) represents data exchange consumption, and α and (1-α) represent the weights of time consumption and data exchange consumption, respectively.
4. The deep reinforcement learning-based industrial Internet of things graph task offloading method according to claim 3, wherein the step of performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, learning to offload computation-intensive graph tasks in a dynamic time-varying environment, and obtaining the optimal action specifically comprises:
S31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization target of graph task offloading;
S32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and updating the parameters of the train Q-network by back-propagating the gradient;
S33, repeating step S32 until the reward r is judged to have converged close to its maximum, and taking the current action as the optimal action.
5. The deep reinforcement learning-based industrial Internet of things graph task offloading method according to claim 4, wherein the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function with the reward r, and then updating the parameters of the train Q-network by back-propagating the gradient specifically comprises:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network and the system environment, and setting an empty queue Q;
the task initiator randomly selecting a task v_i as the head node of the queue and enqueuing it;
the task initiator sequentially dequeuing the task v_i to be offloaded and, according to the edge set E in the graph task G, enqueuing in turn all tasks associated with v_i that have not yet been offloaded;
inputting the environment state currently observed by the task initiator into the target Q-network, outputting Q{s, a | θ}_{a∈A}, selecting action a_i according to the ε-greedy algorithm, and then offloading the task;
the task initiator observing the state s_{i+1} and the reward r_i from the environment, and storing (s_i, a_i, r_i, s_{i+1}) as an experience in the experience replay pool;
when the experience replay pool is judged to be full, randomly selecting K experiences from the experience replay pool;
calculating the target value and, combining the sampled experiences, updating the parameter θ_train of the train Q-network based on the gradient descent method;
after every F time steps, updating the parameter θ_target of the target Q-network with the current parameter θ_train of the train Q-network.
6. An industrial Internet of things graph task offloading system based on deep reinforcement learning, characterized by comprising:
a system model building module, configured to construct a mobile edge computing system based on an offloading task scene in the industrial Internet of things;
an optimization target setting module, configured to set an optimization target for graph task offloading based on the mobile edge computing system, wherein the optimization target is to minimize the weighted sum of task completion time and data exchange consumption;
and a reinforcement learning module, configured to perform reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization target of graph task offloading, and to learn to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
CN202110923267.8A 2021-08-12 2021-08-12 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning Active CN113590229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110923267.8A CN113590229B (en) 2021-08-12 2021-08-12 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110923267.8A CN113590229B (en) 2021-08-12 2021-08-12 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113590229A true CN113590229A (en) 2021-11-02
CN113590229B CN113590229B (en) 2023-11-10

Family

ID=78257430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110923267.8A Active CN113590229B (en) 2021-08-12 2021-08-12 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113590229B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050210473A1 (en) * 2004-03-08 2005-09-22 Frank Inchingolo Controlling task execution
US20190394096A1 (en) * 2019-04-30 2019-12-26 Intel Corporation Technologies for batching requests in an edge infrastructure
WO2020119648A1 (en) * 2018-12-14 2020-06-18 深圳先进技术研究院 Computing task unloading algorithm based on cost optimization
CN111726826A (en) * 2020-05-25 2020-09-29 上海大学 Online task unloading method in base station intensive edge computing network
CN111835827A (en) * 2020-06-11 2020-10-27 北京邮电大学 Internet of things edge computing task unloading method and system
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
WO2021139537A1 (en) * 2020-01-08 2021-07-15 上海交通大学 Power control and resource allocation based task offloading method in industrial internet of things
CN113157344A (en) * 2021-04-30 2021-07-23 杭州电子科技大学 DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN113225377A (en) * 2021-03-30 2021-08-06 北京中电飞华通信有限公司 Internet of things edge task unloading method and device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050210473A1 (en) * 2004-03-08 2005-09-22 Frank Inchingolo Controlling task execution
WO2020119648A1 (en) * 2018-12-14 2020-06-18 深圳先进技术研究院 Computing task unloading algorithm based on cost optimization
US20190394096A1 (en) * 2019-04-30 2019-12-26 Intel Corporation Technologies for batching requests in an edge infrastructure
WO2021139537A1 (en) * 2020-01-08 2021-07-15 上海交通大学 Power control and resource allocation based task offloading method in industrial internet of things
CN111726826A (en) * 2020-05-25 2020-09-29 上海大学 Online task unloading method in base station intensive edge computing network
CN111835827A (en) * 2020-06-11 2020-10-27 北京邮电大学 Internet of things edge computing task unloading method and system
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113225377A (en) * 2021-03-30 2021-08-06 北京中电飞华通信有限公司 Internet of things edge task unloading method and device
CN113157344A (en) * 2021-04-30 2021-07-23 杭州电子科技大学 DRL-based energy consumption perception task unloading method in mobile edge computing environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yu Bowen, et al.: "Research on collaborative decision of task offloading and base station association in mobile edge computing" (in Chinese), Journal of Computer Research and Development, vol. 55, no. 03, pages 537-550 *
Lu Haifeng; Gu Chunhua; Luo Fei; Ding Weichao; Yang Ting; Zheng Shuai: "Research on task offloading in mobile edge computing based on deep reinforcement learning" (in Chinese), Journal of Computer Research and Development, no. 07, pages 195-210 *

Also Published As

Publication number Publication date
CN113590229B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN111756812B (en) Energy consumption perception edge cloud cooperation dynamic unloading scheduling method
Asheralieva et al. Hierarchical game-theoretic and reinforcement learning framework for computational offloading in UAV-enabled mobile edge computing networks with multiple service providers
CN111226238B (en) Prediction method, terminal and server
CN108958916B (en) Workflow unloading optimization method under mobile edge environment
CN112188442A (en) Vehicle networking data-driven task unloading system and method based on mobile edge calculation
CN111093203A (en) Service function chain low-cost intelligent deployment method based on environment perception
CN112148492B (en) Service deployment and resource allocation method considering multi-user mobility
CN114340016A (en) Power grid edge calculation unloading distribution method and system
CN113660325B (en) Industrial Internet task unloading strategy based on edge calculation
CN116541106B (en) Computing task unloading method, computing device and storage medium
CN113867843A (en) Mobile edge computing task unloading method based on deep reinforcement learning
Gupta et al. Toward intelligent resource management in dynamic Fog Computing‐based Internet of Things environment with Deep Reinforcement Learning: A survey
CN111158893B (en) Task unloading method, system, equipment and medium applied to fog computing network
CN113821346A (en) Computation uninstalling and resource management method in edge computation based on deep reinforcement learning
Murti et al. Learning-based orchestration for dynamic functional split and resource allocation in vRANs
CN116954866A (en) Edge cloud task scheduling method and system based on deep reinforcement learning
Li et al. Efficient data offloading using markovian decision on state reward action in edge computing
CN113590229B (en) Industrial Internet of things graph task unloading method and system based on deep reinforcement learning
CN114693141B (en) Transformer substation inspection method based on end edge cooperation
Zhang et al. EFFECT-DNN: Energy-efficient Edge Framework for Real-time DNN Inference
CN115220818A (en) Real-time dependency task unloading method based on deep reinforcement learning
CN116069498A (en) Distributed computing power scheduling method and device, electronic equipment and storage medium
CN113220369B (en) Intelligent computing unloading optimization method based on distributed machine learning
CN114968402A (en) Edge calculation task processing method and device and electronic equipment
CN113900739A (en) Calculation unloading method and system under many-to-many edge calculation scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant