CN117834643B - Deep neural network collaborative reasoning method for industrial Internet of things


Info

Publication number
CN117834643B
Authority
CN
China
Prior art keywords
reasoning
task
edge
user equipment
representing
Prior art date
Legal status
Active
Application number
CN202410246171.6A
Other languages
Chinese (zh)
Other versions
CN117834643A
Inventor
郭永安
奚城科
王宇翱
周金粮
钱琪杰
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202410246171.6A
Publication of CN117834643A
Application granted
Publication of CN117834643B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention belongs to the technical fields of the industrial Internet of Things and deep neural networks, and discloses a deep neural network collaborative reasoning method for the industrial Internet of Things. The method uses a DQN algorithm together with an LSTM algorithm to offload industrial IoT inference tasks efficiently and dynamically; because the LSTM retains memory of long-term states, it can estimate the current states of the servers and user equipment more accurately, which preserves the accuracy of the algorithm and improves inference efficiency. The method makes more effective use of the computing capacity of the edge servers and improves the efficiency of deep neural network inference; at the same time, it reduces the delay of inference tasks, lowers energy consumption, and can dynamically select the optimal inference strategy as the environment changes.

Description

Deep neural network collaborative reasoning method for industrial Internet of things
Technical Field
The invention belongs to the field of industrial Internet of things, and particularly relates to a deep neural network collaborative reasoning method for the industrial Internet of things.
Background
With the rapid development of the Internet of Things in recent years, the number of IoT devices has increased dramatically, along with various novel applications such as augmented reality and image processing. Deep neural network (DNN) inference is a computation-intensive task, and industrial IoT scenarios place strict latency requirements on it, yet IoT devices with limited computing resources struggle to perform DNN inference efficiently. Cloud computing can alleviate this problem to some extent, but it is still limited in terms of latency and power consumption. Edge computing is an effective way to break this bottleneck by offloading DNN inference tasks to edge servers for efficient execution; with an appropriate offloading method, edge computing can reduce network delay and transmission cost. How to effectively offload deep neural network inference tasks in the industrial Internet of Things has therefore become a key focus of current research.
Several methods for offloading DNN inference tasks exist in the prior art; for example, an inference task can be offloaded to the cloud by selecting the optimal partition point, or an early-exit mechanism can be adopted to meet the user's low-latency computing requirements. However, the prior art still has the following problems:
1) Existing DNN inference methods often ignore the waiting delay a task incurs in each time slot while the device is busy, so the estimated inference delay deviates considerably from the actual one;
2) In industrial Internet of Things scenarios, task processing efficiency is limited by finite battery capacity, yet existing methods do not consider the energy consumption problem;
3) Existing methods only consider the case of a single server serving one or more devices and ignore the more complex practical case of multiple servers serving multiple devices, where each server having to make its own decisions adds load while increasing delay and energy consumption.
Disclosure of Invention
To solve the above technical problems, the invention provides a deep neural network collaborative reasoning method for the industrial Internet of Things, in which a framework where an edge center node makes the offloading decisions is introduced into the multi-device, multi-server scenario and a problem model jointly optimizing delay and energy consumption is constructed, reducing both inference delay and energy consumption.
The invention discloses a deep neural network collaborative reasoning method for industrial Internet of things, which comprises the following steps:
Step 1, acquire data of the user equipment layer and the edge cluster layer, transmit the acquired data to the edge center node of the edge cluster layer, and preprocess the acquired data;
Step 2, after the user equipment layer generates an inference task, send an inference request to the edge center node, and the edge center node decides whether the inference task needs to be offloaded to an edge server; if the decision is no, go to step 3; if yes, go to step 4 and determine the number of layers to be offloaded to the designated edge server;
Step 3, when the edge center node decides on local offloading, the user equipment layer computes the entire inference task, returns the result to the user, and goes to step 5;
Step 4, when the edge center node decides on full or partial offloading, the uncomputed part of the inference task enters one of the queues of the edge center node; once the edge server corresponding to that queue becomes idle, the inference task is transmitted to the edge server for computation and the processing result is returned to the user;
Step 5, the edge device evaluates the performance indicators of the processed inference task and optimizes the layered offloading strategy according to the evaluation result.
Further, the user equipment layer comprises a plurality of user equipment responsible for continuously collecting industrial data from the production line; the user equipment form a set, and the computing power of each user equipment is expressed in floating-point operations per second (FLOPS); the length of one time slot is fixed in seconds, and in each time slot the user equipment randomly generate DNN inference tasks; each DNN inference task is represented by a tuple consisting of the device that generated the task, the data size of the task, and the time slot in which the task arrives.
Further, the edge cluster layer includes a plurality of edge servers; the edge server geographically closest to the user equipment is defined as the edge center node, and the remaining edge servers form a set, each with its own computing power; the edge center node is responsible for collecting the DNN inference tasks offloaded from the user equipment layer and for distributing them appropriately to the edge servers.
Further, each task queue has three states: idle, occupied, and uploaded; the state of a queue is recorded in every time slot.
The corresponding edge servers have two states, an idle state and a busy state; the state of each edge server is likewise recorded in every time slot.
Further, in step 2, the task The offloading decision result of (a) is the task offloading rate/>The following are provided:
Wherein, Representing task/>Front/>, corresponding to the inference modelLayer/>Representing task/>Total number of inference layers;
Local offloading: tasks Reasoning locally at the user equipment;
Global unloading: tasks Unloading to an edge server for reasoning;
Partial unloading: tasks Reasoning/>, on user equipmentAnd unloading the rest part to an edge server for reasoning.
Further, the performance indicators of inference include time delay and energy consumption, evaluated as follows:
1) When a task is computed on the user equipment, the local computation delay and energy consumption are determined by the data volume of each layer inferred locally, the computing power of the user equipment within a single time slot, and the power the user equipment consumes per unit of computation;
2) The maximum uplink transmission rate from the user equipment to the edge center node is obtained from the Shannon formula, based on the transmission power of the user equipment in the time slot, the radio channel gain from the user equipment to the edge center node, the wireless channel bandwidth, and the Gaussian noise power spectral density;
when the inference task is offloaded to the edge center node, its transmission delay and transmission energy consumption are determined by the data size of the intermediate result produced after the locally inferred layers are completed and by the uplink transmission rate;
when the user equipment generates a task, it sends a request signal to the edge center node, which then traverses the state of every edge server in the current time slot and the delay each edge server still needs to finish its current task, i.e. the waiting delay, determined by the size of the server's remaining unprocessed data and its computing power within a single time slot;
the queuing delay of the task at the current slot is therefore the minimum waiting delay over all edge servers, and the task is placed in the queue whose sequence number matches that of the edge server with the minimum waiting delay;
3) After queuing ends, the edge center node forwards the intermediate result of the locally processed layers to the edge server corresponding to the queue, where the remaining layers are inferred; when the task is computed on the edge server, the delay and energy consumption are determined by the data volume of the remaining layers, the server's computing power within a single time slot, and the power the server consumes per unit of computation;
to further reduce latency, while a task is queuing, DNN inference can still proceed at the user equipment layer and the task can be transmitted from the user equipment to the edge center node, and the combined delay of these three overlapping parts is computed accordingly;
the total inference delay of the task is this combined delay plus the edge server computation delay, and the total energy consumption is the sum of the local computation, transmission, and edge server energy consumption;
according to the above expressions, the utility function is the system overhead, expressed as a weighted sum of the total inference delay and the total energy consumption, with a delay weight and an energy consumption weight.
Further, the task delay and energy consumption are jointly optimized, and the optimization problem minimizes the system overhead subject to the following constraints:
C1 ensures that the delay weight and the energy consumption weight sum to 1; C2 ensures that the total delay does not exceed the maximum tolerable delay of the current task; C3 ensures that the energy consumption of the user equipment does not exceed its maximum available energy; C4 ensures that the energy consumption of the edge server does not exceed its maximum available energy; C5 restricts the decision variables of the task, including the choice of partition point in its DNN model and the choice of edge server; the system overhead is minimized by optimizing the partition point and the edge server selection.
Further, the optimization problem is solved by adopting an improved DRLLU algorithm, specifically:
the state observed at the current time step and the action of the previous time step are combined into a state-action pair, which is integrated with the output value of the LSTM to obtain an estimate of the real environment state and is then fed into the deep neural network for training; the Q-function is fitted using the LSTM layer's output at the current time step together with the network parameters, and is updated iteratively in the same form as the DQN update.
The beneficial effects of the invention are as follows: the method can offload DNN inference tasks to edge servers, reducing task inference and waiting delay while lowering the energy consumption of the user equipment and the cloud data center; a DNN inference task offloading strategy and dynamic resource allocation method based on the DQN and LSTM algorithms is used, in which the last fully connected layer of the DQN network is replaced by an LSTM layer to form the improved DRLLU algorithm, so that inference tasks can be intelligently offloaded to edge servers for computation, the edge computing resources of the industrial Internet of Things are used more efficiently, and resource utilization is improved; the method can intelligently learn the optimal inference task offloading strategy according to the real-time environment and the demands of the inference tasks, and can adjust dynamically to the environment, further optimizing the offloading decisions.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a system model diagram of the present invention;
FIG. 3 is a diagram of an algorithm architecture of the present invention;
FIG. 4 is a graph showing average rewards obtained by the algorithm according to the invention under different learning rates as a function of step size;
FIG. 5 compares the average inference delay and average inference energy consumption when DNN inference is performed by different algorithms;
FIG. 6 is a graph of the average overhead under different weight settings for the algorithm provided by the invention.
Detailed Description
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
As shown in fig. 1, the method for collaborative reasoning of the deep neural network for the industrial internet of things provided by the invention comprises the following steps:
Step 1, acquire data of the user equipment layer and the edge cluster layer, transmit the acquired data to the edge center node of the edge cluster layer, and preprocess the acquired data;
Step 2, after the user equipment layer generates an inference task, send an inference request to the edge center node, and the edge center node decides whether the inference task needs to be offloaded to an edge server; if the decision is no, go to step 3; if yes, go to step 4 and determine the number of layers to be offloaded to the designated edge server;
Step 3, when the edge center node decides on local offloading, the user equipment layer computes the entire inference task, returns the result to the user, and goes to step 5;
Step 4, when the edge center node decides on full or partial offloading, the uncomputed part of the inference task enters one of the queues of the edge center node; once the edge server corresponding to that queue becomes idle, the inference task is transmitted to the edge server for computation and the processing result is returned to the user;
Step 5, the edge device evaluates the performance indicators of the processed inference task and optimizes the layered offloading strategy according to the evaluation result.
As shown in fig. 2, the present invention contemplates an industrial internet of things system comprising multiple devices and multiple servers, the system comprising a user device layer and an edge cluster layer.
1) User equipment layer: the user equipment layer consists of a large number of devices associated with the industrial production line, such as locally deployed industrial cameras, industrial scanners, and pressure sensors, which continuously collect industrial data from the production line. The invention represents these devices as a set, and the computing power of each user equipment is expressed in floating-point operations per second (FLOPS). In the discrete-time model, the length of one time slot is fixed in seconds, and in each time slot the user equipment randomly generate DNN inference tasks. Each DNN inference task is represented by a tuple consisting of the device that generated the task, the data size of the task, and the time slot in which the task arrives. Each DNN task can be inferred entirely locally, offloaded entirely to an edge server for inference, or have its first few layers inferred locally and the remaining layers uploaded to the server for inference. Owing to the layered nature of DNN inference models, the layers of a task's DNN model form an ordered set and each layer has an associated data volume; during DNN inference, the computation result of each layer serves as the input for further inference by the next layer, so the task also has a well-defined data size after each layer's computation, the data size before the first layer being the raw task data. A DNN model for processing the corresponding inference tasks is deployed on every user equipment.
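For concreteness, a minimal sketch of how such a task record might be represented in code is shown below; the class and field names are illustrative assumptions and do not reproduce the patent's notation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DNNInferenceTask:
    """Illustrative DNN inference task record (assumed field names)."""
    device_id: int                    # user equipment that generated the task
    data_size: float                  # input data size of the task
    arrival_slot: int                 # time slot in which the task arrives
    layer_output_sizes: List[float]   # intermediate data size after each layer (index 0 = raw input)
    total_layers: int                 # total number of inference layers of the DNN model
```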
2) Edge cluster layer: the edge cluster layer contains several edge servers, all deployed near the industrial production line. The invention designates the edge server geographically closest to the user equipment as the edge center node, while the remaining edge servers are responsible for DNN inference, so the edge cluster layer consists of one edge center node and a plurality of edge servers. These edge servers form a set, each with its own computing power and a deployed DNN model for processing tasks. The edge center node is responsible for collecting the DNN inference tasks offloaded from the user equipment layer and for distributing them appropriately to the edge servers. Because there is one queue per edge server, the edge center node maintains the same number of task queues to manage the DNN inference tasks arriving from the user equipment layer. Each queue has three states: idle, occupied, and uploaded; the state of a queue is recorded in every time slot.
The corresponding edge servers likewise have two states, an idle state and a busy state, and the state of each edge server is recorded in every time slot.
As shown in fig. 3, the objective of the present invention is to minimize the overall delay of DNN reasoning over a long period of time, while minimizing the overall energy consumption of the user equipment and edge servers. From the above preliminary analysis, an expression of the utility function can be deduced.
The DNN reasoning time delay consists of a user equipment side processing time delay, an edge server side processing time delay, a user equipment to edge center node transmission time delay, an edge center node to edge server transmission time delay, a transmission time delay for the edge server to transmit the processing result back to the edge center node, a transmission time delay for the edge center node to further transmit the result back to the user equipment, and a queue waiting time delay in the edge center node.
Considering that the network bandwidth between the edge center node and the edge servers is large and the transmission rate is high, the transmission delay between the edge center node and an edge server is negligible. Furthermore, the data produced by the last layer of DNN inference is very small, so the transmission delay of returning the result to the user equipment through the edge center node is also negligible.
Therefore, in terms of delay, the invention only needs to consider four parts: the processing delay at the user equipment, the processing delay at the edge server, the transmission delay from the user equipment to the edge center node, and the queue waiting delay in the edge center node. Likewise, in terms of energy consumption, only three aspects need to be considered: the energy consumption of the user equipment, the energy consumption of the edge server, and the energy consumption of transmission from the user equipment to the edge center node.
The offloading decision for a task is expressed as a task offloading rate, determined by the number of the task's front inference layers computed on the user equipment relative to the total number of inference layers of the task:
Local offloading: the task is inferred entirely on the user equipment;
Full offloading: the task is offloaded entirely to an edge server for inference;
Partial offloading: the first several layers of the task are inferred on the user equipment and the remaining layers are offloaded to an edge server for inference.
When a task is computed on the user equipment, the local computation delay and energy consumption are determined by the data volume of each layer inferred locally, the computing power of the user equipment within a single time slot, and the power the user equipment consumes per unit of computation.
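A hedged reconstruction of these quantities, using assumed symbols (d_{k,j}: data volume of layer j of task k; l_k: number of locally inferred layers; f_i: per-slot computing power of user equipment i; p_i: its computation power draw), would be:

```latex
t_k^{\mathrm{loc}} = \sum_{j=1}^{l_k} \frac{d_{k,j}}{f_i}, \qquad
e_k^{\mathrm{loc}} = p_i \, t_k^{\mathrm{loc}}
```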
The user equipment transmit tasks to the edge center node using orthogonal frequency division multiple access (OFDMA), with different user equipment occupying different channels. The maximum uplink transmission rate from a user equipment to the edge center node follows from the Shannon formula, based on the transmission power of the user equipment in the time slot, the radio channel gain from the user equipment to the edge center node, the wireless channel bandwidth, and the Gaussian noise power spectral density.
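The Shannon capacity formula referenced here is standard; written with assumed symbols (B: channel bandwidth, p_i: transmission power, g_i: channel gain to the edge center node, N_0: Gaussian noise power spectral density), it reads:

```latex
r_i = B \log_2\!\left(1 + \frac{p_i\, g_i}{N_0\, B}\right)
```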
When the inference task is offloaded to the edge center node, its transmission delay and transmission energy consumption are determined by the data size of the intermediate result produced after the locally inferred layers are completed and by the uplink transmission rate.
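With the same assumed notation (o_{k,l_k}: data size of the intermediate result after the l_k-th locally inferred layer, r_i: uplink rate, p_i: transmission power), the transmission delay and energy would take the form:

```latex
t_k^{\mathrm{tr}} = \frac{o_{k,l_k}}{r_i}, \qquad
e_k^{\mathrm{tr}} = p_i \, t_k^{\mathrm{tr}}
```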
When the user equipment generates a task, it sends a request signal to the edge center node, which then traverses the state of every edge server in the current time slot and the delay each edge server still needs to finish its current task, i.e. the waiting delay, determined by the size of the server's remaining unprocessed data and its computing power within a single time slot. The queuing delay of the task at the current slot is therefore the minimum waiting delay over all edge servers, and the task is placed in the queue whose sequence number matches that of the edge server with the minimum waiting delay.
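A plausible reconstruction of the waiting and queuing delays, with D_n the remaining unprocessed data on edge server n and f_n its per-slot computing power (assumed symbols), is:

```latex
t_n^{\mathrm{wait}} = \frac{D_n}{f_n}, \qquad
t_k^{\mathrm{que}} = \min_{n}\; t_n^{\mathrm{wait}}
```

with the task joining the queue whose index attains the minimum.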
After queuing ends, the edge center node forwards the intermediate result of the locally processed layers to the edge server corresponding to the queue, where the remaining layers are inferred. When the task is computed on the edge server, the delay and energy consumption are determined by the data volume of the remaining layers, the server's computing power within a single time slot, and the power the server consumes per unit of computation.
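With the remaining layers l_k+1, ..., K_k inferred on edge server n (assumed symbols, p_n being the server's computation power draw), the edge-side delay and energy would be:

```latex
t_k^{\mathrm{edge}} = \sum_{j=l_k+1}^{K_k} \frac{d_{k,j}}{f_n}, \qquad
e_k^{\mathrm{edge}} = p_n \, t_k^{\mathrm{edge}}
```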
To further reduce latency, while a task is queuing, DNN inference can still proceed at the user equipment layer and the task can be transmitted from the user equipment to the edge center node, so the combined delay of these three overlapping parts (local inference, transmission, and queuing) accounts for this overlap. The total inference delay of the task is this combined delay plus the edge server computation delay, and the total energy consumption is the sum of the local computation, transmission, and edge server energy consumption.
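Because local inference and transmission can overlap with queuing, one consistent reading of the combined and total quantities (a reconstruction, not the patent's verbatim formulas) is:

```latex
T_k = \max\!\big(t_k^{\mathrm{loc}} + t_k^{\mathrm{tr}},\; t_k^{\mathrm{que}}\big) + t_k^{\mathrm{edge}}, \qquad
E_k = e_k^{\mathrm{loc}} + e_k^{\mathrm{tr}} + e_k^{\mathrm{edge}}
```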
According to the above expressions, the utility function is the system overhead, expressed as a weighted sum of the total inference delay and the total energy consumption, with a delay weight and an energy consumption weight.
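With assumed weights \lambda_t and \lambda_e for delay and energy, the utility (system overhead) would read:

```latex
U_k = \lambda_t\, T_k + \lambda_e\, E_k
```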
The invention aims to minimize the utility function proposed above, so as to jointly optimize the energy consumption and the task delay of the system; the key to the problem is clearly the choice of the best partition point and edge server for the DNN inference task. The optimization problem minimizes the system overhead subject to the following constraints:
C1 ensures that the delay weight and the energy consumption weight sum to 1; C2 ensures that the total delay does not exceed the maximum tolerable delay of the current task; C3 ensures that the energy consumption of the user equipment does not exceed its maximum available energy; C4 ensures that the energy consumption of the edge server does not exceed its maximum available energy; C5 restricts the decision variables of the task, including the choice of partition point in its DNN model and the choice of edge server; the invention minimizes the system overhead by optimizing the partition point and the edge server selection.
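A sketch of the optimization problem in the form these constraints describe, using the assumed symbols above (T_k^{\max}: maximum tolerable delay of the task; E_i^{\max}, E_n^{\max}: energy budgets of user equipment i and edge server n; N: number of edge servers):

```latex
\min_{l_k,\, n_k}\ U_k \quad \text{s.t.}\quad
\begin{aligned}
&\mathrm{C1}: \lambda_t + \lambda_e = 1 \\
&\mathrm{C2}: T_k \le T_k^{\max} \\
&\mathrm{C3}: e_k^{\mathrm{loc}} + e_k^{\mathrm{tr}} \le E_i^{\max} \\
&\mathrm{C4}: e_k^{\mathrm{edge}} \le E_n^{\max} \\
&\mathrm{C5}: l_k \in \{0, 1, \dots, K_k\},\ n_k \in \{1, \dots, N\}
\end{aligned}
```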
The optimization problem is a discrete decision problem, so the invention converts it into a Markov decision process (MDP) problem, which can be solved with the DQN algorithm. Combining FIG. 3 with the actual edge-device collaborative environment of the industrial Internet of Things, an MDP model is constructed and the DQN algorithm is improved to solve the inference task offloading problem at the edge center node.
MDP model:
1) State space
The agent needs to gather information about the DNN inference tasks, the user equipment queues, and the edge server status in order to optimize its scheduling in the edge computing environment. This information is contained in the state space and is necessary to satisfy the optimization goals and constraints. At each time step the state comprises: the data volume of the layer currently being inferred; the result data volume produced by the current layer after inference; the partition point of the task's DNN inference model; the uplink transmission rate of each user equipment at the current step; the power of each user equipment at the current step; the power of each edge server at the current step; and the waiting delay of each queue. The initial value of the state is set to 0.
2) Action space
In the algorithm model of the invention, the actions of the DQN comprise partition point selection and edge server selection. In each time slot, an action is represented as a vector indicating that the inference of the task after a chosen layer is offloaded to a chosen edge server. To limit the action space, the number of layers offloaded to the edge server side is bounded.
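As an illustration of how such a discrete action set might be enumerated (a sketch; the bound on offloadable layers and the indexing are assumptions, and 22 layers is used only because GoogLeNet, the model used in the experiments, is 22 layers deep):

```python
from itertools import product

def enumerate_actions(total_layers: int, num_edge_servers: int, max_offload_layers: int):
    """Each action pairs a partition point (number of layers kept locally) with a target edge server."""
    partition_points = range(total_layers - max_offload_layers, total_layers + 1)
    servers = range(num_edge_servers)
    return list(product(partition_points, servers))

# Example: at most 10 layers may be offloaded to one of 5 edge servers
actions = enumerate_actions(total_layers=22, num_edge_servers=5, max_offload_layers=10)
```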
3) Reward function
The reward function is set to the negative of the utility function, and the agent receives a reward after each action it takes. However, if too few layers are inferred on the user equipment side, the task spends excessive time queuing; conversely, if too many layers are inferred on the user equipment side, the local processing time far exceeds the queuing delay, the energy consumption of the user equipment becomes too large, and the total inference delay is prolonged. To counteract this, the invention adjusts the reward by considering the difference between the sum of the user-equipment-side inference delay and transmission delay on the one hand and the queuing delay on the other, and defines a penalty function for this purpose.
According to the penalty function, the reward function R is adjusted accordingly.
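One way to realize the described adjustment, sketched with assumed symbols (\mu: a penalty weight), is to penalize the mismatch between the local-plus-transmission delay and the queuing delay:

```latex
F_k = \big|\, t_k^{\mathrm{loc}} + t_k^{\mathrm{tr}} - t_k^{\mathrm{que}} \,\big|, \qquad
R = -\,U_k - \mu\, F_k
```

Both the absolute-value form of the penalty and the weight \mu are assumptions; the patent only states that the difference between the two delay terms is taken into account.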
optimization based on LSTM algorithm:
DQN is a value-iteration-based deep reinforcement learning algorithm whose goal is to estimate the Q-values of the optimal policy. The algorithm uses a deep neural network as a function approximator, converting the Q-table update problem into a function-fitting problem so that suitable actions can be obtained from the current state, overcoming the weakness of the traditional Q-learning algorithm on high-dimensional continuous problems. The network parameters are updated so that the parameterized Q-function approaches the optimal Q-value.
The update involves the next state reached after taking the chosen action at the current time step, the immediate reward obtained after taking that action, and all actions available in the next state; a discount coefficient weights the accumulated future value, and a learning rate controls the update step.
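The tabular Q-learning update that DQN approximates is the standard rule (with \eta the learning rate and \gamma the discount coefficient):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \Big[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]
```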
DQN not only speeds up the search of the Q-learning algorithm through function fitting, but also improves its diversity and stability by adding an experience pool and a target network. Through interaction between the agent and the environment, a transition sample is obtained at each time step and stored in the experience pool buffer; during training, a batch of samples is drawn at random, which mitigates data correlation and non-stationary distribution. The target Q-value refers to the training target generated by the target network, whose structure is identical to the DQN main network; after each round of iteration, the main network parameters are copied into the target network, so the current Q-value and the target Q-value remain consistent for a period. The loss function between them is computed, and the main network parameters are then updated backwards using stochastic gradient descent (SGD).
The loss is computed from the output of the main network and the output of the target network, respectively.
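With \theta the main-network parameters and \theta^- the target-network parameters, the DQN loss described here is conventionally written as:

```latex
y_t = r_t + \gamma \max_{a'} Q\big(s_{t+1}, a'; \theta^-\big), \qquad
L(\theta) = \mathbb{E}\Big[\big(y_t - Q(s_t, a_t; \theta)\big)^2\Big]
```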
In a real industrial Internet of Things environment the problem is more complex and the system's perception is limited: the state information of the system is only partially known, so the observations do not fully reflect the real environment state, and it is difficult for DQN alone to solve the actual inference offloading problem. Considering that the states of the edge servers and the user equipment change gradually over time, and that an LSTM network has memory of long-term state, the invention replaces the last fully connected layer of the DQN network with an LSTM layer, using the recurrent structure to integrate long-term historical data and thereby estimate the current state more accurately. As shown in FIG. 3, the DRLLU algorithm, modified from the DQN algorithm, combines the state observed at the current time step with the action of the previous time step into a state-action pair, integrates it with the output value of the LSTM to obtain an estimate of the real environment state, and then feeds it into the deep neural network for training. Thus, instead of the Q-function used by the DQN algorithm, DRLLU fits a Q-function conditioned on the LSTM layer's output at the current time step, and updates it iteratively in the same form as the DQN update.
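A minimal sketch of the described network modification follows: the last fully connected layer of the DQN is replaced by an LSTM layer, so the Q-values are conditioned on a short history of state-action pairs. The layer sizes and sequence length are illustrative assumptions; the framework is TensorFlow, which the patent reports using for simulation, and the learning rate matches the 0.0015 chosen in the experiments.

```python
import tensorflow as tf

def build_drllu_q_network(seq_len: int, state_dim: int, action_dim: int, n_actions: int) -> tf.keras.Model:
    """Q-network sketch: per-step feature extraction, an LSTM in place of the last
    fully connected layer, and a Q-value head over the discrete actions."""
    # Input: a short history of concatenated (state, previous action) vectors
    inputs = tf.keras.layers.Input(shape=(seq_len, state_dim + action_dim))
    # Per-time-step feature extraction (the "front" of the original DQN)
    x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(128, activation="relu"))(inputs)
    x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(64, activation="relu"))(x)
    # LSTM replaces the last fully connected layer and summarizes the history into h_t
    h = tf.keras.layers.LSTM(64)(x)
    # One Q-value per (partition point, edge server) action
    q_values = tf.keras.layers.Dense(n_actions)(h)
    return tf.keras.Model(inputs=inputs, outputs=q_values)

# Example: 10 state features, previous action one-hot over 55 actions, history of 8 steps
model = build_drllu_q_network(seq_len=8, state_dim=10, action_dim=55, n_actions=55)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1.5e-3), loss="mse")
```

In training, the target network would share this architecture, and the loss above would be applied to the Q-value of the action actually taken.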
As shown in FIGS. 4-6, the invention uses TensorFlow to simulate the DNN model inference offloading problem in the industrial Internet of Things and to verify the DRLLU algorithm proposed above. The device cluster in the simulation experiment comprises 10 user equipment, one edge center node, and 5 edge servers, and each mobile device can send only one offloading request per unit time. The configuration, computing power, and power of all devices are set according to SPEC (Standard Performance Evaluation Corporation); moreover, the greater a device's computing power, the greater its power and the more energy it consumes per unit time. In each time slot there is a 50% chance that the user equipment side randomly generates a DNN inference task. For the DNN inference task, the invention considers the GoogLeNet target detection task, which is deployed on every edge server and user equipment; a single GoogLeNet inference requires 264.6 MFLOPs of computation. Other key parameters are shown in Table 1. The invention compares the Random algorithm, the Greedy algorithm, and the DQN algorithm with the proposed DRLLU algorithm in simulation, comparing the average inference delay, the average inference energy consumption, and the average system overhead jointly combining delay and energy consumption when deep neural network tasks are offloaded by the different algorithms.
Table 1. Key parameters of the simulation setup
First, FIG. 4 shows the effect of different learning rates on the DRLLU training process, with learning rates of 0.0010, 0.0015, and 0.0018. It can be seen that a learning rate of 0.0015 performs best: with a learning rate of 0.0018 convergence is faster but the algorithm more easily falls into a local optimum, while with a learning rate of 0.0010 a better solution can be obtained but convergence is slow. For this reason, the invention sets the learning rate to 0.0015 in the subsequent experiments.
Second, the invention compares the performance of the different algorithms in terms of average delay and average energy consumption; the delay and energy weights in FIG. 5 are both set to 0.5. Clearly, Random performs worst and Greedy second worst. Furthermore, although the DQN algorithm and the DRLLU algorithm proposed in the invention perform similarly in reducing latency, the DQN algorithm is far less effective than the proposed algorithm at reducing energy consumption. The proposed DRLLU algorithm therefore performs better overall.
Finally, because the weight parameter that balances delay and energy consumption in the utility function can be set to meet different quality-of-service requirements, the invention compares the other algorithms with the proposed algorithm under different weight settings in order to verify the robustness of the DRLLU algorithm. The experimental results indicate that under any weight setting the proposed DRLLU algorithm achieves the best performance compared with the other algorithms, while also exhibiting good robustness.
The foregoing is merely a preferred embodiment of the present invention, and is not intended to limit the present invention, and all equivalent variations using the description and drawings of the present invention are within the scope of the present invention.

Claims (5)

1. A deep neural network collaborative reasoning method for the industrial Internet of Things, characterized by comprising the following steps:
Step 1, acquire data of the user equipment layer and the edge cluster layer, transmit the acquired data to the edge center node of the edge cluster layer, and preprocess the acquired data;
Step 2, after the user equipment layer generates a DNN inference task, send an inference request to the edge center node, and the edge center node decides whether the inference task needs to be offloaded to an edge server; if the decision is no, go to step 3; if yes, go to step 4 and determine the number of layers to be offloaded to the designated edge server;
wherein the offloading decision for a task is expressed as a task offloading rate, determined by the number of the task's front inference layers computed on the user equipment relative to the total number of inference layers of the task:
Local offloading: the task is inferred entirely on the user equipment;
Full offloading: the task is offloaded entirely to an edge server for inference;
Partial offloading: the first several layers of the task are inferred on the user equipment and the remaining layers are offloaded to an edge server for inference;
Step 3, when the edge center node decides on local offloading, the user equipment layer computes the entire inference task, returns the result to the user, and goes to step 5;
Step 4, when the edge center node decides on full or partial offloading, the uncomputed part of the inference task enters one of the queues of the edge center node; once the edge server corresponding to that queue becomes idle, the inference task is transmitted to the edge server for computation and the processing result is returned to the user;
Step 5, the edge device evaluates the performance indicators of the processed inference task and optimizes the layered offloading strategy according to the evaluation result; specifically:
the performance indicators of inference include time delay and energy consumption, evaluated as follows:
1) When a task is computed on the user equipment, the local computation delay and energy consumption are determined by the data volume of each layer inferred locally, the computing power of the user equipment within a single time slot, and the power the user equipment consumes per unit of computation;
2) The maximum uplink transmission rate from the user equipment to the edge center node is obtained from the Shannon formula, based on the transmission power of the user equipment in the time slot, the radio channel gain from the user equipment to the edge center node, the wireless channel bandwidth, and the Gaussian noise power spectral density;
when the inference task is offloaded to the edge center node, its transmission delay and transmission energy consumption are determined by the data size of the intermediate result produced after the locally inferred layers are completed and by the uplink transmission rate;
when the user equipment generates a task, it sends a request signal to the edge center node, which then traverses the state of every edge server in the current time slot and the delay each edge server still needs to finish its current task, i.e. the waiting delay, determined by the size of the server's remaining unprocessed data and its computing power within a single time slot;
the queuing delay of the task at the current slot is therefore the minimum waiting delay over all edge servers, and the task is placed in the queue whose sequence number matches that of the edge server with the minimum waiting delay;
3) After queuing ends, the edge center node forwards the intermediate result of the locally processed layers to the edge server corresponding to the queue, where the remaining layers are inferred; when the task is computed on the edge server, the delay and energy consumption are determined by the data volume of the remaining layers, the server's computing power within a single time slot, and the power the server consumes per unit of computation;
to further reduce latency, while a task is queuing, DNN inference can still proceed at the user equipment layer and the task can be transmitted from the user equipment to the edge center node, and the combined delay of these three overlapping parts is computed accordingly;
the total inference delay of the task is this combined delay plus the edge server computation delay, and the total energy consumption is the sum of the local computation, transmission, and edge server energy consumption;
according to the above expressions, the utility function is the system overhead, expressed as a weighted sum of the total inference delay and the total energy consumption, with a delay weight and an energy consumption weight;
the task delay and energy consumption are jointly optimized, and the optimization problem minimizes the system overhead subject to the following constraints:
C1 ensures that the delay weight and the energy consumption weight sum to 1; C2 ensures that the total delay does not exceed the maximum tolerable delay of the current task; C3 ensures that the energy consumption of the user equipment does not exceed its maximum available energy; C4 ensures that the energy consumption of the edge server does not exceed its maximum available energy; C5 restricts the decision variables of the task, including the choice of partition point in its DNN model and the choice of edge server; the system overhead is minimized by optimizing the partition point and the edge server selection.
2. The deep neural network collaborative reasoning method for the industrial Internet of Things according to claim 1, wherein the user equipment layer comprises a plurality of user equipment responsible for continuously collecting industrial data from the production line; the user equipment form a set, and the computing power of each user equipment is expressed in floating-point operations per second (FLOPS); the length of one time slot is fixed in seconds, and in each time slot the user equipment randomly generate DNN inference tasks; each DNN inference task is represented by a tuple consisting of the device that generated the task, the data size of the task, and the time slot in which the task arrives.
3. The deep neural network collaborative reasoning method for the industrial Internet of Things according to claim 2, wherein the edge cluster layer comprises a plurality of edge servers, the edge server geographically closest to the user equipment is defined as the edge center node, and the remaining edge servers form a set, each with its own computing power; the edge center node is responsible for collecting the DNN inference tasks offloaded from the user equipment layer and for distributing them appropriately to the edge servers.
4. The deep neural network collaborative reasoning method for the industrial Internet of Things according to claim 3, wherein the edge center node is provided with one task queue per edge server to manage the DNN inference tasks from the user equipment layer; each task queue has three states: idle, occupied, and uploaded, and the state of a queue is recorded in every time slot;
the corresponding edge servers have two states, an idle state and a busy state, and the state of each edge server is likewise recorded in every time slot.
5. The deep neural network collaborative reasoning method for the industrial Internet of Things according to claim 1, wherein the optimization problem is solved by adopting an improved DRLLU algorithm, specifically:
the state observed at the current time step and the action of the previous time step are combined into a state-action pair, which is integrated with the output value of the LSTM to obtain an estimate of the real environment state and is then fed into the deep neural network for training; the Q-function is fitted using the LSTM layer's output at the current time step together with the network parameters, and is updated iteratively in the same form as the DQN update.
CN202410246171.6A 2024-03-05 2024-03-05 Deep neural network collaborative reasoning method for industrial Internet of things Active CN117834643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410246171.6A CN117834643B (en) 2024-03-05 2024-03-05 Deep neural network collaborative reasoning method for industrial Internet of things

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410246171.6A CN117834643B (en) 2024-03-05 2024-03-05 Deep neural network collaborative reasoning method for industrial Internet of things

Publications (2)

Publication Number Publication Date
CN117834643A CN117834643A (en) 2024-04-05
CN117834643B true CN117834643B (en) 2024-05-03

Family

ID=90524260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410246171.6A Active CN117834643B (en) 2024-03-05 2024-03-05 Deep neural network collaborative reasoning method for industrial Internet of things

Country Status (1)

Country Link
CN (1) CN117834643B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11727255B2 (en) * 2019-10-15 2023-08-15 Rutgers, The State University Of New Jersey Systems and methods for edge assisted real-time object detection for mobile augmented reality
US20220292819A1 (en) * 2021-03-10 2022-09-15 Rutgers, The State University Of New Jersey Computer Vision Systems and Methods for Acceleration of High-Resolution Mobile Deep Vision With Content-Aware Parallel Offloading

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1750221A1 (en) * 2005-07-14 2007-02-07 The Boeing Company System, method, and computer program to predict the likelihood, the extent, and the time of an event or change occurrence using a combination of cognitive causal models with reasoning and text processing for knowledge driven decision support
WO2013123445A1 (en) * 2012-02-17 2013-08-22 Interdigital Patent Holdings, Inc. Smart internet of things services
WO2021012584A1 (en) * 2019-07-25 2021-01-28 北京工业大学 Method for formulating single-task migration strategy in mobile edge computing scenario
CN111445026A (en) * 2020-03-16 2020-07-24 东南大学 Deep neural network multi-path reasoning acceleration method for edge intelligent application
CN113950066A (en) * 2021-09-10 2022-01-18 西安电子科技大学 Single server part calculation unloading method, system and equipment under mobile edge environment
CN114356544A (en) * 2021-12-02 2022-04-15 北京邮电大学 Parallel computing method and system facing edge cluster
CN114928607A (en) * 2022-03-18 2022-08-19 南京邮电大学 Collaborative task unloading method for multilateral access edge calculation
CN114662661A (en) * 2022-03-22 2022-06-24 东南大学 Method for accelerating multi-outlet DNN reasoning of heterogeneous processor under edge calculation
CN114723057A (en) * 2022-03-31 2022-07-08 北京理工大学 Neural network collaborative reasoning method for multi-access edge computing system
CN114815755A (en) * 2022-05-25 2022-07-29 天津大学 Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning
CN115022319A (en) * 2022-05-31 2022-09-06 浙江理工大学 DRL-based edge video target detection task unloading method and system
CN115034390A (en) * 2022-08-11 2022-09-09 南京邮电大学 Deep learning model reasoning acceleration method based on cloud edge-side cooperation
WO2024032121A1 (en) * 2022-08-11 2024-02-15 南京邮电大学 Deep learning model reasoning acceleration method based on cloud-edge-end collaboration
CN116166444A (en) * 2023-04-26 2023-05-26 南京邮电大学 Collaborative reasoning method oriented to deep learning hierarchical model
CN116541106A (en) * 2023-07-06 2023-08-04 闽南理工学院 Computing task unloading method, computing device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep learning task offloading scheme in mobile edge networks; 尹高, 石远明; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); 2020-02-15 (No. 01); full text *

Also Published As

Publication number Publication date
CN117834643A (en) 2024-04-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant