CN117834643B - Deep neural network collaborative reasoning method for industrial Internet of things


Info

Publication number
CN117834643B
Authority
CN
China
Prior art keywords
reasoning
task
edge
user equipment
representing
Prior art date
Legal status
Active
Application number
CN202410246171.6A
Other languages
Chinese (zh)
Other versions
CN117834643A
Inventor
郭永安
奚城科
王宇翱
周金粮
钱琪杰
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202410246171.6A
Publication of CN117834643A
Application granted
Publication of CN117834643B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention belongs to the technical fields of the industrial Internet of Things and deep neural networks, and discloses a deep neural network collaborative reasoning method for the industrial Internet of Things. The method uses a DQN algorithm together with an LSTM algorithm to offload industrial IoT inference tasks efficiently and dynamically; because the LSTM retains memory of long-term states, it can estimate the current states of the servers and user equipment more accurately, which preserves the accuracy of the algorithm and improves inference efficiency. The method makes more effective use of the computing capacity of the edge servers and improves the efficiency of deep neural network inference; at the same time, it reduces the delay of inference tasks, lowers energy consumption, and can dynamically select the optimal inference strategy as the environment changes.

Description

Deep neural network collaborative reasoning method for industrial Internet of things
Technical Field
The invention belongs to the field of industrial Internet of things, and particularly relates to a deep neural network collaborative reasoning method for the industrial Internet of things.
Background
With the rapid development of the Internet of Things in recent years, the number of IoT devices has increased dramatically, along with various novel applications such as augmented reality and image processing. Deep neural network (DNN) inference is a computation-intensive task, and industrial IoT scenarios place strict latency requirements on it, yet IoT devices with limited computing resources struggle to perform DNN inference efficiently. Cloud computing can alleviate this problem to some extent, but it is still limited in terms of latency and power consumption. Edge computing is an effective way to break this bottleneck by offloading DNN inference tasks to edge servers for efficient execution; with an appropriate offloading method, edge computing can reduce network delay and transmission cost. How to effectively offload deep neural network inference tasks in the industrial Internet of Things has therefore become a key focus of current research.
Several methods for offloading DNN inference tasks exist in the prior art; for example, an inference task can be offloaded to the cloud by selecting the optimal partition point, or an early-exit mechanism can be adopted to meet the user's low-latency computing requirements. However, the prior art still has the following problems:
1) Existing DNN inference methods often ignore the waiting delay a task incurs in each time slot while the device is busy, so the estimated inference delay deviates considerably from the actual one;
2) In industrial Internet of Things scenarios, task processing efficiency is limited by finite battery capacity, yet existing methods do not consider the energy consumption problem;
3) Existing methods only consider the case of a single server serving one or more devices and ignore the more complex practical case of multiple servers serving multiple devices, where each server having to make its own decisions adds load while increasing delay and energy consumption.
Disclosure of Invention
To solve the above technical problems, the invention provides a deep neural network collaborative reasoning method for the industrial Internet of Things, in which a framework where an edge center node makes the offloading decisions is introduced into the multi-device, multi-server scenario and a problem model jointly optimizing delay and energy consumption is constructed, reducing both inference delay and energy consumption.
The invention discloses a deep neural network collaborative reasoning method for industrial Internet of things, which comprises the following steps:
Step 1, acquire data of the user equipment layer and the edge cluster layer, transmit the acquired data to the edge center node of the edge cluster layer, and preprocess the acquired data;
Step 2, after the user equipment layer generates an inference task, send an inference request to the edge center node, and the edge center node decides whether the inference task needs to be offloaded to an edge server; if the decision is no, go to step 3; if yes, go to step 4 and determine the number of layers to be offloaded to the designated edge server;
Step 3, when the edge center node decides on local offloading, the user equipment layer computes the entire inference task, returns the result to the user, and goes to step 5;
Step 4, when the edge center node decides on full or partial offloading, the uncomputed part of the inference task enters one of the queues of the edge center node; once the edge server corresponding to that queue becomes idle, the inference task is transmitted to the edge server for computation and the processing result is returned to the user;
Step 5, the edge device evaluates the performance indicators of the processed inference task and optimizes the layered offloading strategy according to the evaluation result.
Further, the user equipment layer comprises a plurality of user equipment responsible for continuously collecting industrial data from the production line; the user equipment form a set, and the computing power of each user equipment is expressed in floating-point operations per second (FLOPS); the length of one time slot is fixed in seconds, and in each time slot the user equipment randomly generate DNN inference tasks; each DNN inference task is represented by a tuple consisting of the device that generated the task, the data size of the task, and the time slot in which the task arrives.
Further, the edge cluster layer includes a plurality of edge servers; the edge server geographically closest to the user equipment is defined as the edge center node, and the remaining edge servers form a set, each with its own computing power; the edge center node is responsible for collecting the DNN inference tasks offloaded from the user equipment layer and for distributing them appropriately to the edge servers.
Further, each task queue has three states: idle, occupied, and uploaded; the state of a queue is recorded in every time slot.
The corresponding edge servers have two states, an idle state and a busy state; the state of each edge server is likewise recorded in every time slot.
Further, in step 2, the task The offloading decision result of (a) is the task offloading rate/>The following are provided:
Wherein, Representing task/>Front/>, corresponding to the inference modelLayer/>Representing task/>Total number of inference layers;
Local offloading: tasks Reasoning locally at the user equipment;
Global unloading: tasks Unloading to an edge server for reasoning;
Partial unloading: tasks Reasoning/>, on user equipmentAnd unloading the rest part to an edge server for reasoning.
Further, the performance indicators of inference include time delay and energy consumption, evaluated as follows:
1) When a task is computed on the user equipment, the local computation delay and energy consumption are determined by the data volume of each layer inferred locally, the computing power of the user equipment within a single time slot, and the power the user equipment consumes per unit of computation;
2) The maximum uplink transmission rate from the user equipment to the edge center node is obtained from the Shannon formula, based on the transmission power of the user equipment in the time slot, the radio channel gain from the user equipment to the edge center node, the wireless channel bandwidth, and the Gaussian noise power spectral density;
when the inference task is offloaded to the edge center node, its transmission delay and transmission energy consumption are determined by the data size of the intermediate result produced after the locally inferred layers are completed and by the uplink transmission rate;
when the user equipment generates a task, it sends a request signal to the edge center node, which then traverses the state of every edge server in the current time slot and the delay each edge server still needs to finish its current task, i.e. the waiting delay, determined by the size of the server's remaining unprocessed data and its computing power within a single time slot;
the queuing delay of the task at the current slot is therefore the minimum waiting delay over all edge servers, and the task is placed in the queue whose sequence number matches that of the edge server with the minimum waiting delay;
3) After queuing ends, the edge center node forwards the intermediate result of the locally processed layers to the edge server corresponding to the queue, where the remaining layers are inferred; when the task is computed on the edge server, the delay and energy consumption are determined by the data volume of the remaining layers, the server's computing power within a single time slot, and the power the server consumes per unit of computation;
to further reduce latency, while a task is queuing, DNN inference can still proceed at the user equipment layer and the task can be transmitted from the user equipment to the edge center node, and the combined delay of these three overlapping parts is computed accordingly;
the total inference delay of the task is this combined delay plus the edge server computation delay, and the total energy consumption is the sum of the local computation, transmission, and edge server energy consumption;
according to the above expressions, the utility function is the system overhead, expressed as a weighted sum of the total inference delay and the total energy consumption, with a delay weight and an energy consumption weight.
Further, the task delay and energy consumption are jointly optimized, and the optimization problem minimizes the system overhead subject to the following constraints:
C1 ensures that the delay weight and the energy consumption weight sum to 1; C2 ensures that the total delay does not exceed the maximum tolerable delay of the current task; C3 ensures that the energy consumption of the user equipment does not exceed its maximum available energy; C4 ensures that the energy consumption of the edge server does not exceed its maximum available energy; C5 restricts the decision variables of the task, including the choice of partition point in its DNN model and the choice of edge server; the system overhead is minimized by optimizing the partition point and the edge server selection.
Further, the optimization problem is solved by adopting an improved DRLLU algorithm, specifically:
the state observed at the current time step and the action of the previous time step are combined into a state-action pair, which is integrated with the output value of the LSTM to obtain an estimate of the real environment state and is then fed into the deep neural network for training; the Q-function is fitted using the LSTM layer's output at the current time step together with the network parameters, and is updated iteratively in the same form as the DQN update.
The beneficial effects of the invention are as follows: the method can offload DNN inference tasks to edge servers, reducing task inference and waiting delay while lowering the energy consumption of the user equipment and the cloud data center; a DNN inference task offloading strategy and dynamic resource allocation method based on the DQN and LSTM algorithms is used, in which the last fully connected layer of the DQN network is replaced by an LSTM layer to form the improved DRLLU algorithm, so that inference tasks can be intelligently offloaded to edge servers for computation, the edge computing resources of the industrial Internet of Things are used more efficiently, and resource utilization is improved; the method can intelligently learn the optimal inference task offloading strategy according to the real-time environment and the demands of the inference tasks, and can adjust dynamically to the environment, further optimizing the offloading decisions.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a system model diagram of the present invention;
FIG. 3 is a diagram of an algorithm architecture of the present invention;
FIG. 4 is a graph showing average rewards obtained by the algorithm according to the invention under different learning rates as a function of step size;
FIG. 5 compares the average inference delay and average inference energy consumption when DNN inference is performed by different algorithms;
FIG. 6 is a graph of the average overhead under different weight settings for the algorithm provided by the invention.
Detailed Description
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
As shown in fig. 1, the method for collaborative reasoning of the deep neural network for the industrial internet of things provided by the invention comprises the following steps:
Step 1, acquire data of the user equipment layer and the edge cluster layer, transmit the acquired data to the edge center node of the edge cluster layer, and preprocess the acquired data;
Step 2, after the user equipment layer generates an inference task, send an inference request to the edge center node, and the edge center node decides whether the inference task needs to be offloaded to an edge server; if the decision is no, go to step 3; if yes, go to step 4 and determine the number of layers to be offloaded to the designated edge server;
Step 3, when the edge center node decides on local offloading, the user equipment layer computes the entire inference task, returns the result to the user, and goes to step 5;
Step 4, when the edge center node decides on full or partial offloading, the uncomputed part of the inference task enters one of the queues of the edge center node; once the edge server corresponding to that queue becomes idle, the inference task is transmitted to the edge server for computation and the processing result is returned to the user;
Step 5, the edge device evaluates the performance indicators of the processed inference task and optimizes the layered offloading strategy according to the evaluation result.
As shown in fig. 2, the present invention contemplates an industrial internet of things system comprising multiple devices and multiple servers, the system comprising a user device layer and an edge cluster layer.
1) User equipment layer: the user equipment layer consists of a large number of devices associated with the industrial production line, such as locally deployed industrial cameras, industrial scanners, and pressure sensors, which continuously collect industrial data from the production line. The invention represents these devices as a set, and the computing power of each user equipment is expressed in floating-point operations per second (FLOPS). In the discrete-time model, the length of one time slot is fixed in seconds, and in each time slot the user equipment randomly generate DNN inference tasks. Each DNN inference task is represented by a tuple consisting of the device that generated the task, the data size of the task, and the time slot in which the task arrives. Each DNN task can be inferred entirely locally, offloaded entirely to an edge server for inference, or have its first few layers inferred locally and the remaining layers uploaded to the server for inference. Owing to the layered nature of DNN inference models, the layers of a task's DNN model form an ordered set and each layer has an associated data volume; during DNN inference, the computation result of each layer serves as the input for further inference by the next layer, so the task also has a well-defined data size after each layer's computation, the data size before the first layer being the raw task data. A DNN model for processing the corresponding inference tasks is deployed on every user equipment.
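For concreteness, a minimal sketch of how such a task record might be represented in code is shown below; the class and field names are illustrative assumptions and do not reproduce the patent's notation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DNNInferenceTask:
    """Illustrative DNN inference task record (assumed field names)."""
    device_id: int                    # user equipment that generated the task
    data_size: float                  # input data size of the task
    arrival_slot: int                 # time slot in which the task arrives
    layer_output_sizes: List[float]   # intermediate data size after each layer (index 0 = raw input)
    total_layers: int                 # total number of inference layers of the DNN model
```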
2) Edge cluster layer: the edge cluster layer contains several edge servers, all deployed near the industrial production line. The invention designates the edge server geographically closest to the user equipment as the edge center node, while the remaining edge servers are responsible for DNN inference, so the edge cluster layer consists of one edge center node and a plurality of edge servers. These edge servers form a set, each with its own computing power and a deployed DNN model for processing tasks. The edge center node is responsible for collecting the DNN inference tasks offloaded from the user equipment layer and for distributing them appropriately to the edge servers. Because there is one queue per edge server, the edge center node maintains the same number of task queues to manage the DNN inference tasks arriving from the user equipment layer. Each queue has three states: idle, occupied, and uploaded; the state of a queue is recorded in every time slot.
The corresponding edge servers likewise have two states, an idle state and a busy state, and the state of each edge server is recorded in every time slot.
As shown in fig. 3, the objective of the present invention is to minimize the overall delay of DNN reasoning over a long period of time, while minimizing the overall energy consumption of the user equipment and edge servers. From the above preliminary analysis, an expression of the utility function can be deduced.
The DNN reasoning time delay consists of a user equipment side processing time delay, an edge server side processing time delay, a user equipment to edge center node transmission time delay, an edge center node to edge server transmission time delay, a transmission time delay for the edge server to transmit the processing result back to the edge center node, a transmission time delay for the edge center node to further transmit the result back to the user equipment, and a queue waiting time delay in the edge center node.
Considering that the network bandwidth between the edge center node and the edge servers is large and the transmission rate is high, the transmission delay between the edge center node and an edge server is negligible. Furthermore, the data produced by the last layer of DNN inference is very small, so the transmission delay of returning the result to the user equipment through the edge center node is also negligible.
Therefore, in terms of delay, the invention only needs to consider four parts: the processing delay at the user equipment, the processing delay at the edge server, the transmission delay from the user equipment to the edge center node, and the queue waiting delay in the edge center node. Likewise, in terms of energy consumption, only three aspects need to be considered: the energy consumption of the user equipment, the energy consumption of the edge server, and the energy consumption of transmission from the user equipment to the edge center node.
The offloading decision for a task is expressed as a task offloading rate, determined by the number of the task's front inference layers computed on the user equipment relative to the total number of inference layers of the task:
Local offloading: the task is inferred entirely on the user equipment;
Full offloading: the task is offloaded entirely to an edge server for inference;
Partial offloading: the first several layers of the task are inferred on the user equipment and the remaining layers are offloaded to an edge server for inference.
When a task is computed on the user equipment, the local computation delay and energy consumption are determined by the data volume of each layer inferred locally, the computing power of the user equipment within a single time slot, and the power the user equipment consumes per unit of computation.
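A hedged reconstruction of these quantities, using assumed symbols (d_{k,j}: data volume of layer j of task k; l_k: number of locally inferred layers; f_i: per-slot computing power of user equipment i; p_i: its computation power draw), would be:

```latex
t_k^{\mathrm{loc}} = \sum_{j=1}^{l_k} \frac{d_{k,j}}{f_i}, \qquad
e_k^{\mathrm{loc}} = p_i \, t_k^{\mathrm{loc}}
```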
The user equipment transmit tasks to the edge center node using orthogonal frequency division multiple access (OFDMA), with different user equipment occupying different channels. The maximum uplink transmission rate from a user equipment to the edge center node follows from the Shannon formula, based on the transmission power of the user equipment in the time slot, the radio channel gain from the user equipment to the edge center node, the wireless channel bandwidth, and the Gaussian noise power spectral density.
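The Shannon capacity formula referenced here is standard; written with assumed symbols (B: channel bandwidth, p_i: transmission power, g_i: channel gain to the edge center node, N_0: Gaussian noise power spectral density), it reads:

```latex
r_i = B \log_2\!\left(1 + \frac{p_i\, g_i}{N_0\, B}\right)
```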
When the inference task is offloaded to the edge center node, its transmission delay and transmission energy consumption are determined by the data size of the intermediate result produced after the locally inferred layers are completed and by the uplink transmission rate.
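With the same assumed notation (o_{k,l_k}: data size of the intermediate result after the l_k-th locally inferred layer, r_i: uplink rate, p_i: transmission power), the transmission delay and energy would take the form:

```latex
t_k^{\mathrm{tr}} = \frac{o_{k,l_k}}{r_i}, \qquad
e_k^{\mathrm{tr}} = p_i \, t_k^{\mathrm{tr}}
```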
When the user equipment generates a task, it sends a request signal to the edge center node, which then traverses the state of every edge server in the current time slot and the delay each edge server still needs to finish its current task, i.e. the waiting delay, determined by the size of the server's remaining unprocessed data and its computing power within a single time slot. The queuing delay of the task at the current slot is therefore the minimum waiting delay over all edge servers, and the task is placed in the queue whose sequence number matches that of the edge server with the minimum waiting delay.
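A plausible reconstruction of the waiting and queuing delays, with D_n the remaining unprocessed data on edge server n and f_n its per-slot computing power (assumed symbols), is:

```latex
t_n^{\mathrm{wait}} = \frac{D_n}{f_n}, \qquad
t_k^{\mathrm{que}} = \min_{n}\; t_n^{\mathrm{wait}}
```

with the task joining the queue whose index attains the minimum.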
After queuing ends, the edge center node forwards the intermediate result of the locally processed layers to the edge server corresponding to the queue, where the remaining layers are inferred. When the task is computed on the edge server, the delay and energy consumption are determined by the data volume of the remaining layers, the server's computing power within a single time slot, and the power the server consumes per unit of computation.
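With the remaining layers l_k+1, ..., K_k inferred on edge server n (assumed symbols, p_n being the server's computation power draw), the edge-side delay and energy would be:

```latex
t_k^{\mathrm{edge}} = \sum_{j=l_k+1}^{K_k} \frac{d_{k,j}}{f_n}, \qquad
e_k^{\mathrm{edge}} = p_n \, t_k^{\mathrm{edge}}
```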
To further reduce latency, while a task is queuing, DNN inference can still proceed at the user equipment layer and the task can be transmitted from the user equipment to the edge center node, so the combined delay of these three overlapping parts (local inference, transmission, and queuing) accounts for this overlap. The total inference delay of the task is this combined delay plus the edge server computation delay, and the total energy consumption is the sum of the local computation, transmission, and edge server energy consumption.
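Because local inference and transmission can overlap with queuing, one consistent reading of the combined and total quantities (a reconstruction, not the patent's verbatim formulas) is:

```latex
T_k = \max\!\big(t_k^{\mathrm{loc}} + t_k^{\mathrm{tr}},\; t_k^{\mathrm{que}}\big) + t_k^{\mathrm{edge}}, \qquad
E_k = e_k^{\mathrm{loc}} + e_k^{\mathrm{tr}} + e_k^{\mathrm{edge}}
```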
According to the above expressions, the utility function is the system overhead, expressed as a weighted sum of the total inference delay and the total energy consumption, with a delay weight and an energy consumption weight.
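With assumed weights \lambda_t and \lambda_e for delay and energy, the utility (system overhead) would read:

```latex
U_k = \lambda_t\, T_k + \lambda_e\, E_k
```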
The invention aims to minimize the utility function proposed above, so as to jointly optimize the energy consumption and the task delay of the system; the key to the problem is clearly the choice of the best partition point and edge server for the DNN inference task. The optimization problem minimizes the system overhead subject to the following constraints:
C1 ensures that the delay weight and the energy consumption weight sum to 1; C2 ensures that the total delay does not exceed the maximum tolerable delay of the current task; C3 ensures that the energy consumption of the user equipment does not exceed its maximum available energy; C4 ensures that the energy consumption of the edge server does not exceed its maximum available energy; C5 restricts the decision variables of the task, including the choice of partition point in its DNN model and the choice of edge server; the invention minimizes the system overhead by optimizing the partition point and the edge server selection.
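A sketch of the optimization problem in the form these constraints describe, using the assumed symbols above (T_k^{\max}: maximum tolerable delay of the task; E_i^{\max}, E_n^{\max}: energy budgets of user equipment i and edge server n; N: number of edge servers):

```latex
\min_{l_k,\, n_k}\ U_k \quad \text{s.t.}\quad
\begin{aligned}
&\mathrm{C1}: \lambda_t + \lambda_e = 1 \\
&\mathrm{C2}: T_k \le T_k^{\max} \\
&\mathrm{C3}: e_k^{\mathrm{loc}} + e_k^{\mathrm{tr}} \le E_i^{\max} \\
&\mathrm{C4}: e_k^{\mathrm{edge}} \le E_n^{\max} \\
&\mathrm{C5}: l_k \in \{0, 1, \dots, K_k\},\ n_k \in \{1, \dots, N\}
\end{aligned}
```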
The optimization problem is a discrete decision problem, so the invention converts it into a Markov decision process (MDP) problem, which can be solved with the DQN algorithm. Combining FIG. 3 with the actual edge-device collaborative environment of the industrial Internet of Things, an MDP model is constructed and the DQN algorithm is improved to solve the inference task offloading problem at the edge center node.
MDP model:
1) State space
The agent needs to gather information about the DNN inference tasks, the user equipment queues, and the edge server status in order to optimize its scheduling in the edge computing environment. This information is contained in the state space and is necessary to satisfy the optimization goals and constraints. At each time step the state comprises: the data volume of the layer currently being inferred; the result data volume produced by the current layer after inference; the partition point of the task's DNN inference model; the uplink transmission rate of each user equipment at the current step; the power of each user equipment at the current step; the power of each edge server at the current step; and the waiting delay of each queue. The initial value of the state is set to 0.
2) Action space
In the algorithm model of the invention, the actions of the DQN comprise partition point selection and edge server selection. In each time slot, an action is represented as a vector indicating that the inference of the task after a chosen layer is offloaded to a chosen edge server. To limit the action space, the number of layers offloaded to the edge server side is bounded.
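As an illustration of how such a discrete action set might be enumerated (a sketch; the bound on offloadable layers and the indexing are assumptions, and 22 layers is used only because GoogLeNet, the model used in the experiments, is 22 layers deep):

```python
from itertools import product

def enumerate_actions(total_layers: int, num_edge_servers: int, max_offload_layers: int):
    """Each action pairs a partition point (number of layers kept locally) with a target edge server."""
    partition_points = range(total_layers - max_offload_layers, total_layers + 1)
    servers = range(num_edge_servers)
    return list(product(partition_points, servers))

# Example: at most 10 layers may be offloaded to one of 5 edge servers
actions = enumerate_actions(total_layers=22, num_edge_servers=5, max_offload_layers=10)
```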
3) Reward function
The reward function is set to the negative of the utility function, and the agent receives a reward after each action it takes. However, if too few layers are inferred on the user equipment side, the task spends excessive time queuing; conversely, if too many layers are inferred on the user equipment side, the local processing time far exceeds the queuing delay, the energy consumption of the user equipment becomes too large, and the total inference delay is prolonged. To counteract this, the invention adjusts the reward by considering the difference between the sum of the user-equipment-side inference delay and transmission delay on the one hand and the queuing delay on the other, and defines a penalty function for this purpose.
According to the penalty function, the reward function R is adjusted accordingly.
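One way to realize the described adjustment, sketched with assumed symbols (\mu: a penalty weight), is to penalize the mismatch between the local-plus-transmission delay and the queuing delay:

```latex
F_k = \big|\, t_k^{\mathrm{loc}} + t_k^{\mathrm{tr}} - t_k^{\mathrm{que}} \,\big|, \qquad
R = -\,U_k - \mu\, F_k
```

Both the absolute-value form of the penalty and the weight \mu are assumptions; the patent only states that the difference between the two delay terms is taken into account.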
optimization based on LSTM algorithm:
DQN is a value-iteration-based deep reinforcement learning algorithm whose goal is to estimate the Q-values of the optimal policy. The algorithm uses a deep neural network as a function approximator, converting the Q-table update problem into a function-fitting problem so that suitable actions can be obtained from the current state, overcoming the weakness of the traditional Q-learning algorithm on high-dimensional continuous problems. The network parameters are updated so that the parameterized Q-function approaches the optimal Q-value.
The update involves the next state reached after taking the chosen action at the current time step, the immediate reward obtained after taking that action, and all actions available in the next state; a discount coefficient weights the accumulated future value, and a learning rate controls the update step.
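The tabular Q-learning update that DQN approximates is the standard rule (with \eta the learning rate and \gamma the discount coefficient):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \Big[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]
```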
DQN not only speeds up the search of the Q-learning algorithm through function fitting, but also improves its diversity and stability by adding an experience pool and a target network. Through interaction between the agent and the environment, a transition sample is obtained at each time step and stored in the experience pool buffer; during training, a batch of samples is drawn at random, which mitigates data correlation and non-stationary distribution. The target Q-value refers to the training target generated by the target network, whose structure is identical to the DQN main network; after each round of iteration, the main network parameters are copied into the target network, so the current Q-value and the target Q-value remain consistent for a period. The loss function between them is computed, and the main network parameters are then updated backwards using stochastic gradient descent (SGD).
The loss is computed from the output of the main network and the output of the target network, respectively.
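With \theta the main-network parameters and \theta^- the target-network parameters, the DQN loss described here is conventionally written as:

```latex
y_t = r_t + \gamma \max_{a'} Q\big(s_{t+1}, a'; \theta^-\big), \qquad
L(\theta) = \mathbb{E}\Big[\big(y_t - Q(s_t, a_t; \theta)\big)^2\Big]
```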
In a real industrial Internet of Things environment the problem is more complex and the system's perception is limited: the state information of the system is only partially known, so the observations do not fully reflect the real environment state, and it is difficult for DQN alone to solve the actual inference offloading problem. Considering that the states of the edge servers and the user equipment change gradually over time, and that an LSTM network has memory of long-term state, the invention replaces the last fully connected layer of the DQN network with an LSTM layer, using the recurrent structure to integrate long-term historical data and thereby estimate the current state more accurately. As shown in FIG. 3, the DRLLU algorithm, modified from the DQN algorithm, combines the state observed at the current time step with the action of the previous time step into a state-action pair, integrates it with the output value of the LSTM to obtain an estimate of the real environment state, and then feeds it into the deep neural network for training. Thus, instead of the Q-function used by the DQN algorithm, DRLLU fits a Q-function conditioned on the LSTM layer's output at the current time step, and updates it iteratively in the same form as the DQN update.
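A minimal sketch of the described network modification follows: the last fully connected layer of the DQN is replaced by an LSTM layer, so the Q-values are conditioned on a short history of state-action pairs. The layer sizes and sequence length are illustrative assumptions; the framework is TensorFlow, which the patent reports using for simulation, and the learning rate matches the 0.0015 chosen in the experiments.

```python
import tensorflow as tf

def build_drllu_q_network(seq_len: int, state_dim: int, action_dim: int, n_actions: int) -> tf.keras.Model:
    """Q-network sketch: per-step feature extraction, an LSTM in place of the last
    fully connected layer, and a Q-value head over the discrete actions."""
    # Input: a short history of concatenated (state, previous action) vectors
    inputs = tf.keras.layers.Input(shape=(seq_len, state_dim + action_dim))
    # Per-time-step feature extraction (the "front" of the original DQN)
    x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(128, activation="relu"))(inputs)
    x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(64, activation="relu"))(x)
    # LSTM replaces the last fully connected layer and summarizes the history into h_t
    h = tf.keras.layers.LSTM(64)(x)
    # One Q-value per (partition point, edge server) action
    q_values = tf.keras.layers.Dense(n_actions)(h)
    return tf.keras.Model(inputs=inputs, outputs=q_values)

# Example: 10 state features, previous action one-hot over 55 actions, history of 8 steps
model = build_drllu_q_network(seq_len=8, state_dim=10, action_dim=55, n_actions=55)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1.5e-3), loss="mse")
```

In training, the target network would share this architecture, and the loss above would be applied to the Q-value of the action actually taken.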
As shown in FIGS. 4-6, the invention uses TensorFlow to simulate the DNN model inference offloading problem in the industrial Internet of Things and to verify the DRLLU algorithm proposed above. The device cluster in the simulation experiment comprises 10 user equipment, one edge center node, and 5 edge servers, and each mobile device can send only one offloading request per unit time. The configuration, computing power, and power of all devices are set according to SPEC (Standard Performance Evaluation Corporation); moreover, the greater a device's computing power, the greater its power and the more energy it consumes per unit time. In each time slot there is a 50% chance that the user equipment side randomly generates a DNN inference task. For the DNN inference task, the invention considers the GoogLeNet target detection task, which is deployed on every edge server and user equipment; a single GoogLeNet inference requires 264.6 MFLOPs of computation. Other key parameters are shown in Table 1. The invention compares the Random algorithm, the Greedy algorithm, and the DQN algorithm with the proposed DRLLU algorithm in simulation, comparing the average inference delay, the average inference energy consumption, and the average system overhead jointly combining delay and energy consumption when deep neural network tasks are offloaded by the different algorithms.
Table 1. Key parameters of the simulation setup
First, FIG. 4 shows the effect of different learning rates on the DRLLU training process, with learning rates of 0.0010, 0.0015, and 0.0018. It can be seen that a learning rate of 0.0015 performs best: with a learning rate of 0.0018 convergence is faster but the algorithm more easily falls into a local optimum, while with a learning rate of 0.0010 a better solution can be obtained but convergence is slow. For this reason, the invention sets the learning rate to 0.0015 in the subsequent experiments.
Second, the invention compares the performance of the different algorithms in terms of average delay and average energy consumption; the delay and energy weights in FIG. 5 are both set to 0.5. Clearly, Random performs worst and Greedy second worst. Furthermore, although the DQN algorithm and the DRLLU algorithm proposed in the invention perform similarly in reducing latency, the DQN algorithm is far less effective than the proposed algorithm at reducing energy consumption. The proposed DRLLU algorithm therefore performs better overall.
Finally, because the weight parameter that balances delay and energy consumption in the utility function can be set to meet different quality-of-service requirements, the invention compares the other algorithms with the proposed algorithm under different weight settings in order to verify the robustness of the DRLLU algorithm. The experimental results indicate that under any weight setting the proposed DRLLU algorithm achieves the best performance compared with the other algorithms, while also exhibiting good robustness.
The foregoing is merely a preferred embodiment of the present invention, and is not intended to limit the present invention, and all equivalent variations using the description and drawings of the present invention are within the scope of the present invention.

Claims (5)

1. A deep neural network collaborative reasoning method for the industrial Internet of Things, characterized by comprising the following steps:
Step 1, acquire data of the user equipment layer and the edge cluster layer, transmit the acquired data to the edge center node of the edge cluster layer, and preprocess the acquired data;
Step 2, after the user equipment layer generates a DNN inference task, send an inference request to the edge center node, and the edge center node decides whether the inference task needs to be offloaded to an edge server; if the decision is no, go to step 3; if yes, go to step 4 and determine the number of layers to be offloaded to the designated edge server;
wherein the offloading decision for a task is expressed as a task offloading rate, determined by the number of the task's front inference layers computed on the user equipment relative to the total number of inference layers of the task:
Local offloading: the task is inferred entirely on the user equipment;
Full offloading: the task is offloaded entirely to an edge server for inference;
Partial offloading: the first several layers of the task are inferred on the user equipment and the remaining layers are offloaded to an edge server for inference;
Step 3, when the edge center node decides on local offloading, the user equipment layer computes the entire inference task, returns the result to the user, and goes to step 5;
Step 4, when the edge center node decides on full or partial offloading, the uncomputed part of the inference task enters one of the queues of the edge center node; once the edge server corresponding to that queue becomes idle, the inference task is transmitted to the edge server for computation and the processing result is returned to the user;
Step 5, the edge device evaluates the performance indicators of the processed inference task and optimizes the layered offloading strategy according to the evaluation result; specifically:
the performance indicators of inference include time delay and energy consumption, evaluated as follows:
1) When a task is computed on the user equipment, the local computation delay and energy consumption are determined by the data volume of each layer inferred locally, the computing power of the user equipment within a single time slot, and the power the user equipment consumes per unit of computation;
2) The maximum uplink transmission rate from the user equipment to the edge center node is obtained from the Shannon formula, based on the transmission power of the user equipment in the time slot, the radio channel gain from the user equipment to the edge center node, the wireless channel bandwidth, and the Gaussian noise power spectral density;
when the inference task is offloaded to the edge center node, its transmission delay and transmission energy consumption are determined by the data size of the intermediate result produced after the locally inferred layers are completed and by the uplink transmission rate;
when the user equipment generates a task, it sends a request signal to the edge center node, which then traverses the state of every edge server in the current time slot and the delay each edge server still needs to finish its current task, i.e. the waiting delay, determined by the size of the server's remaining unprocessed data and its computing power within a single time slot;
the queuing delay of the task at the current slot is therefore the minimum waiting delay over all edge servers, and the task is placed in the queue whose sequence number matches that of the edge server with the minimum waiting delay;
3) After queuing ends, the edge center node forwards the intermediate result of the locally processed layers to the edge server corresponding to the queue, where the remaining layers are inferred; when the task is computed on the edge server, the delay and energy consumption are determined by the data volume of the remaining layers, the server's computing power within a single time slot, and the power the server consumes per unit of computation;
to further reduce latency, while a task is queuing, DNN inference can still proceed at the user equipment layer and the task can be transmitted from the user equipment to the edge center node, and the combined delay of these three overlapping parts is computed accordingly;
the total inference delay of the task is this combined delay plus the edge server computation delay, and the total energy consumption is the sum of the local computation, transmission, and edge server energy consumption;
according to the above expressions, the utility function is the system overhead, expressed as a weighted sum of the total inference delay and the total energy consumption, with a delay weight and an energy consumption weight;
the task delay and energy consumption are jointly optimized, and the optimization problem minimizes the system overhead subject to the following constraints:
C1 ensures that the delay weight and the energy consumption weight sum to 1; C2 ensures that the total delay does not exceed the maximum tolerable delay of the current task; C3 ensures that the energy consumption of the user equipment does not exceed its maximum available energy; C4 ensures that the energy consumption of the edge server does not exceed its maximum available energy; C5 restricts the decision variables of the task, including the choice of partition point in its DNN model and the choice of edge server; the system overhead is minimized by optimizing the partition point and the edge server selection.
2. The deep neural network collaborative reasoning method for the industrial Internet of Things according to claim 1, wherein the user equipment layer comprises a plurality of user equipment responsible for continuously collecting industrial data from the production line; the user equipment form a set, and the computing power of each user equipment is expressed in floating-point operations per second (FLOPS); the length of one time slot is fixed in seconds, and in each time slot the user equipment randomly generate DNN inference tasks; each DNN inference task is represented by a tuple consisting of the device that generated the task, the data size of the task, and the time slot in which the task arrives.
3. The deep neural network collaborative reasoning method for the industrial Internet of Things according to claim 2, wherein the edge cluster layer comprises a plurality of edge servers, the edge server geographically closest to the user equipment is defined as the edge center node, and the remaining edge servers form a set, each with its own computing power; the edge center node is responsible for collecting the DNN inference tasks offloaded from the user equipment layer and for distributing them appropriately to the edge servers.
4. The deep neural network collaborative reasoning method for the industrial Internet of Things according to claim 3, wherein the edge center node is provided with one task queue per edge server to manage the DNN inference tasks from the user equipment layer; each task queue has three states: idle, occupied, and uploaded, and the state of a queue is recorded in every time slot;
the corresponding edge servers have two states, an idle state and a busy state, and the state of each edge server is likewise recorded in every time slot.
5. The deep neural network collaborative reasoning method for the industrial Internet of Things according to claim 1, wherein the optimization problem is solved by adopting an improved DRLLU algorithm, specifically:
the state observed at the current time step and the action of the previous time step are combined into a state-action pair, which is integrated with the output value of the LSTM to obtain an estimate of the real environment state and is then fed into the deep neural network for training; the Q-function is fitted using the LSTM layer's output at the current time step together with the network parameters, and is updated iteratively in the same form as the DQN update.
CN202410246171.6A 2024-03-05 2024-03-05 Deep neural network collaborative reasoning method for industrial Internet of things Active CN117834643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410246171.6A CN117834643B (en) 2024-03-05 2024-03-05 Deep neural network collaborative reasoning method for industrial Internet of things

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410246171.6A CN117834643B (en) 2024-03-05 2024-03-05 Deep neural network collaborative reasoning method for industrial Internet of things

Publications (2)

Publication Number Publication Date
CN117834643A CN117834643A (en) 2024-04-05
CN117834643B true CN117834643B (en) 2024-05-03

Family

ID=90524260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410246171.6A Active CN117834643B (en) 2024-03-05 2024-03-05 Deep neural network collaborative reasoning method for industrial Internet of things

Country Status (1)

Country Link
CN (1) CN117834643B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11727255B2 (en) * 2019-10-15 2023-08-15 Rutgers, The State University Of New Jersey Systems and methods for edge assisted real-time object detection for mobile augmented reality
US20220292819A1 (en) * 2021-03-10 2022-09-15 Rutgers, The State University Of New Jersey Computer Vision Systems and Methods for Acceleration of High-Resolution Mobile Deep Vision With Content-Aware Parallel Offloading

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1750221A1 (en) * 2005-07-14 2007-02-07 The Boeing Company System, method, and computer program to predict the likelihood, the extent, and the time of an event or change occurrence using a combination of cognitive causal models with reasoning and text processing for knowledge driven decision support
WO2013123445A1 (en) * 2012-02-17 2013-08-22 Interdigital Patent Holdings, Inc. Smart internet of things services
WO2021012584A1 (en) * 2019-07-25 2021-01-28 北京工业大学 Method for formulating single-task migration strategy in mobile edge computing scenario
CN111445026A (en) * 2020-03-16 2020-07-24 东南大学 Deep neural network multi-path reasoning acceleration method for edge intelligent application
CN113950066A (en) * 2021-09-10 2022-01-18 西安电子科技大学 Single server part calculation unloading method, system and equipment under mobile edge environment
CN114356544A (en) * 2021-12-02 2022-04-15 北京邮电大学 Parallel computing method and system facing edge cluster
CN114928607A (en) * 2022-03-18 2022-08-19 南京邮电大学 Collaborative task unloading method for multilateral access edge calculation
CN114662661A (en) * 2022-03-22 2022-06-24 东南大学 Method for accelerating multi-outlet DNN reasoning of heterogeneous processor under edge calculation
CN114723057A (en) * 2022-03-31 2022-07-08 北京理工大学 Neural network collaborative reasoning method for multi-access edge computing system
CN114815755A (en) * 2022-05-25 2022-07-29 天津大学 Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning
CN115022319A (en) * 2022-05-31 2022-09-06 浙江理工大学 DRL-based edge video target detection task unloading method and system
CN115034390A (en) * 2022-08-11 2022-09-09 南京邮电大学 Deep learning model reasoning acceleration method based on cloud edge-side cooperation
WO2024032121A1 (en) * 2022-08-11 2024-02-15 南京邮电大学 Deep learning model reasoning acceleration method based on cloud-edge-end collaboration
CN116166444A (en) * 2023-04-26 2023-05-26 南京邮电大学 Collaborative reasoning method oriented to deep learning hierarchical model
CN116541106A (en) * 2023-07-06 2023-08-04 闽南理工学院 Computing task unloading method, computing device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep learning task offloading scheme in mobile edge networks; 尹高, 石远明; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); 2020-02-15 (No. 01); full text *

Also Published As

Publication number Publication date
CN117834643A (en) 2024-04-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant