CN114528987A - Neural network edge-cloud collaborative computing segmentation deployment method - Google Patents

Neural network edge-cloud collaborative computing segmentation deployment method

Info

Publication number
CN114528987A
Authority
CN
China
Prior art keywords
neural network
cloud
time delay
edge
sub
Prior art date
Legal status
Pending
Application number
CN202210137345.6A
Other languages
Chinese (zh)
Inventor
周明拓
任天锋
郁春波
贺文
李剑
Current Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202210137345.6A priority Critical patent/CN114528987A/en
Publication of CN114528987A publication Critical patent/CN114528987A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a neural network edge-cloud collaborative computing segmentation deployment method, comprising the following steps: establishing time delay prediction models, each of which predicts the computation time delay of one basic neural network layer under one computing resource; determining the optimal segmentation point of the neural network to be deployed, namely the segmentation point corresponding to the minimum overall time delay; and dividing the trained neural network to be deployed into a first sub-neural network and a second sub-neural network at the optimal segmentation point, deploying them respectively on the edge device and the cloud server for collaborative computing. By splitting the neural network into two sub-neural networks deployed on the edge device and the cloud device respectively, the method reduces the computation load compared with pure edge computing, improving computation efficiency, and reduces the network transmission burden compared with pure cloud computing, thereby lowering the application delay of the neural network and improving the response speed; at the same time, the model is not compressed, so no accuracy is lost.

Description

Neural network edge-cloud collaborative computing segmentation deployment method
Technical Field
The invention belongs to the field of artificial intelligence and edge computing, and particularly relates to a neural network edge-cloud collaborative computing segmentation deployment method which can be used for neural network application deployment under the requirements of high load and low time delay.
Background
With the rise of artificial intelligence and the popularization of neural network applications, more and more devices can support them. Applications such as image classification, speech recognition and natural language processing are increasingly popular on users' terminal devices of all kinds. Facing requests from a large number of users and an increasingly heavy network transmission load, how to improve the response speed of neural network applications and increase request-processing capacity is a technical difficulty that currently needs to be solved.
Neural network technology is an effective method for realizing artificial intelligence applications. Current artificial intelligence applications often build models from a series of neural network layers, so such models are also called deep neural networks. Each layer is composed of neurons that produce nonlinear outputs from their input data. Neurons in the input layer receive data and propagate it to the intermediate layer (also called the hidden layer). Neurons of the intermediate layer compute a weighted sum of the input data, transform it with a particular activation function, and propagate the output onward until it reaches the output layer, where the final result is produced. A deep neural network has more complex and abstract layers than a general machine learning model and can learn high-level features, thereby achieving high-precision task inference. Three layer structures are common in deep neural networks: the fully-connected layer (FNN), the convolutional layer (CNN) and the recurrent layer (RNN).
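For illustration only (this sketch is not part of the patent; the shapes and the tanh activation are arbitrary assumptions), the weighted-sum-plus-activation computation of one hidden layer can be written in Python as:

import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    # One hidden layer: weighted sum of the inputs followed by a nonlinearity.
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # activations arriving from the input layer
W = rng.normal(size=(3, 4))   # weights of the hidden layer
b = np.zeros(3)               # biases
h = dense_layer(x, W, b)      # output propagated toward the output layer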
Edge computing is an emerging research area dedicated to pushing cloud services from the core of the network to network edge devices closer to the requester, such as network base stations and routers. Its core idea is to provide computing services as close to the requesting terminal as possible. Compared with cloud computing, edge computing brings lower transmission delay and bandwidth consumption, but its computing resources are limited. In order to better utilize computing resources in the network and improve the response speed of neural networks, researchers have begun introducing edge computing into the neural network computing process.
In existing neural network application scenarios, neural network applications generally have large computational requirements, so the neural network model is often deployed on a cloud server. The user sends request data through the network to the cloud for inference, and the inference result is then returned to the user. This approach is highly dependent on the network conditions between the requester and the cloud. If the requester's data volume is too large or the network bandwidth is limited, communication delay dominates the response time: the data transmission delay increases greatly, the request response time becomes too long, and the response speed of the neural network application drops. Therefore, under limited network bandwidth, the delay overhead of network transmission occupies a higher proportion than the delay generated by cloud computing itself.
To reduce the network transmission delay, current research attempts to deploy the neural network directly on edge devices such as network base stations and edge routers. However, these edge devices have limited computing power; if the computational resource requirement of the neural network model is too large or the model is too complex, the edge device incurs high computation delay. To address this, the prior art compresses the model, reducing its computational resource requirement by means such as early exit, pruning and reduced computation precision, but these techniques often reduce the prediction accuracy of the model.
Disclosure of Invention
The invention aims to provide a neural network edge-cloud collaborative computing segmentation deployment method, which is used for reducing the application delay of a neural network, improving the response speed and simultaneously ensuring no loss of precision.
In order to achieve the above object, the present invention provides a neural network edge-cloud collaborative computing segmentation deployment method, including:
S1: determining the layer types of the basic neural network layers of the neural network to be deployed, and establishing a time delay prediction model f(x, S) for each computing resource and each basic neural network layer, wherein each time delay prediction model is used only to predict the computation time delay of one basic neural network layer under one computing resource;
S2: executing a neural network segmentation algorithm to determine the optimal segmentation point of the neural network to be deployed, the optimal segmentation point being the segmentation point K corresponding to the minimum of the overall time delays T(K) over K = 1 to l−1; the segmentation point divides the neural network to be deployed into a first sub-neural network and a second sub-neural network; l is the number of neural network layers in the neural network to be deployed;
S3: dividing the trained neural network to be deployed into the first sub-neural network and the second sub-neural network at the optimal segmentation point, deploying the first sub-neural network on the edge device and the second sub-neural network on the cloud server for collaborative computing.
The input parameters of the time delay prediction model f(x, S) are x and S: x is the feature vector corresponding to a neural network layer and has one or more characteristic variables; S is the type of computing resource, which includes the edge type Sedge and the cloud type Scloud.
In step S1, a time delay prediction model is built and trained in the form of a regression tree for all kinds of feature variables corresponding to each basic neural network layer.
The step S1 includes: the following steps are executed for each computing resource and each basic neural network layer:
S10: acquiring samples of all the characteristic variables of a basic neural network layer together with the computation time delay of that layer under one computing resource, forming the sample data set D of the time delay prediction model;
S11: for one of the characteristic variables xi, setting a current segmentation point s so that the sample data set D of the time delay prediction model is divided, in the dimension of the characteristic variable xi, into two sub-sample spaces, namely the first sub-sample space and the second sub-sample space;
S12: determining the predicted output value ypredict and the prediction error e of the samples corresponding to the current segmentation point s;
S13: continuously updating the current segmentation point s and repeating step S12, so as to determine the segmentation point s that minimizes the prediction error e as the optimal segmentation point.
In step S11, the first sub-sample space is R1(s) = {x | xi ≤ s} and the second sub-sample space is R2(s) = {x | xi > s}, xi being the ith characteristic variable;
in step S12, the predicted output value ypredict of the time delay prediction model is:

ypredict = c1 · I(x ∈ R1) + c2 · I(x ∈ R2),
c1 = (1/N1) Σ{ yk : xk ∈ R1 },  c2 = (1/N2) Σ{ yk : xk ∈ R2 },

where I(·) is the indicator function, xk is the characteristic variable of the kth sample, R1 is the first sub-sample space, R2 is the second sub-sample space, c1 and c2 are the predicted output values of the time delay prediction model in the first and second sub-sample spaces, and N1 and N2 are the numbers of samples of the first and second sub-sample spaces;
the prediction error e corresponding to the current segmentation point s is:

e = Σ{ (yk − c1)² : xk ∈ R1(s) } + Σ{ (yk − c2)² : xk ∈ R2(s) },

where yk is the computation time delay of the kth sample.
The characteristic vector corresponding to the basic neural network layer has a plurality of characteristic variables;
the step S1 further includes:
S14: repeating the steps S11 to S13 for another characteristic variable until the optimal segmentation points corresponding to all the characteristic variables are determined;
S15: determining the predicted output value of the time delay prediction model in each sample region divided by the optimal segmentation points.
In step S15, the predicted output value of the time delay prediction model f(x, S) is:

f(x, S) = Σ{ cm · I(x ∈ Rm) : m = 1, …, 2^n },

cm being the mean of the computation time delays yk of the samples xk ∈ Rm,

where x is the feature vector corresponding to the neural network layer, Rm is the mth sample region, m is the index of the sample region, n is the total number of kinds of characteristic variables, and cm is the predicted output value of the time delay prediction model in the mth sample region.
In step S2, the overall time delay T(K) when the segmentation point is K is determined from the first and second sub-neural networks obtained at segmentation point K: the computation time delay Te(K) of the first sub-neural network at the edge and the computation time delay Tc(K) of the second sub-neural network at the cloud are combined with the data volume DK output by the Kth neural network layer and the network bandwidth B between the edge and the cloud.
In step S2, the neural network segmentation algorithm is a global search based on prefix-sum arrays;
the step S2 includes:
S21: for each neural network layer of the neural network to be deployed, respectively determining the computation time delay f(x(lp), Sedge) of the layer at the edge and the computation time delay f(x(lp), Scloud) at the cloud; lp is the type of the basic neural network layer corresponding to the pth neural network layer; p is the sequence number of the neural network layer, p = 1…l;
S22: for each neural network layer of the neural network to be deployed, respectively determining the edge prefix-sum array prefixEdge[p] and the cloud prefix-sum array prefixCloud[p] of the neural network layer;
S23: for each segmentation point K, determining the computation time delay Te(K) of the edge when the segmentation point is K, the computation time delay Tc(K) of the cloud when the segmentation point is K, and the transmission delay when the segmentation point is K, and summing the three to obtain the overall time delay T(K) when the segmentation point is K;
S24: traversing the segmentation points K from 1 to l−1, and taking the segmentation point K corresponding to the minimum of the overall time delays T(K) as the optimal segmentation point Kbest.
prefixEdge[p] = prefixEdge[p−1] + f(x(lp), Sedge);
prefixCloud[p] = prefixCloud[p−1] + f(x(lp), Scloud);
Te(K) = prefixEdge[K] − prefixEdge[0];
Tc(K) = prefixCloud[l] − prefixCloud[K];
The transmission delay when the segmentation point is K is DK/B,
where DK is the data volume output by the Kth neural network layer and B is the network bandwidth between the edge and the cloud.
The neural network edge-cloud collaborative computing segmentation deployment method introduces split neural network computation: the neural network is divided into two sub-neural networks deployed respectively on the edge device and the cloud device, which cooperate in an edge-cloud computing mode. The edge device runs only the sub-model obtained after segmentation, so compared with pure edge computing the computation load is reduced and computation efficiency is improved, and compared with pure cloud computing the network transmission burden is reduced, thereby lowering the application delay of the neural network and improving the response speed. Meanwhile, the neural network is not compressed, so no precision is lost. The method therefore reduces the time delay of neural network applications and improves their response speed without loss of precision, meeting the low-delay requirement of neural network applications under limited network bandwidth.
In addition, the method obtains the running time of the neural network by establishing a time delay prediction model, and the neural network segmentation algorithm is designed on top of the time delay prediction model and prefix-sum arrays, which improves the running speed of the segmentation deployment method itself.
Drawings
Fig. 1 is a schematic overall flow chart of a neural network edge-cloud collaborative computing segmentation deployment method according to the present invention.
Fig. 2 is a flowchart illustrating step S3 of the neural network edge-cloud collaborative computing segmentation deployment method according to the present invention.
Fig. 3 is a schematic diagram of selecting the segmentation points of the LeNet5, AlexNet and VGG networks by the neural network edge-cloud collaborative computing segmentation deployment method under the condition that the network bandwidth is 500 KB.
Fig. 4 is a performance comparison between the plain global search and the prefix-sum-array-based global search in step S2 of the neural network edge-cloud collaborative computing segmentation deployment method of the present invention.
Fig. 5 is a comparison graph of results of task response time in three computing modes of edge-cloud cooperative computing, cloud computing and edge computing.
Detailed Description
The present invention will be further described with reference to the following specific examples. It should be understood that the following examples are illustrative only and are not intended to limit the scope of the present invention.
The invention provides a neural network edge-cloud collaborative computing segmentation deployment method which is suitable for a neural network application scene with low time delay requirement and high response speed requirement.
As shown in fig. 1, the neural network edge-cloud collaborative computing segmentation deployment method of the present invention includes:
Step S1: determining the layer types of the basic neural network layers of the neural network to be deployed, and establishing a time delay prediction model f(x, S) for each computing resource and each basic neural network layer, wherein each time delay prediction model is used only to predict the computation time delay of one basic neural network layer under one computing resource;
the adopted neural network to be deployed is a convolutional neural network, such as LeNet5, AlexNet and VGG. However, the delay prediction model does not concern specific neural networks, but only concerns which types of basic neural network layers the neural network to be predicted consists of, so that theoretically, the neural network to be deployed can be any, and any neural network is formed by overlapping and combining a plurality of basic neural network layers. According to the model established by each basic neural network layer of the neural network layers to be deployed in the step S1, the total computation time delay of the neural network to be deployed can be obtained by adding the computation time delays predicted by the time delay prediction models of all the basic neural network layers of the neural network to be deployed in the subsequent steps.
The processing task differs from network to network. In this embodiment, the neural networks to be deployed are the three convolutional neural networks LeNet5, AlexNet and VGG. The input of the neural network to be deployed is picture-type data, and the output is a classification result. The commonly used basic neural network layers are FNN, CNN and RNN, but in actual modeling the basic layers may also include ReLU, dropout, and the like.
The input parameters of the time delay prediction model f(x, S) are x and S, where x is the feature vector corresponding to the neural network layer and S is the type of computing resource. In the present embodiment, the type S of computing resource includes the edge type Sedge and the cloud type Scloud.
The purpose of step S1 is to obtain a predicted value of the computation time delay y from the vector x composed of all kinds of characteristic variables corresponding to the type of the basic neural network layer, i.e., to establish a time delay prediction model f(x, S) whose output is as close as possible to the real computation time delay of the basic neural network layer.
Different types of basic neural network layers have different factors that affect the computation time, and these factors are called feature variables. Some common basic neural network layer characteristic variables are shown in table 1 below.
TABLE 1 types of basic neural network layers and their corresponding characteristic variables
[Table 1 appears as an image in the original publication; it lists each basic layer type with its characteristic variables, e.g., one variable (input data size) for the activation layer and six for the convolutional layer.]
As shown in table 1, the characteristic variables are determined according to the types of the corresponding basic neural network layers, and different types of neural network layers correspond to different kinds of characteristic variables.
All the characteristic variables corresponding to a basic neural network layer type are expressed as a vector, denoted the feature vector x = (x1, x2, …, xi, …, xn) corresponding to the neural network layer, where xi is the ith characteristic variable and n is the total number of kinds of characteristic variables corresponding to that layer type. For example, the only factor affecting the computation delay of the activation layer is the size of the data input to the layer, so the input data size is the activation layer's sole characteristic variable and n = 1; similarly, the convolutional layer is affected by 6 factors, so it has 6 characteristic variables and n = 6.
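As a sketch of how such feature vectors might be represented in practice (the variable names and values below are illustrative assumptions, not specified by the patent):

# Hypothetical feature vectors for two basic layer types.
activation_features = {"input_size": 64 * 112 * 112}   # n = 1

conv_features = {                                       # n = 6
    "input_height": 112, "input_width": 112,
    "in_channels": 3, "out_channels": 64,
    "kernel_size": 3, "stride": 1,
}

def to_vector(features):
    # Flatten a feature dict into the vector x = (x1, ..., xn).
    return [float(v) for v in features.values()]

x_conv = to_vector(conv_features)   # the input x of f(x, S)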
In order to establish the time delay prediction model f(x, S), samples pairing a basic neural network layer with its computation time delay must be collected as the sample data set of the model. Assuming that N samples (i.e., instances of the same basic neural network layer with different values of its n characteristic variables) are collected under one computing resource, the sample data set D of the corresponding time delay prediction model is:

D = { (xk, yk) : k = 1, …, N },  xk = (xk(1), xk(2), …, xk(n)),

where xk denotes the kth sample and xk(i) is the ith characteristic variable of the kth sample (the subscript indexes the sample, the parenthesized superscript the variable); yk is the computation time delay of the kth sample, i.e., the output of the model; N is the total number of samples in the sample data set; and n is the total number of kinds of characteristic variables.
For simplicity of explanation, assume the feature vector of the basic neural network layer has only one characteristic variable, so the kth sample is written (xk, yk), where xk is the characteristic variable of the kth sample and yk is its computation time delay.
In step S1, a time delay prediction model is built and trained in the form of a regression tree for all kinds of feature variables corresponding to each basic neural network layer. Wherein the regression tree is a machine learning model.
Specifically, the step S1 includes: the following steps are executed for each computing resource and each basic neural network layer:
Step S10: acquiring samples of all the characteristic variables of a basic neural network layer together with the computation time delay of that layer under one computing resource, forming the sample data set D of the time delay prediction model;
that is, the samples collected by the edge end are used as the delay prediction model f (x, S) under the edge endedge) The sample data set D is used for training, and samples collected through the cloud serve as a time delay prediction model f (x, S) under the cloudcloud) For training.
As described above, the sample data set D of the time delay prediction model is:

D = { (xk, yk) : k = 1, …, N },  xk = (xk(1), xk(2), …, xk(n)),

where xk denotes the kth sample and xk(i) is the ith characteristic variable of the kth sample (subscript: sample ordinal; parenthesized superscript: variable ordinal); yk is the computation time delay of the kth sample, i.e., the output of the time delay prediction model; N is the total number of samples in the data set; and n is the total number of kinds of characteristic variables.
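A minimal sketch of collecting such samples for one layer type under one computing resource, assuming PyTorch is available on the device (the swept parameter and the input shape are illustrative):

import time
import torch
import torch.nn as nn

samples = []   # entries (x_k, y_k): feature vector and measured latency
for out_channels in (16, 32, 64):           # sweep one characteristic variable
    layer = nn.Conv2d(3, out_channels, kernel_size=3)
    inp = torch.randn(1, 3, 112, 112)
    with torch.no_grad():
        layer(inp)                           # warm-up run
        t0 = time.perf_counter()
        for _ in range(50):
            layer(inp)
    y_k = (time.perf_counter() - t0) / 50    # mean latency in seconds
    x_k = [112, 112, 3, out_channels, 3, 1]  # feature vector of the layer
    samples.append((x_k, y_k))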
Step S11: for one of the characteristic variables xi, set a current segmentation point s so that the sample data set D of the time delay prediction model is divided, in the dimension of xi, into two sub-sample spaces: the first sub-sample space R1(s) (also written R1) and the second sub-sample space R2(s) (also written R2);

R1(s) = {x | xi ≤ s}, R2(s) = {x | xi > s}, where xi is the ith characteristic variable.
The category ordinal i of the characteristic variable ranges from 1 to n, where n is the total number of kinds of characteristic variables. The candidate segmentation point s ranges over the observed values { xk(i) : k = 1, …, N } of the ith characteristic variable, where xk(i) is the ith characteristic variable of the kth sample.

The segmentation point s thus divides the 1st to Nth samples of the sample data set D into two sub-sample spaces, denoted the first sub-sample space R1(s) and the second sub-sample space R2(s). Writing their sample counts as N1 and N2, clearly N1 + N2 = N.
Step S12: determining the predicted output value ypredict and the prediction error e of the samples corresponding to the current segmentation point s;

Through training on the samples, a mapping ypredict ← f(x, S) is obtained. In this embodiment, the average of the true output values y of all samples in each sub-sample space is used as the predicted output value ypredict of the time delay prediction model in that space.
That is, the predicted output value ypredict of the time delay prediction model is:

ypredict = c1 · I(x ∈ R1) + c2 · I(x ∈ R2),
c1 = (1/N1) Σ{ yk : xk ∈ R1 },  c2 = (1/N2) Σ{ yk : xk ∈ R2 },

where I(·) is the indicator function, xk is the characteristic variable of the kth sample, R1 is the first sub-sample space, R2 is the second sub-sample space, c1 and c2 are the predicted output values of the time delay prediction model in the first and second sub-sample spaces, and N1 and N2 are the numbers of samples of the first and second sub-sample spaces.
Since the true output value y of a sample with characteristic variable x should be as close as possible to the predicted output value ypredict of the time delay prediction model, the prediction error e must be evaluated so as to minimize it. Different machine learning models estimate prediction error differently; here the error estimate corresponding to the regression tree model is used.

The prediction error e corresponding to the current segmentation point s is therefore:

e = Σ{ (yk − c1)² : xk ∈ R1(s) } + Σ{ (yk − c2)² : xk ∈ R2(s) },

where R1 is the first sub-sample space, R2 is the second sub-sample space, xk is the characteristic variable of the kth sample, yk is the computation time delay of the kth sample, and c1, c2 are the predicted output values of the time delay prediction model in the first and second sub-sample spaces.
Step S13: the current segmentation point S is continuously updated and the above-described step S12 is repeated to determine the segmentation point S that minimizes the value of the prediction error e as the optimal segmentation point.
At this point, if the feature vector corresponding to the basic neural network layer has only one characteristic variable, the above steps S11 to S13 need to be performed only once, selecting a single segmentation point for that variable, and the sample space is finally divided into 2 sample regions (i.e., the 2 sub-sample spaces).
In other embodiments, if the feature vector corresponding to the basic neural network layer has multiple characteristic variables, the above operations are performed n times in total. That is, step S1 also needs to include the following steps:
step S14: and repeating the steps S11 to S13 for another characteristic variable until the optimal segmentation points corresponding to all the characteristic variables are determined.
Since each sample has n characteristic variables and each characteristic variable contributes one binary split at its segmentation point, the sample space is finally divided by the optimal segmentation points into 2^n sample regions, where n is the total number of kinds of characteristic variables.
Step S15: and determining the prediction output value of the time delay prediction model in each sample region according to each sample region divided by the optimal dividing point, thereby completing the establishment of the time delay prediction model (namely a decision tree).
Therefore, as long as the feature vector of a certain basic neural network layer is given, its predicted computation time delay is the output of the time delay prediction model.
The predicted output value of the time delay prediction model f(x, S) is:

f(x, S) = Σ{ cm · I(x ∈ Rm) : m = 1, …, 2^n },

where x is the feature vector corresponding to the neural network layer, Rm is the mth sample region, m is the index of the sample region (there are 2^n sample regions in total, so m = 1 to 2^n), n is the total number of kinds of characteristic variables, and cm is the predicted output value of the time delay prediction model in the mth sample region. In this embodiment, cm is taken as the mean of the true output values y of all samples in the mth sample region.
This completes the establishment of the binary-tree-type time delay prediction model (i.e., the decision tree). After step S1, the time delay prediction model is used for time delay prediction: the output of f(x, S) is the predicted computation time delay of the neural network layer.
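In practice a library regression tree yields the same piecewise-constant predictor; a minimal sketch assuming scikit-learn is available (the patent does not prescribe a particular implementation):

from sklearn.tree import DecisionTreeRegressor

# (x_k, y_k) pairs as collected in the sampling sketch above
X = [x for x, _ in samples]
y = [latency for _, latency in samples]

f_edge = DecisionTreeRegressor().fit(X, y)   # plays the role of f(x, Sedge)
x_new = [[112, 112, 3, 32, 3, 1]]            # feature vector of a new layer
predicted_delay = f_edge.predict(x_new)[0]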
As stated in step S1, each time delay prediction model predicts the computation time delay of one basic neural network layer under one computing resource. Therefore, for the same basic neural network layer there are two models: the edge time delay prediction model f(x, Sedge) (the layer's computation time delay in edge computing) and the cloud time delay prediction model f(x, Scloud) (the layer's computation time delay in cloud computing).
If the neural network to be deployed has 5 neural network layers, the basic layer types corresponding to those 5 layers must first be determined, and then steps S10 to S15 must be performed twice for each basic neural network layer (once for the edge and once for the cloud).
Step S2: executing a neural network segmentation algorithm to determine the optimal segmentation point of the neural network to be deployed, i.e., the segmentation point K corresponding to the minimum of the overall time delays T(K) over K = 1 to l−1. The segmentation point divides the neural network to be deployed into two sub-neural networks: the first sub-neural network and the second sub-neural network. l is the number of neural network layers in the neural network to be deployed.
The segmentation point here refers to a segmentation point of the neural network itself, and is unrelated to the regression tree segmentation point above. For example, for a 100-layer neural network to be deployed with the segmentation point at the 50th layer, the first sub-neural network (the first 50 layers) is deployed at the edge and the second sub-neural network (the last 50 layers) at the cloud: the sub-network before the segmentation point is deployed at the edge, and the sub-network after it at the cloud.
Denote the neural network to be deployed as L; the input information of the model segmentation algorithm is then L = { l1, l2, …, lp, …, ll }, where lp is the type of the basic neural network layer corresponding to the pth neural network layer, l is the total number of layers of L, and p = 1 to l is the layer sequence number. Each basic neural network layer of L has exactly one corresponding feature vector (containing several characteristic variables): the layer type lp of the pth layer corresponds one-to-one with the feature vector x(lp), which serves as the input to the time delay prediction model.
A specific example: suppose the basic layer type l1 of the 1st neural network layer is an activation layer with 1 characteristic variable; then its feature vector is x(1) = (x1). Suppose the basic layer type l2 of the 2nd layer is a convolutional layer with 6 characteristic variables; then its feature vector is x(2) = (x1, x2, …, x6).
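The inputs of the segmentation algorithm can therefore be sketched as follows (layer types, feature values and the dummy latency constants are illustrative assumptions):

# Neural network to be deployed: list of (basic layer type, feature vector).
network = [
    ("conv",       [112, 112, 3, 64, 3, 1]),
    ("activation", [64 * 112 * 112]),
    ("conv",       [112, 112, 64, 64, 3, 1]),
    ("fc",         [4096, 1000]),
]

# Stand-ins for the trained models f(x, S), one per layer type and resource.
latency_models = {
    ("conv", "edge"):        lambda x: 0.0080,   # seconds, dummy constants
    ("conv", "cloud"):       lambda x: 0.0015,
    ("activation", "edge"):  lambda x: 0.0004,
    ("activation", "cloud"): lambda x: 0.0001,
    ("fc", "edge"):          lambda x: 0.0050,
    ("fc", "cloud"):         lambda x: 0.0008,
}

def f(layer_type, x, S):
    return latency_models[(layer_type, S)](x)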
In step S2, the overall time delay T(K) when the segmentation point is K is determined from the first and second sub-neural networks obtained at segmentation point K: the computation time delay Te(K) of the first sub-neural network at the edge and the computation time delay Tc(K) of the second sub-neural network at the cloud are combined with the data volume DK output by the Kth neural network layer and the network bandwidth B between the edge and the cloud.
For a neural network model with l layers, there are l−1 possible segmentation points in total. When the segmentation point is K, the two resulting sub-neural networks are the edge sub-neural network Le and the cloud sub-neural network Lc, where:

Le = { l1, l2, …, lK },  Lc = { lK+1, lK+2, …, ll }.
the delay prediction model under the edge end is recorded as f (x, S)edge) The time delay prediction model under the cloud is marked as f (x, S)cloud) Then, the calculated delay T of the edge end when the division point is Ke(K) And cloud endCalculating the time delay Tc(K) Can be respectively recorded as:
Figure BDA0003505452750000126
wherein lpIs the type of the basic neural network layer corresponding to the p-th neural network layer, x (l)p) Type l representing the basic neural network layer corresponding to the p-th neural network layerpAnd the corresponding characteristic vector, p is the sequence number of the basic neural network layer, and K is the segmentation point.
Simultaneously, the data volume output by the Kth neural network layer is recorded as DKAnd the network bandwidth between the edge end and the cloud end is B, so that the integral time delay T (K) when the division point is K can be obtained. The overall delay T (K) for a division point K is:
Figure BDA0003505452750000127
wherein D isKThe data volume output for the Kth neural network layer, B is the network bandwidth between the edge end and the cloud end, and Te(K) The calculated time delay of the edge end when the division point is K, Tc(K) The calculation time delay of the cloud when the division point is K is shown.
The segmentation point K ranges from 1 to l−1 because, for an l-layer neural network model, there are l−1 possible segmentation points in total.
That is, determining the optimal segmentation point is the optimization problem:

Kbest = argmin{ T(K) : K = 1, 2, …, l−1 }.
the optimization problem can be solved by a global search mode, that is, the minimum value of the overall time delays T (K) when the division point K takes 1 to l-1 can be determined by traversing all the overall time delays T (K) when the division point K takes 1 to l-1, and the division point K corresponding to the minimum value of the overall time delays T (K) is selected from the minimum values.
However, because the sub-network time delays must be summed afresh for every candidate K, this search carries extra time overhead, and its time complexity is O(n²) in the number of neural network layers. To further improve the processing speed of the algorithm, step S22 of this embodiment introduces prefix-sum arrays on top of the global search, finally reducing the time complexity to O(n).

Therefore, in step S2, the neural network segmentation algorithm finally designed is a global search based on prefix-sum arrays.
The step S2 specifically includes:
step S21: for each neural network layer of the neural network to be deployed, respectively determining the calculation time delay f (x (l) of the neural network layer at the edge endp),Sedge) And the calculation time delay f (x (l)) of the neural network layer in the cloudp),Scloud);lpIs the type of the basic neural network layer corresponding to the p-th neural network layer; p is the sequence number of the neural network layer, and p is 1 … l;
step S22: for each neural network layer of a neural network to be deployed, respectively determining a prefix and an array prefix [ p ] of the neural network layer at an edge end and a prefix and an array prefix [ p ] of the neural network layer at a cloud end;
wherein, prefix edge [ p]=prefixEdge[p-1]+f(x(lp),Sedge),
prefixCloud[p]=prefixCloud[p-1]+f(x(lp),Scloud)。
prefixEdge and prefixCloud represent the prefix and array of the edge and cloud, respectively.
The prefix-sum array is introduced to optimize the summation time in the algorithm. A specific example follows:
assuming that 4 layers of neural networks are provided, the calculation time delays obtained through f (x) calculation are respectively as follows:
T=[2,3,1,4]
its prefix and array are defined by the following recursion formula
prefixT[i]=T[i]+prefixT[i-1]
According to the recurrence formula, the following can be obtained:
prefixT=[0,2,5,6,9]
the advantage of this is that converting the sum into the difference is an optimization means and has no special meaning.
Step S23: for each segmentation point K, determine the computation time delay Te(K) of the edge when the segmentation point is K, the computation time delay Tc(K) of the cloud when the segmentation point is K, and the transmission delay when the segmentation point is K, and sum the three to obtain the overall time delay T(K) when the segmentation point is K;
where the computation time delay Te(K) of the edge (i.e., Tedge) when the segmentation point is K is:

Te(K) = prefixEdge[K] − prefixEdge[0];

the computation time delay Tc(K) of the cloud (i.e., Tcloud) when the segmentation point is K is:

Tc(K) = prefixCloud[l] − prefixCloud[K];

and the transmission delay (i.e., Tcomm) when the segmentation point is K is DK/B,

where DK is the data volume output by the Kth neural network layer and B is the network bandwidth between the edge and the cloud.
Step S24: traverse the segmentation points K from 1 to l−1 and take the segmentation point K corresponding to the minimum of the overall time delays T(K) as the optimal segmentation point Kbest.
That is, the input parameters of the neural network segmentation algorithm include:

{ lp : p = 1…l }: lp is the type of the basic neural network layer corresponding to the pth neural network layer;
{ DK : K = 1…l }: DK is the data volume output by the Kth neural network layer;
f(x, S): the time delay prediction model;
B: the network bandwidth between the edge and the cloud; and
Tbest: the current optimal overall time delay.

The output of the neural network segmentation algorithm is the optimal segmentation point Kbest.
Part of the code is as follows (with the prefix-sum arrays initialized as prefixEdge[0] = prefixCloud[0] = 0):

for p in range 1…l:
    prefixEdge[p] = prefixEdge[p-1] + f(x(lp), Sedge)
    prefixCloud[p] = prefixCloud[p-1] + f(x(lp), Scloud)
for K in range 1…l-1:
    Tedge = prefixEdge[K] - prefixEdge[0]
    Tcloud = prefixCloud[l] - prefixCloud[K]
    Tcomm = DK / B
    if Tedge + Tcloud + Tcomm < Tbest:
        Tbest = Tedge + Tcloud + Tcomm
        Kbest = K
return Kbest
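A self-contained Python rendering of the algorithm, reusing the illustrative network, f, and D conventions from the sketches above (none of these names come from the patent):

def best_split_prefix(network, D, B, f):
    # Prefix-sum global search for the optimal segmentation point K_best.
    l = len(network)
    prefix_edge = [0.0] * (l + 1)
    prefix_cloud = [0.0] * (l + 1)
    for p in range(1, l + 1):
        layer_type, x = network[p - 1]
        prefix_edge[p] = prefix_edge[p - 1] + f(layer_type, x, "edge")
        prefix_cloud[p] = prefix_cloud[p - 1] + f(layer_type, x, "cloud")

    K_best, T_best = None, float("inf")
    for K in range(1, l):
        T = (prefix_edge[K]                        # edge delay Te(K)
             + prefix_cloud[l] - prefix_cloud[K]   # cloud delay Tc(K)
             + D[K] / B)                           # transmission delay DK/B
        if T < T_best:
            K_best, T_best = K, T
    return K_best

# D[K]: data volume (KB) output by layer K; B: bandwidth in KB/s.
D = {1: 800.0, 2: 800.0, 3: 200.0}
best_split_prefix(network, D, B=500.0, f=f)   # returns 3: split after layer 3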
step S3: the trained neural network to be deployed is divided into a first sub-neural network and a second sub-neural network by using the optimal dividing point, the first sub-neural network is deployed on the equipment at the edge end, and the second sub-neural network is deployed on the cloud server for performing collaborative computing.
The sub-neural network before the division point is a first sub-neural network and is used for being deployed at the edge end; the sub-neural network behind the division point is a second sub-neural network and is used for being deployed at the cloud.
Referring to fig. 2, which shows the overall collaborative flow of step S3: for the segmented and deployed neural network model, the user initiates a request to the edge device and transmits the request data; the request data first passes through the first sub-neural network on the edge device, and the intermediate computation result is then sent to the cloud server for the computation of the second sub-neural network, thereby realizing edge-cloud collaborative computing.
In practice, the edge device may be a base station, a router, an edge gateway, or other edge device near the user. When the terminal equipment of the user sends a request, the request data firstly passes through the equipment of the edge end, the equipment of the edge end carries out primary processing on the request data through the sub-network, and the primary processing result is transmitted to the cloud end for subsequent operation.
In summary, the edge-cloud collaborative computing process is as follows:
1. For the neural network to be deployed, acquire the layer information of the model and establish the time delay prediction models accordingly. If the prediction models have been established previously, go directly to step 2.
2. And obtaining an optimal segmentation point through a neural network segmentation algorithm according to the time delay prediction model, wherein the optimal segmentation point is used for segmenting the neural network into a first sub-neural network and a second sub-neural network at the segmentation point.
3. And deploying the first sub-neural network to the edge equipment requested by the user service, deploying the second sub-neural network to the cloud server, and finally performing cooperative computing.
Results of the experiment
We used the LeNet5, AlexNet and VGG networks to validate the neural network edge-cloud collaborative computing segmentation deployment method of the present invention.
First, for the layer type of each neural network layer of the adopted networks, characteristic variables are selected according to Table 1 above and used to establish the time delay prediction models. [The table image at this point in the original publication repeats Table 1.]
and then, determining the neural network segmentation point by using a time delay prediction model. The determination of the segmentation points depends on the network bandwidth of the edge end and the cloud end, and when the network bandwidth is poor, the segmentation points tend to be segmented in the layer behind the neural network, so that most of the neural network layer is placed at the edge end for operation as far as possible, and the network overhead is reduced. When the network bandwidth is good, the partitioning point will tend to partition at the previous layer to make the best possible use of the computational resources. Fig. 3 shows a schematic diagram of selecting the segmentation points of the LeNet5, AlexNet and VGG networks by the neural network edge-cloud collaborative computing segmentation deployment method of the present invention under the condition that the network bandwidth is 500KB, and it can be seen that the segmentation points are located on the neural network layer with small output data amount and are as far forward as possible, which is in line with expectations.
Fig. 4 compares the plain global search with the prefix-sum-optimized global search in step S2 of the neural network edge-cloud collaborative computing segmentation deployment method of the present invention: the left sub-graph shows the theoretical growth of algorithm complexity with problem scale, and the right sub-graph shows the actual time consumed by the algorithm as the number of processed neural network layers increases.
It can be seen that the prefix-sum-optimized global search algorithm brings better processing efficiency, and the measured improvement is consistent with the theoretical one.
After the segmentation point is determined, the neural network before the segmentation point is finally deployed on the edge device, and the neural network after the segmentation point is deployed on the cloud. In order to visually observe the improvement of edge-cloud cooperative computing in comparison with cloud computing and edge computing along with the change of network bandwidth, the range of the bandwidth between the edge and the cloud is set to be 0.1-10MB, the task response time in three computing modes is counted, and the result can be seen in fig. 5.
It can be seen that when the network bandwidth is poor, cloud computing is limited by communication delay and its response speed becomes far slower than edge computing and edge-cloud collaborative computing; conversely, edge computing incurs large computation delay when the neural network is complex, because of the limited computing power of edge devices, which is particularly obvious for the VGG network. Summarizing the results in fig. 5, edge-cloud collaborative computing is a computing mode that combines the advantages of both edge computing and cloud computing and brings faster response speed to neural network applications under limited network bandwidth.
The neural network edge-cloud collaborative computing segmentation deployment method is thus a compromise that balances computation delay and communication delay. On the computation side, part of the neural network layers is offloaded to the cloud server, reducing the computation load of the edge and increasing computation speed. On the communication side, choosing a suitable segmentation point reduces the amount of data that must be transmitted over the network, relieving its burden. In this way the neural network can be segmented flexibly according to network conditions and the computing power of the edge device, divided into two parts deployed at the edge and the cloud respectively; compared with traditional edge computing and cloud computing, this computing mode brings better response speed, and the proposed segmentation deployment algorithm is more efficient.
The above embodiments are merely preferred embodiments of the present invention, which are not intended to limit the scope of the present invention, and various changes may be made in the above embodiments of the present invention. All simple and equivalent changes and modifications made according to the claims and the content of the specification of the present application fall within the scope of the claims of the present patent application. The invention has not been described in detail in order to avoid obscuring the invention.

Claims (10)

1. A neural network edge-cloud collaborative computing segmentation deployment method is characterized by comprising the following steps:
step S1: determining the layer type of a basic neural network layer of a neural network to be deployed, and respectively establishing a time delay prediction model f (x, S) for each computing resource and each basic neural network layer, wherein each time delay prediction model is only used for predicting the computing time delay of one basic neural network layer under one computing resource;
step S2: executing a neural network segmentation algorithm to determine an optimal segmentation point of the neural network to be deployed, wherein the optimal segmentation point is a segmentation point K corresponding to the minimum value in all overall time delays T (K) when the segmentation point K is from 1 to l-1, the segmentation point is used for segmenting the neural network to be deployed into a first sub-neural network and a second sub-neural network, and l is the number of layers of a neural network layer in the neural network to be deployed;
step S3: the trained neural network to be deployed is divided into a first sub-neural network and a second sub-neural network by using the optimal dividing point, the first sub-neural network is deployed on the equipment at the edge end, and the second sub-neural network is deployed on the cloud server for performing collaborative computing.
2. The neural network edge-cloud collaborative computing segmentation deployment method of claim 1, wherein the input parameters of the time delay prediction model f(x, S) are x and S: x is the feature vector corresponding to a neural network layer and has one or more characteristic variables; S is the type of computing resource, which includes the edge type Sedge and the cloud type Scloud.
3. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 2, wherein in the step S1, a time delay prediction model is established and trained in a regression tree form for all kinds of feature variables corresponding to each basic neural network layer.
4. The neural network edge-cloud collaborative computing partition deployment method according to claim 3, wherein the step S1 includes: the following steps are executed for each computing resource and each basic neural network layer:
step S10: acquiring all characteristic variables of a basic neural network layer and a sample of the calculation time delay of the neural network layer under a calculation resource to form a sample data set D of a time delay prediction model;
step S11: for one of the characteristic variables xi, setting a current segmentation point s so that the sample data set D of the time delay prediction model is divided, in the dimension of the characteristic variable xi, into two sub-sample spaces, namely a first sub-sample space and a second sub-sample space;
step S12: determining the predicted output value ypredict and the prediction error e of the samples corresponding to the current segmentation point s;

step S13: continuously updating the current segmentation point s and repeating the step S12, so as to determine the segmentation point s that minimizes the prediction error e as the optimal segmentation point.
5. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 4, wherein in the step S11, the first sub-sample space is R1(s) = {x | xi ≤ s} and the second sub-sample space is R2(s) = {x | xi > s}, xi being the ith characteristic variable;

in the step S12, the predicted output value ypredict of the time delay prediction model is:

ypredict = c1 · I(x ∈ R1) + c2 · I(x ∈ R2),
c1 = (1/N1) Σ{ yk : xk ∈ R1 },  c2 = (1/N2) Σ{ yk : xk ∈ R2 },

wherein I(·) is the indicator function, xk is the characteristic variable of the kth sample, R1 is the first sub-sample space, R2 is the second sub-sample space, c1 and c2 are the predicted output values of the time delay prediction model in the first and second sub-sample spaces, and N1 and N2 are the numbers of samples of the first and second sub-sample spaces;

the prediction error e corresponding to the current segmentation point s is:

e = Σ{ (yk − c1)² : xk ∈ R1(s) } + Σ{ (yk − c2)² : xk ∈ R2(s) },

wherein yk is the computation time delay of the kth sample.
6. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 4, wherein the feature vector corresponding to the basic neural network layer has a plurality of feature variables;
the step S1 further includes:
step S14: repeating the steps S11 to S13 on another characteristic variable until the optimal segmentation points corresponding to all the characteristic variables are determined;
step S15: and determining the prediction output value of the time delay prediction model in each sample region according to each sample region divided by the optimal dividing point.
7. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 4, wherein in the step S15, the predicted output value of the time delay prediction model f(x, S) is:

f(x, S) = Σ{ cm · I(x ∈ Rm) : m = 1, …, 2^n },

cm being the mean of the computation time delays yk of the samples xk ∈ Rm,

wherein x is the feature vector corresponding to the neural network layer, Rm is the mth sample region, m is the index of the sample region, n is the total number of kinds of characteristic variables, and cm is the predicted output value of the time delay prediction model in the mth sample region.
8. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 1, wherein in step S2, the overall time delay T(K) when the segmentation point is K is determined from the first and second sub-neural networks obtained at segmentation point K: the computation time delay Te(K) of the first sub-neural network at the edge and the computation time delay Tc(K) of the second sub-neural network at the cloud are combined with the data volume DK output by the Kth neural network layer and the network bandwidth B between the edge and the cloud.
9. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 1, wherein in the step S2, the neural network partition algorithm is based on a global search using prefix-sum arrays;
the step S2 includes:
step S21: for each neural network layer of the neural network to be deployed, respectively determining the calculation time delay f(x(l_p), S_edge) of the neural network layer at the edge end and the calculation time delay f(x(l_p), S_cloud) of the neural network layer at the cloud end; l_p is the type of the basic neural network layer corresponding to the p-th neural network layer; p is the sequence number of the neural network layer, p = 1, ..., l;
step S22: for each neural network layer of the neural network to be deployed, respectively determining the prefix-sum array prefixEdge[p] of the neural network layer at the edge end and the prefix-sum array prefixCloud[p] of the neural network layer at the cloud end;
step S23: for each division point K, determining the calculation time delay T_e(K) of the edge end when the division point is K, the calculation time delay T_c(K) of the cloud end when the division point is K, and the transmission delay when the division point is K, and summing the three to obtain the overall time delay T(K) when the division point is K;
step S24: traversing the division points K in the range from 1 to l-1, and taking the division point K corresponding to the minimum value of the overall time delay T(K) as the optimal division point K_best.
10. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 9, wherein prefixEdge[p] = prefixEdge[p-1] + f(x(l_p), S_edge);
prefixCloud[p] = prefixCloud[p-1] + f(x(l_p), S_cloud);
T_e(K) = prefixEdge[K] - prefixEdge[0];
T_c(K) = prefixCloud[l] - prefixCloud[K];
the transmission delay when the division point is K is D_K / B;
wherein D_K is the data volume output by the K-th neural network layer, and B is the network bandwidth between the edge end and the cloud end.
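For illustration only, a minimal Python sketch of the prefix-sum global search of claims 9 and 10; apart from the prefixEdge/prefixCloud arrays, all names and the calling convention of the latency model f are assumptions of this sketch:

```python
def optimal_division_point(layer_types, f, D, B):
    """Steps S21-S24: find the division point K minimizing T(K).

    layer_types: [l_1, ..., l_l], basic-layer type of each of the l layers
    f(layer_type, resource): predicted per-layer calculation time delay
    D: indexable by K (1-based); D[K] = data volume output by layer K
    B: network bandwidth between the edge end and the cloud end
    """
    l = len(layer_types)
    prefix_edge = [0.0] * (l + 1)
    prefix_cloud = [0.0] * (l + 1)
    for p in range(1, l + 1):  # step S22: build the prefix-sum arrays
        prefix_edge[p] = prefix_edge[p - 1] + f(layer_types[p - 1], "edge")
        prefix_cloud[p] = prefix_cloud[p - 1] + f(layer_types[p - 1], "cloud")

    best_K, best_T = None, float("inf")
    for K in range(1, l):  # step S24: traverse K in 1 .. l-1
        T_e = prefix_edge[K] - prefix_edge[0]    # edge delay of layers 1..K
        T_c = prefix_cloud[l] - prefix_cloud[K]  # cloud delay of layers K+1..l
        T = T_e + T_c + D[K] / B                 # step S23: overall delay T(K)
        if T < best_T:
            best_K, best_T = K, T
    return best_K, best_T
```

Once the two prefix-sum arrays are built in O(l), each candidate T(K) is evaluated in constant time, so the exhaustive traversal of all division points stays linear in the network depth.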
CN202210137345.6A 2022-02-15 2022-02-15 Neural network edge-cloud collaborative computing segmentation deployment method Pending CN114528987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210137345.6A CN114528987A (en) 2022-02-15 2022-02-15 Neural network edge-cloud collaborative computing segmentation deployment method

Publications (1)

Publication Number Publication Date
CN114528987A true CN114528987A (en) 2022-05-24


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114663A (en) * 2022-07-01 2022-09-27 中铁第四勘察设计院集团有限公司 Face recognition method based on cloud edge-end cooperation
WO2024036579A1 (en) * 2022-08-18 2024-02-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Wireless communication method and device
WO2024099313A1 (en) * 2022-11-08 2024-05-16 South China University of Technology Cloud-edge-end collaborative intelligent infant care system and method
CN115569338A (en) * 2022-11-09 2023-01-06 国网安徽省电力有限公司蚌埠供电公司 Multi-class fire early warning data distributed training and local fire extinguishing method and system
CN115569338B (en) * 2022-11-09 2023-10-13 国网安徽省电力有限公司蚌埠供电公司 Multi-category fire early warning data distributed training and local extinguishing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination