CN114528987A - Neural network edge-cloud collaborative computing segmentation deployment method - Google Patents

Neural network edge-cloud collaborative computing segmentation deployment method

Info

Publication number
CN114528987A
Authority
CN
China
Prior art keywords
neural network
cloud
time delay
edge
sub
Prior art date
Legal status
Pending
Application number
CN202210137345.6A
Other languages
Chinese (zh)
Inventor
周明拓
任天锋
郁春波
贺文
李剑
Current Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202210137345.6A priority Critical patent/CN114528987A/en
Publication of CN114528987A publication Critical patent/CN114528987A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a neural network edge-cloud collaborative computing segmentation deployment method, comprising the following steps: establishing time delay prediction models, each of which predicts the computation time delay of one basic neural network layer under one computing resource; determining the optimal segmentation point of the neural network to be deployed, namely the segmentation point corresponding to the minimum overall time delay; and dividing the trained neural network to be deployed into a first sub-neural network and a second sub-neural network at the optimal segmentation point, deploying them respectively on the edge device and the cloud server for collaborative computing. By splitting the neural network into two sub-neural networks deployed on the edge device and the cloud device respectively, the method reduces the computation load compared with pure edge computing, improving computation efficiency, and reduces the network transmission burden compared with pure cloud computing, thereby lowering the application delay of the neural network and improving the response speed; at the same time, the model is not compressed, so no accuracy is lost.

Description

Neural network edge-cloud collaborative computing segmentation deployment method
Technical Field
The invention belongs to the field of artificial intelligence and edge computing, and particularly relates to a neural network edge-cloud collaborative computing segmentation deployment method which can be used for neural network application deployment under the requirements of high load and low time delay.
Background
With the rise of artificial intelligence and the popularization of neural network applications, more and more devices can support them. Applications such as image classification, speech recognition and natural language processing are increasingly popular on users' terminal devices of all kinds. Facing requests from a large number of users and an increasingly heavy network transmission load, how to improve the response speed of neural network applications and increase request-processing capacity is a technical difficulty that currently needs to be solved.
Neural network technology is an effective method for realizing artificial intelligence applications. Current artificial intelligence applications often build models from a series of neural network layers, so such models are also called deep neural networks. Each layer is composed of neurons that produce nonlinear outputs from their input data. Neurons in the input layer receive data and propagate it to the intermediate layer (also called the hidden layer). Neurons of the intermediate layer compute a weighted sum of the input data, transform it with a particular activation function, and propagate the output onward until it reaches the output layer, where the final result is produced. A deep neural network has more complex and abstract layers than a general machine learning model and can learn high-level features, thereby achieving high-precision task inference. Three layer structures are common in deep neural networks: the fully-connected layer (FNN), the convolutional layer (CNN) and the recurrent layer (RNN).
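For illustration only (this sketch is not part of the patent; the shapes and the tanh activation are arbitrary assumptions), the weighted-sum-plus-activation computation of one hidden layer can be written in Python as:

import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    # One hidden layer: weighted sum of the inputs followed by a nonlinearity.
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # activations arriving from the input layer
W = rng.normal(size=(3, 4))   # weights of the hidden layer
b = np.zeros(3)               # biases
h = dense_layer(x, W, b)      # output propagated toward the output layer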
Edge computing is an emerging research area dedicated to pushing cloud services from the core of the network to network edge devices closer to the requester, such as network base stations and routers. Its core idea is to provide computing services as close to the requesting terminal as possible. Compared with cloud computing, edge computing brings lower transmission delay and bandwidth consumption, but its computing resources are limited. In order to better utilize computing resources in the network and improve the response speed of neural networks, researchers have begun introducing edge computing into the neural network computing process.
In existing neural network application scenarios, neural network applications generally have large computational requirements, so the neural network model is often deployed on a cloud server. The user sends request data through the network to the cloud for inference, and the inference result is then returned to the user. This approach is highly dependent on the network conditions between the requester and the cloud. If the requester's data volume is too large or the network bandwidth is limited, communication delay dominates the response time: the data transmission delay increases greatly, the request response time becomes too long, and the response speed of the neural network application drops. Therefore, under limited network bandwidth, the delay overhead of network transmission occupies a higher proportion than the delay generated by cloud computing itself.
To reduce the network transmission delay, current research attempts to deploy the neural network directly on edge devices such as network base stations and edge routers. However, these edge devices have limited computing power; if the computational resource requirement of the neural network model is too large or the model is too complex, the edge device incurs high computation delay. To address this, the prior art compresses the model, reducing its computational resource requirement by means such as early exit, pruning and reduced computation precision, but these techniques often reduce the prediction accuracy of the model.
Disclosure of Invention
The invention aims to provide a neural network edge-cloud collaborative computing segmentation deployment method, which is used for reducing the application delay of a neural network, improving the response speed and simultaneously ensuring no loss of precision.
In order to achieve the above object, the present invention provides a neural network edge-cloud collaborative computing segmentation deployment method, including:
S1: determining the layer types of the basic neural network layers of the neural network to be deployed, and establishing a time delay prediction model f(x, S) for each computing resource and each basic neural network layer, wherein each time delay prediction model is used only to predict the computation time delay of one basic neural network layer under one computing resource;
S2: executing a neural network segmentation algorithm to determine the optimal segmentation point of the neural network to be deployed, the optimal segmentation point being the segmentation point K corresponding to the minimum of the overall time delays T(K) over K = 1 to l−1; the segmentation point divides the neural network to be deployed into a first sub-neural network and a second sub-neural network; l is the number of neural network layers in the neural network to be deployed;
S3: dividing the trained neural network to be deployed into the first sub-neural network and the second sub-neural network at the optimal segmentation point, deploying the first sub-neural network on the edge device and the second sub-neural network on the cloud server for collaborative computing.
The input parameters of the time delay prediction model f(x, S) are x and S: x is the feature vector corresponding to a neural network layer and has one or more characteristic variables; S is the type of computing resource, which includes the edge type Sedge and the cloud type Scloud.
In step S1, a time delay prediction model is built and trained in the form of a regression tree for all kinds of feature variables corresponding to each basic neural network layer.
The step S1 includes: the following steps are executed for each computing resource and each basic neural network layer:
S10: acquiring samples of all the characteristic variables of a basic neural network layer together with the computation time delay of that layer under one computing resource, forming the sample data set D of the time delay prediction model;
S11: for one of the characteristic variables xi, setting a current segmentation point s so that the sample data set D of the time delay prediction model is divided, in the dimension of the characteristic variable xi, into two sub-sample spaces, namely the first sub-sample space and the second sub-sample space;
S12: determining the predicted output value ypredict and the prediction error e of the samples corresponding to the current segmentation point s;
S13: continuously updating the current segmentation point s and repeating step S12, so as to determine the segmentation point s that minimizes the prediction error e as the optimal segmentation point.
In step S11, the first sub-sample space is R1(s) = {x | xi ≤ s} and the second sub-sample space is R2(s) = {x | xi > s}, xi being the ith characteristic variable;
in step S12, the predicted output value ypredict of the time delay prediction model is:

ypredict = c1 · I(x ∈ R1) + c2 · I(x ∈ R2),
c1 = (1/N1) Σ{ yk : xk ∈ R1 },  c2 = (1/N2) Σ{ yk : xk ∈ R2 },

where I(·) is the indicator function, xk is the characteristic variable of the kth sample, R1 is the first sub-sample space, R2 is the second sub-sample space, c1 and c2 are the predicted output values of the time delay prediction model in the first and second sub-sample spaces, and N1 and N2 are the numbers of samples of the first and second sub-sample spaces;
the prediction error e corresponding to the current segmentation point s is:

e = Σ{ (yk − c1)² : xk ∈ R1(s) } + Σ{ (yk − c2)² : xk ∈ R2(s) },

where yk is the computation time delay of the kth sample.
The characteristic vector corresponding to the basic neural network layer has a plurality of characteristic variables;
the step S1 further includes:
S14: repeating the steps S11 to S13 for another characteristic variable until the optimal segmentation points corresponding to all the characteristic variables are determined;
S15: determining the predicted output value of the time delay prediction model in each sample region divided by the optimal segmentation points.
In step S15, the predicted output value of the time delay prediction model f(x, S) is:

f(x, S) = Σ{ cm · I(x ∈ Rm) : m = 1, …, 2^n },

cm being the mean of the computation time delays yk of the samples xk ∈ Rm,

where x is the feature vector corresponding to the neural network layer, Rm is the mth sample region, m is the index of the sample region, n is the total number of kinds of characteristic variables, and cm is the predicted output value of the time delay prediction model in the mth sample region.
In step S2, the overall time delay T(K) when the segmentation point is K is determined from the first and second sub-neural networks obtained at segmentation point K: the computation time delay Te(K) of the first sub-neural network at the edge and the computation time delay Tc(K) of the second sub-neural network at the cloud are combined with the data volume DK output by the Kth neural network layer and the network bandwidth B between the edge and the cloud.
In step S2, the neural network segmentation algorithm is a global search based on prefix-sum arrays;
the step S2 includes:
S21: for each neural network layer of the neural network to be deployed, respectively determining the computation time delay f(x(lp), Sedge) of the layer at the edge and the computation time delay f(x(lp), Scloud) at the cloud; lp is the type of the basic neural network layer corresponding to the pth neural network layer; p is the sequence number of the neural network layer, p = 1…l;
S22: for each neural network layer of the neural network to be deployed, respectively determining the edge prefix-sum array prefixEdge[p] and the cloud prefix-sum array prefixCloud[p] of the neural network layer;
S23: for each segmentation point K, determining the computation time delay Te(K) of the edge when the segmentation point is K, the computation time delay Tc(K) of the cloud when the segmentation point is K, and the transmission delay when the segmentation point is K, and summing the three to obtain the overall time delay T(K) when the segmentation point is K;
S24: traversing the segmentation points K from 1 to l−1, and taking the segmentation point K corresponding to the minimum of the overall time delays T(K) as the optimal segmentation point Kbest.
prefixEdge[p] = prefixEdge[p−1] + f(x(lp), Sedge);
prefixCloud[p] = prefixCloud[p−1] + f(x(lp), Scloud);
Te(K) = prefixEdge[K] − prefixEdge[0];
Tc(K) = prefixCloud[l] − prefixCloud[K];
The transmission delay when the segmentation point is K is DK/B,
where DK is the data volume output by the Kth neural network layer and B is the network bandwidth between the edge and the cloud.
The neural network edge-cloud collaborative computing segmentation deployment method introduces split neural network computation: the neural network is divided into two sub-neural networks deployed respectively on the edge device and the cloud device, which cooperate in an edge-cloud computing mode. The edge device runs only the sub-model obtained after segmentation, so compared with pure edge computing the computation load is reduced and computation efficiency is improved, and compared with pure cloud computing the network transmission burden is reduced, thereby lowering the application delay of the neural network and improving the response speed. Meanwhile, the neural network is not compressed, so no precision is lost. The method therefore reduces the time delay of neural network applications and improves their response speed without loss of precision, meeting the low-delay requirement of neural network applications under limited network bandwidth.
In addition, the method obtains the running time of the neural network by establishing a time delay prediction model, and the neural network segmentation algorithm is designed on top of the time delay prediction model and prefix-sum arrays, which improves the running speed of the segmentation deployment method itself.
Drawings
Fig. 1 is a schematic overall flow chart of a neural network edge-cloud collaborative computing segmentation deployment method according to the present invention.
Fig. 2 is a flowchart illustrating step S3 of the neural network edge-cloud collaborative computing segmentation deployment method according to the present invention.
Fig. 3 is a schematic diagram of selecting the segmentation points of the LeNet5, AlexNet and VGG networks by the neural network edge-cloud collaborative computing segmentation deployment method under the condition that the network bandwidth is 500 KB.
Fig. 4 is a performance comparison between the plain global search and the prefix-sum-array-based global search in step S2 of the neural network edge-cloud collaborative computing segmentation deployment method of the present invention.
Fig. 5 is a comparison graph of results of task response time in three computing modes of edge-cloud cooperative computing, cloud computing and edge computing.
Detailed Description
The present invention will be further described with reference to the following specific examples. It should be understood that the following examples are illustrative only and are not intended to limit the scope of the present invention.
The invention provides a neural network edge-cloud collaborative computing segmentation deployment method which is suitable for a neural network application scene with low time delay requirement and high response speed requirement.
As shown in fig. 1, the neural network edge-cloud collaborative computing segmentation deployment method of the present invention includes:
Step S1: determining the layer types of the basic neural network layers of the neural network to be deployed, and establishing a time delay prediction model f(x, S) for each computing resource and each basic neural network layer, wherein each time delay prediction model is used only to predict the computation time delay of one basic neural network layer under one computing resource;
the adopted neural network to be deployed is a convolutional neural network, such as LeNet5, AlexNet and VGG. However, the delay prediction model does not concern specific neural networks, but only concerns which types of basic neural network layers the neural network to be predicted consists of, so that theoretically, the neural network to be deployed can be any, and any neural network is formed by overlapping and combining a plurality of basic neural network layers. According to the model established by each basic neural network layer of the neural network layers to be deployed in the step S1, the total computation time delay of the neural network to be deployed can be obtained by adding the computation time delays predicted by the time delay prediction models of all the basic neural network layers of the neural network to be deployed in the subsequent steps.
The processing task differs from network to network. In this embodiment, the neural networks to be deployed are the three convolutional neural networks LeNet5, AlexNet and VGG. The input of the neural network to be deployed is picture-type data, and the output is a classification result. The commonly used basic neural network layers are FNN, CNN and RNN, but in actual modeling the basic layers may also include ReLU, dropout, and the like.
The input parameters of the time delay prediction model f(x, S) are x and S, where x is the feature vector corresponding to the neural network layer and S is the type of computing resource. In the present embodiment, the type S of computing resource includes the edge type Sedge and the cloud type Scloud.
The purpose of step S1 is to obtain a predicted value of the computation time delay y from the vector x composed of all kinds of characteristic variables corresponding to the type of the basic neural network layer, i.e., to establish a time delay prediction model f(x, S) whose output is as close as possible to the real computation time delay of the basic neural network layer.
Different types of basic neural network layers have different factors that affect the computation time, and these factors are called feature variables. Some common basic neural network layer characteristic variables are shown in table 1 below.
TABLE 1 types of basic neural network layers and their corresponding characteristic variables
[Table 1 appears as an image in the original publication; it lists each basic layer type with its characteristic variables, e.g., one variable (input data size) for the activation layer and six for the convolutional layer.]
As shown in table 1, the characteristic variables are determined according to the types of the corresponding basic neural network layers, and different types of neural network layers correspond to different kinds of characteristic variables.
All the characteristic variables corresponding to a basic neural network layer type are expressed as a vector, denoted the feature vector x = (x1, x2, …, xi, …, xn) corresponding to the neural network layer, where xi is the ith characteristic variable and n is the total number of kinds of characteristic variables corresponding to that layer type. For example, the only factor affecting the computation delay of the activation layer is the size of the data input to the layer, so the input data size is the activation layer's sole characteristic variable and n = 1; similarly, the convolutional layer is affected by 6 factors, so it has 6 characteristic variables and n = 6.
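As a sketch of how such feature vectors might be represented in practice (the variable names and values below are illustrative assumptions, not specified by the patent):

# Hypothetical feature vectors for two basic layer types.
activation_features = {"input_size": 64 * 112 * 112}   # n = 1

conv_features = {                                       # n = 6
    "input_height": 112, "input_width": 112,
    "in_channels": 3, "out_channels": 64,
    "kernel_size": 3, "stride": 1,
}

def to_vector(features):
    # Flatten a feature dict into the vector x = (x1, ..., xn).
    return [float(v) for v in features.values()]

x_conv = to_vector(conv_features)   # the input x of f(x, S)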
In order to establish the time delay prediction model f(x, S), samples pairing a basic neural network layer with its computation time delay must be collected as the sample data set of the model. Assuming that N samples (i.e., instances of the same basic neural network layer with different values of its n characteristic variables) are collected under one computing resource, the sample data set D of the corresponding time delay prediction model is:

D = { (xk, yk) : k = 1, …, N },  xk = (xk(1), xk(2), …, xk(n)),

where xk denotes the kth sample and xk(i) is the ith characteristic variable of the kth sample (the subscript indexes the sample, the parenthesized superscript the variable); yk is the computation time delay of the kth sample, i.e., the output of the model; N is the total number of samples in the sample data set; and n is the total number of kinds of characteristic variables.
For simplicity of explanation, assume the feature vector of the basic neural network layer has only one characteristic variable, so the kth sample is written (xk, yk), where xk is the characteristic variable of the kth sample and yk is its computation time delay.
In step S1, a time delay prediction model is built and trained in the form of a regression tree for all kinds of feature variables corresponding to each basic neural network layer. Wherein the regression tree is a machine learning model.
Specifically, the step S1 includes: the following steps are executed for each computing resource and each basic neural network layer:
Step S10: acquiring samples of all the characteristic variables of a basic neural network layer together with the computation time delay of that layer under one computing resource, forming the sample data set D of the time delay prediction model;
that is, the samples collected by the edge end are used as the delay prediction model f (x, S) under the edge endedge) The sample data set D is used for training, and samples collected through the cloud serve as a time delay prediction model f (x, S) under the cloudcloud) For training.
As described above, the sample data set D of the time delay prediction model is:

D = { (xk, yk) : k = 1, …, N },  xk = (xk(1), xk(2), …, xk(n)),

where xk denotes the kth sample and xk(i) is the ith characteristic variable of the kth sample (subscript: sample ordinal; parenthesized superscript: variable ordinal); yk is the computation time delay of the kth sample, i.e., the output of the time delay prediction model; N is the total number of samples in the data set; and n is the total number of kinds of characteristic variables.
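A minimal sketch of collecting such samples for one layer type under one computing resource, assuming PyTorch is available on the device (the swept parameter and the input shape are illustrative):

import time
import torch
import torch.nn as nn

samples = []   # entries (x_k, y_k): feature vector and measured latency
for out_channels in (16, 32, 64):           # sweep one characteristic variable
    layer = nn.Conv2d(3, out_channels, kernel_size=3)
    inp = torch.randn(1, 3, 112, 112)
    with torch.no_grad():
        layer(inp)                           # warm-up run
        t0 = time.perf_counter()
        for _ in range(50):
            layer(inp)
    y_k = (time.perf_counter() - t0) / 50    # mean latency in seconds
    x_k = [112, 112, 3, out_channels, 3, 1]  # feature vector of the layer
    samples.append((x_k, y_k))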
Step S11: for one of the characteristic variables xi, set a current segmentation point s so that the sample data set D of the time delay prediction model is divided, in the dimension of xi, into two sub-sample spaces: the first sub-sample space R1(s) (also written R1) and the second sub-sample space R2(s) (also written R2);

R1(s) = {x | xi ≤ s}, R2(s) = {x | xi > s}, where xi is the ith characteristic variable.
The category ordinal i of the characteristic variable ranges from 1 to n, where n is the total number of kinds of characteristic variables. The candidate segmentation point s ranges over the observed values { xk(i) : k = 1, …, N } of the ith characteristic variable, where xk(i) is the ith characteristic variable of the kth sample.

The segmentation point s thus divides the 1st to Nth samples of the sample data set D into two sub-sample spaces, denoted the first sub-sample space R1(s) and the second sub-sample space R2(s). Writing their sample counts as N1 and N2, clearly N1 + N2 = N.
Step S12: determining the predicted output value ypredict and the prediction error e of the samples corresponding to the current segmentation point s;

Through training on the samples, a mapping ypredict ← f(x, S) is obtained. In this embodiment, the average of the true output values y of all samples in each sub-sample space is used as the predicted output value ypredict of the time delay prediction model in that space.
That is, the predicted output value ypredict of the time delay prediction model is:

ypredict = c1 · I(x ∈ R1) + c2 · I(x ∈ R2),
c1 = (1/N1) Σ{ yk : xk ∈ R1 },  c2 = (1/N2) Σ{ yk : xk ∈ R2 },

where I(·) is the indicator function, xk is the characteristic variable of the kth sample, R1 is the first sub-sample space, R2 is the second sub-sample space, c1 and c2 are the predicted output values of the time delay prediction model in the first and second sub-sample spaces, and N1 and N2 are the numbers of samples of the first and second sub-sample spaces.
Since the true output value y of a sample with characteristic variable x should be as close as possible to the predicted output value ypredict of the time delay prediction model, the prediction error e must be evaluated so as to minimize it. Different machine learning models estimate prediction error differently; here the error estimate corresponding to the regression tree model is used.

The prediction error e corresponding to the current segmentation point s is therefore:

e = Σ{ (yk − c1)² : xk ∈ R1(s) } + Σ{ (yk − c2)² : xk ∈ R2(s) },

where R1 is the first sub-sample space, R2 is the second sub-sample space, xk is the characteristic variable of the kth sample, yk is the computation time delay of the kth sample, and c1, c2 are the predicted output values of the time delay prediction model in the first and second sub-sample spaces.
Step S13: the current segmentation point S is continuously updated and the above-described step S12 is repeated to determine the segmentation point S that minimizes the value of the prediction error e as the optimal segmentation point.
At this point, if the feature vector corresponding to the basic neural network layer has only one characteristic variable, the above steps S11 to S13 need to be performed only once, selecting a single segmentation point for that variable, and the sample space is finally divided into 2 sample regions (i.e., the 2 sub-sample spaces).
In other embodiments, if the feature vector corresponding to the basic neural network layer has multiple characteristic variables, the above operations are performed n times in total. That is, step S1 also needs to include the following steps:
step S14: and repeating the steps S11 to S13 for another characteristic variable until the optimal segmentation points corresponding to all the characteristic variables are determined.
Since each sample has n characteristic variables and each characteristic variable contributes one binary split at its segmentation point, the sample space is finally divided by the optimal segmentation points into 2^n sample regions, where n is the total number of kinds of characteristic variables.
Step S15: and determining the prediction output value of the time delay prediction model in each sample region according to each sample region divided by the optimal dividing point, thereby completing the establishment of the time delay prediction model (namely a decision tree).
Therefore, as long as the feature vector of a certain basic neural network layer is given, its predicted computation time delay is the output of the time delay prediction model.
The predicted output value of the time delay prediction model f(x, S) is:

f(x, S) = Σ{ cm · I(x ∈ Rm) : m = 1, …, 2^n },

where x is the feature vector corresponding to the neural network layer, Rm is the mth sample region, m is the index of the sample region (there are 2^n sample regions in total, so m = 1 to 2^n), n is the total number of kinds of characteristic variables, and cm is the predicted output value of the time delay prediction model in the mth sample region. In this embodiment, cm is taken as the mean of the true output values y of all samples in the mth sample region.
This completes the establishment of the binary-tree-type time delay prediction model (i.e., the decision tree). After step S1, the time delay prediction model is used for time delay prediction: the output of f(x, S) is the predicted computation time delay of the neural network layer.
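In practice a library regression tree yields the same piecewise-constant predictor; a minimal sketch assuming scikit-learn is available (the patent does not prescribe a particular implementation):

from sklearn.tree import DecisionTreeRegressor

# (x_k, y_k) pairs as collected in the sampling sketch above
X = [x for x, _ in samples]
y = [latency for _, latency in samples]

f_edge = DecisionTreeRegressor().fit(X, y)   # plays the role of f(x, Sedge)
x_new = [[112, 112, 3, 32, 3, 1]]            # feature vector of a new layer
predicted_delay = f_edge.predict(x_new)[0]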
As stated in step S1, each time delay prediction model predicts the computation time delay of one basic neural network layer under one computing resource. Therefore, for the same basic neural network layer there are two models: the edge time delay prediction model f(x, Sedge) (the layer's computation time delay in edge computing) and the cloud time delay prediction model f(x, Scloud) (the layer's computation time delay in cloud computing).
If the neural network to be deployed has 5 neural network layers, the basic layer types corresponding to those 5 layers must first be determined, and then steps S10 to S15 must be performed twice for each basic neural network layer (once for the edge and once for the cloud).
Step S2: executing a neural network segmentation algorithm to determine the optimal segmentation point of the neural network to be deployed, i.e., the segmentation point K corresponding to the minimum of the overall time delays T(K) over K = 1 to l−1. The segmentation point divides the neural network to be deployed into two sub-neural networks: the first sub-neural network and the second sub-neural network. l is the number of neural network layers in the neural network to be deployed.
The segmentation point here refers to a segmentation point of the neural network itself, and is unrelated to the regression tree segmentation point above. For example, for a 100-layer neural network to be deployed with the segmentation point at the 50th layer, the first sub-neural network (the first 50 layers) is deployed at the edge and the second sub-neural network (the last 50 layers) at the cloud: the sub-network before the segmentation point is deployed at the edge, and the sub-network after it at the cloud.
Denote the neural network to be deployed as L; the input information of the model segmentation algorithm is then L = { l1, l2, …, lp, …, ll }, where lp is the type of the basic neural network layer corresponding to the pth neural network layer, l is the total number of layers of L, and p = 1 to l is the layer sequence number. Each basic neural network layer of L has exactly one corresponding feature vector (containing several characteristic variables): the layer type lp of the pth layer corresponds one-to-one with the feature vector x(lp), which serves as the input to the time delay prediction model.
A specific example: suppose the basic layer type l1 of the 1st neural network layer is an activation layer with 1 characteristic variable; then its feature vector is x(1) = (x1). Suppose the basic layer type l2 of the 2nd layer is a convolutional layer with 6 characteristic variables; then its feature vector is x(2) = (x1, x2, …, x6).
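The inputs of the segmentation algorithm can therefore be sketched as follows (layer types, feature values and the dummy latency constants are illustrative assumptions):

# Neural network to be deployed: list of (basic layer type, feature vector).
network = [
    ("conv",       [112, 112, 3, 64, 3, 1]),
    ("activation", [64 * 112 * 112]),
    ("conv",       [112, 112, 64, 64, 3, 1]),
    ("fc",         [4096, 1000]),
]

# Stand-ins for the trained models f(x, S), one per layer type and resource.
latency_models = {
    ("conv", "edge"):        lambda x: 0.0080,   # seconds, dummy constants
    ("conv", "cloud"):       lambda x: 0.0015,
    ("activation", "edge"):  lambda x: 0.0004,
    ("activation", "cloud"): lambda x: 0.0001,
    ("fc", "edge"):          lambda x: 0.0050,
    ("fc", "cloud"):         lambda x: 0.0008,
}

def f(layer_type, x, S):
    return latency_models[(layer_type, S)](x)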
In step S2, the overall time delay T(K) when the segmentation point is K is determined from the first and second sub-neural networks obtained at segmentation point K: the computation time delay Te(K) of the first sub-neural network at the edge and the computation time delay Tc(K) of the second sub-neural network at the cloud are combined with the data volume DK output by the Kth neural network layer and the network bandwidth B between the edge and the cloud.
For a neural network model with l layers, there are l−1 possible segmentation points in total. When the segmentation point is K, the two resulting sub-neural networks are the edge sub-neural network Le and the cloud sub-neural network Lc, where:

Le = { l1, l2, …, lK },  Lc = { lK+1, lK+2, …, ll }.
the delay prediction model under the edge end is recorded as f (x, S)edge) The time delay prediction model under the cloud is marked as f (x, S)cloud) Then, the calculated delay T of the edge end when the division point is Ke(K) And cloud endCalculating the time delay Tc(K) Can be respectively recorded as:
Figure BDA0003505452750000126
wherein lpIs the type of the basic neural network layer corresponding to the p-th neural network layer, x (l)p) Type l representing the basic neural network layer corresponding to the p-th neural network layerpAnd the corresponding characteristic vector, p is the sequence number of the basic neural network layer, and K is the segmentation point.
Simultaneously, the data volume output by the Kth neural network layer is recorded as DKAnd the network bandwidth between the edge end and the cloud end is B, so that the integral time delay T (K) when the division point is K can be obtained. The overall delay T (K) for a division point K is:
Figure BDA0003505452750000127
wherein D isKThe data volume output for the Kth neural network layer, B is the network bandwidth between the edge end and the cloud end, and Te(K) The calculated time delay of the edge end when the division point is K, Tc(K) The calculation time delay of the cloud when the division point is K is shown.
The segmentation point K ranges from 1 to l−1 because, for an l-layer neural network model, there are l−1 possible segmentation points in total.
That is, determining the optimal segmentation point is the optimization problem:

Kbest = argmin{ T(K) : K = 1, 2, …, l−1 }.
the optimization problem can be solved by a global search mode, that is, the minimum value of the overall time delays T (K) when the division point K takes 1 to l-1 can be determined by traversing all the overall time delays T (K) when the division point K takes 1 to l-1, and the division point K corresponding to the minimum value of the overall time delays T (K) is selected from the minimum values.
However, because the sub-network time delays must be summed afresh for every candidate K, this search carries extra time overhead, and its time complexity is O(n²) in the number of neural network layers. To further improve the processing speed of the algorithm, step S22 of this embodiment introduces prefix-sum arrays on top of the global search, finally reducing the time complexity to O(n).

Therefore, in step S2, the neural network segmentation algorithm finally designed is a global search based on prefix-sum arrays.
The step S2 specifically includes:
step S21: for each neural network layer of the neural network to be deployed, respectively determining the calculation time delay f (x (l) of the neural network layer at the edge endp),Sedge) And the calculation time delay f (x (l)) of the neural network layer in the cloudp),Scloud);lpIs the type of the basic neural network layer corresponding to the p-th neural network layer; p is the sequence number of the neural network layer, and p is 1 … l;
step S22: for each neural network layer of a neural network to be deployed, respectively determining a prefix and an array prefix [ p ] of the neural network layer at an edge end and a prefix and an array prefix [ p ] of the neural network layer at a cloud end;
wherein, prefix edge [ p]=prefixEdge[p-1]+f(x(lp),Sedge),
prefixCloud[p]=prefixCloud[p-1]+f(x(lp),Scloud)。
prefixEdge and prefixCloud represent the prefix and array of the edge and cloud, respectively.
The prefix-sum array is introduced to optimize the summation time in the algorithm. A specific example follows:
assuming that 4 layers of neural networks are provided, the calculation time delays obtained through f (x) calculation are respectively as follows:
T=[2,3,1,4]
its prefix and array are defined by the following recursion formula
prefixT[i]=T[i]+prefixT[i-1]
According to the recurrence formula, the following can be obtained:
prefixT=[0,2,5,6,9]
the advantage of this is that converting the sum into the difference is an optimization means and has no special meaning.
Step S23: for each segmentation point K, determine the computation time delay Te(K) of the edge when the segmentation point is K, the computation time delay Tc(K) of the cloud when the segmentation point is K, and the transmission delay when the segmentation point is K, and sum the three to obtain the overall time delay T(K) when the segmentation point is K;
where the computation time delay Te(K) of the edge (i.e., Tedge) when the segmentation point is K is:

Te(K) = prefixEdge[K] − prefixEdge[0];

the computation time delay Tc(K) of the cloud (i.e., Tcloud) when the segmentation point is K is:

Tc(K) = prefixCloud[l] − prefixCloud[K];

and the transmission delay (i.e., Tcomm) when the segmentation point is K is DK/B,

where DK is the data volume output by the Kth neural network layer and B is the network bandwidth between the edge and the cloud.
Step S24: traverse the segmentation points K from 1 to l−1 and take the segmentation point K corresponding to the minimum of the overall time delays T(K) as the optimal segmentation point Kbest.
That is, the input parameters of the neural network segmentation algorithm include:

{ lp : p = 1…l }: lp is the type of the basic neural network layer corresponding to the pth neural network layer;
{ DK : K = 1…l }: DK is the data volume output by the Kth neural network layer;
f(x, S): the time delay prediction model;
B: the network bandwidth between the edge and the cloud; and
Tbest: the current optimal overall time delay.

The output of the neural network segmentation algorithm is the optimal segmentation point Kbest.
Part of the code is as follows (with the prefix-sum arrays initialized as prefixEdge[0] = prefixCloud[0] = 0):

for p in range 1…l:
    prefixEdge[p] = prefixEdge[p-1] + f(x(lp), Sedge)
    prefixCloud[p] = prefixCloud[p-1] + f(x(lp), Scloud)
for K in range 1…l-1:
    Tedge = prefixEdge[K] - prefixEdge[0]
    Tcloud = prefixCloud[l] - prefixCloud[K]
    Tcomm = DK / B
    if Tedge + Tcloud + Tcomm < Tbest:
        Tbest = Tedge + Tcloud + Tcomm
        Kbest = K
return Kbest
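A self-contained Python rendering of the algorithm, reusing the illustrative network, f, and D conventions from the sketches above (none of these names come from the patent):

def best_split_prefix(network, D, B, f):
    # Prefix-sum global search for the optimal segmentation point K_best.
    l = len(network)
    prefix_edge = [0.0] * (l + 1)
    prefix_cloud = [0.0] * (l + 1)
    for p in range(1, l + 1):
        layer_type, x = network[p - 1]
        prefix_edge[p] = prefix_edge[p - 1] + f(layer_type, x, "edge")
        prefix_cloud[p] = prefix_cloud[p - 1] + f(layer_type, x, "cloud")

    K_best, T_best = None, float("inf")
    for K in range(1, l):
        T = (prefix_edge[K]                        # edge delay Te(K)
             + prefix_cloud[l] - prefix_cloud[K]   # cloud delay Tc(K)
             + D[K] / B)                           # transmission delay DK/B
        if T < T_best:
            K_best, T_best = K, T
    return K_best

# D[K]: data volume (KB) output by layer K; B: bandwidth in KB/s.
D = {1: 800.0, 2: 800.0, 3: 200.0}
best_split_prefix(network, D, B=500.0, f=f)   # returns 3: split after layer 3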
step S3: the trained neural network to be deployed is divided into a first sub-neural network and a second sub-neural network by using the optimal dividing point, the first sub-neural network is deployed on the equipment at the edge end, and the second sub-neural network is deployed on the cloud server for performing collaborative computing.
The sub-neural network before the division point is a first sub-neural network and is used for being deployed at the edge end; the sub-neural network behind the division point is a second sub-neural network and is used for being deployed at the cloud.
Referring to fig. 2, which shows the overall collaborative flow of step S3: for the segmented and deployed neural network model, the user initiates a request to the edge device and transmits the request data; the request data first passes through the first sub-neural network on the edge device, and the intermediate computation result is then sent to the cloud server for the computation of the second sub-neural network, thereby realizing edge-cloud collaborative computing.
In practice, the edge device may be a base station, a router, an edge gateway, or other edge device near the user. When the terminal equipment of the user sends a request, the request data firstly passes through the equipment of the edge end, the equipment of the edge end carries out primary processing on the request data through the sub-network, and the primary processing result is transmitted to the cloud end for subsequent operation.
In summary, the edge-cloud collaborative computing process is as follows:
1. For the neural network to be deployed, acquire the layer information of the model and establish the time delay prediction models accordingly. If the prediction models have been established previously, go directly to step 2.
2. And obtaining an optimal segmentation point through a neural network segmentation algorithm according to the time delay prediction model, wherein the optimal segmentation point is used for segmenting the neural network into a first sub-neural network and a second sub-neural network at the segmentation point.
3. And deploying the first sub-neural network to the edge equipment requested by the user service, deploying the second sub-neural network to the cloud server, and finally performing cooperative computing.
Results of the experiment
We used the LeNet5, AlexNet and VGG networks to validate the neural network edge-cloud collaborative computing segmentation deployment method of the present invention.
First, for the layer type of each neural network layer of the adopted networks, characteristic variables are selected according to Table 1 above and used to establish the time delay prediction models. [The table image at this point in the original publication repeats Table 1.]
and then, determining the neural network segmentation point by using a time delay prediction model. The determination of the segmentation points depends on the network bandwidth of the edge end and the cloud end, and when the network bandwidth is poor, the segmentation points tend to be segmented in the layer behind the neural network, so that most of the neural network layer is placed at the edge end for operation as far as possible, and the network overhead is reduced. When the network bandwidth is good, the partitioning point will tend to partition at the previous layer to make the best possible use of the computational resources. Fig. 3 shows a schematic diagram of selecting the segmentation points of the LeNet5, AlexNet and VGG networks by the neural network edge-cloud collaborative computing segmentation deployment method of the present invention under the condition that the network bandwidth is 500KB, and it can be seen that the segmentation points are located on the neural network layer with small output data amount and are as far forward as possible, which is in line with expectations.
Fig. 4 compares the plain global search with the prefix-sum-optimized global search in step S2 of the neural network edge-cloud collaborative computing segmentation deployment method of the present invention: the left sub-graph shows the theoretical growth of algorithm complexity with problem scale, and the right sub-graph shows the actual time consumed by the algorithm as the number of processed neural network layers increases.
It can be seen that the prefix-sum-optimized global search algorithm brings better processing efficiency, and the measured improvement is consistent with the theoretical one.
After the segmentation point is determined, the neural network before the segmentation point is finally deployed on the edge device, and the neural network after the segmentation point is deployed on the cloud. In order to visually observe the improvement of edge-cloud cooperative computing in comparison with cloud computing and edge computing along with the change of network bandwidth, the range of the bandwidth between the edge and the cloud is set to be 0.1-10MB, the task response time in three computing modes is counted, and the result can be seen in fig. 5.
It can be seen that when the network bandwidth is poor, cloud computing is limited by communication delay and its response speed becomes far slower than edge computing and edge-cloud collaborative computing; conversely, edge computing incurs large computation delay when the neural network is complex, because of the limited computing power of edge devices, which is particularly obvious for the VGG network. Summarizing the results in fig. 5, edge-cloud collaborative computing is a computing mode that combines the advantages of both edge computing and cloud computing and brings faster response speed to neural network applications under limited network bandwidth.
The neural network edge-cloud collaborative computing segmentation deployment method is thus a compromise that balances computation delay and communication delay. On the computation side, part of the neural network layers is offloaded to the cloud server, reducing the computation load of the edge and increasing computation speed. On the communication side, choosing a suitable segmentation point reduces the amount of data that must be transmitted over the network, relieving its burden. In this way the neural network can be segmented flexibly according to network conditions and the computing power of the edge device, divided into two parts deployed at the edge and the cloud respectively; compared with traditional edge computing and cloud computing, this computing mode brings better response speed, and the proposed segmentation deployment algorithm is more efficient.
The above embodiments are merely preferred embodiments of the present invention, which are not intended to limit the scope of the present invention, and various changes may be made in the above embodiments of the present invention. All simple and equivalent changes and modifications made according to the claims and the content of the specification of the present application fall within the scope of the claims of the present patent application. The invention has not been described in detail in order to avoid obscuring the invention.

Claims (10)

1. A neural network edge-cloud collaborative computing segmentation deployment method is characterized by comprising the following steps:
step S1: determining the layer type of a basic neural network layer of a neural network to be deployed, and respectively establishing a time delay prediction model f (x, S) for each computing resource and each basic neural network layer, wherein each time delay prediction model is only used for predicting the computing time delay of one basic neural network layer under one computing resource;
step S2: executing a neural network segmentation algorithm to determine an optimal segmentation point of the neural network to be deployed, wherein the optimal segmentation point is a segmentation point K corresponding to the minimum value in all overall time delays T (K) when the segmentation point K is from 1 to l-1, the segmentation point is used for segmenting the neural network to be deployed into a first sub-neural network and a second sub-neural network, and l is the number of layers of a neural network layer in the neural network to be deployed;
step S3: the trained neural network to be deployed is divided into a first sub-neural network and a second sub-neural network by using the optimal dividing point, the first sub-neural network is deployed on the equipment at the edge end, and the second sub-neural network is deployed on the cloud server for performing collaborative computing.
2. The neural network edge-cloud collaborative computing segmentation deployment method of claim 1, wherein the input parameters of the time delay prediction model f(x, S) are x and S: x is the feature vector corresponding to a neural network layer and has one or more characteristic variables; S is the type of computing resource, which includes the edge type Sedge and the cloud type Scloud.
3. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 2, wherein in the step S1, a time delay prediction model is established and trained in a regression tree form for all kinds of feature variables corresponding to each basic neural network layer.
4. The neural network edge-cloud collaborative computing partition deployment method according to claim 3, wherein the step S1 includes: the following steps are executed for each computing resource and each basic neural network layer:
step S10: acquiring all characteristic variables of a basic neural network layer and a sample of the calculation time delay of the neural network layer under a calculation resource to form a sample data set D of a time delay prediction model;
step S11: for one of the characteristic variables xi, setting a current segmentation point s so that the sample data set D of the time delay prediction model is divided, in the dimension of the characteristic variable xi, into two sub-sample spaces, namely a first sub-sample space and a second sub-sample space;
step S12: determining the predicted output value ypredict and the prediction error e of the samples corresponding to the current segmentation point s;

step S13: continuously updating the current segmentation point s and repeating the step S12, so as to determine the segmentation point s that minimizes the prediction error e as the optimal segmentation point.
5. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 4, wherein in the step S11, the first sub-sample space is R1(s) = {x | xi ≤ s} and the second sub-sample space is R2(s) = {x | xi > s}, xi being the ith characteristic variable;

in the step S12, the predicted output value ypredict of the time delay prediction model is:

ypredict = c1 · I(x ∈ R1) + c2 · I(x ∈ R2),
c1 = (1/N1) Σ{ yk : xk ∈ R1 },  c2 = (1/N2) Σ{ yk : xk ∈ R2 },

wherein I(·) is the indicator function, xk is the characteristic variable of the kth sample, R1 is the first sub-sample space, R2 is the second sub-sample space, c1 and c2 are the predicted output values of the time delay prediction model in the first and second sub-sample spaces, and N1 and N2 are the numbers of samples of the first and second sub-sample spaces;

the prediction error e corresponding to the current segmentation point s is:

e = Σ{ (yk − c1)² : xk ∈ R1(s) } + Σ{ (yk − c2)² : xk ∈ R2(s) },

wherein yk is the computation time delay of the kth sample.
6. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 4, wherein the feature vector corresponding to the basic neural network layer has a plurality of feature variables;
the step S1 further includes:
step S14: repeating the steps S11 to S13 on another characteristic variable until the optimal segmentation points corresponding to all the characteristic variables are determined;
step S15: and determining the prediction output value of the time delay prediction model in each sample region according to each sample region divided by the optimal dividing point.
7. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 4, wherein in the step S15, the predicted output value of the time delay prediction model f(x, S) is:

f(x, S) = Σ{ cm · I(x ∈ Rm) : m = 1, …, 2^n },

cm being the mean of the computation time delays yk of the samples xk ∈ Rm,

wherein x is the feature vector corresponding to the neural network layer, Rm is the mth sample region, m is the index of the sample region, n is the total number of kinds of characteristic variables, and cm is the predicted output value of the time delay prediction model in the mth sample region.
8. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 1, wherein in step S2, the overall time delay T(K) when the segmentation point is K is determined from the first and second sub-neural networks obtained at segmentation point K: the computation time delay Te(K) of the first sub-neural network at the edge and the computation time delay Tc(K) of the second sub-neural network at the cloud are combined with the data volume DK output by the Kth neural network layer and the network bandwidth B between the edge and the cloud.
9. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 1, wherein in the step S2, the neural network partition algorithm is based on a global search using prefix-sum arrays;
the step S2 includes:
step S21: for each neural network layer of the neural network to be deployed, respectively determining the calculation time delay f(x(l_p), S_edge) of the neural network layer at the edge end and the calculation time delay f(x(l_p), S_cloud) of the neural network layer at the cloud end; l_p is the type of the basic neural network layer corresponding to the p-th neural network layer; p is the sequence number of the neural network layer, p = 1, ..., l;
step S22: for each neural network layer of the neural network to be deployed, respectively determining the prefix-sum array prefixEdge[p] of the neural network layer at the edge end and the prefix-sum array prefixCloud[p] of the neural network layer at the cloud end;
step S23: for each division point K, determining the calculation time delay T_e(K) of the edge end when the division point is K, the calculation time delay T_c(K) of the cloud end when the division point is K, and the transmission delay when the division point is K, and summing the three to obtain the overall time delay T(K) when the division point is K;
step S24: traversing the division points K in the range from 1 to l-1, and taking the division point K corresponding to the minimum value of the overall time delay T(K) as the optimal division point K_best.
10. The neural network edge-cloud collaborative computing segmentation deployment method according to claim 9, wherein prefixEdge[p] = prefixEdge[p-1] + f(x(l_p), S_edge);
prefixCloud[p] = prefixCloud[p-1] + f(x(l_p), S_cloud);
T_e(K) = prefixEdge[K] - prefixEdge[0];
T_c(K) = prefixCloud[l] - prefixCloud[K];
the transmission delay when the division point is K is D_K / B;
wherein D_K is the data volume output by the K-th neural network layer, and B is the network bandwidth between the edge end and the cloud end.
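For illustration only, a minimal Python sketch of the prefix-sum global search of claims 9 and 10; apart from the prefixEdge/prefixCloud arrays, all names and the calling convention of the latency model f are assumptions of this sketch:

```python
def optimal_division_point(layer_types, f, D, B):
    """Steps S21-S24: find the division point K minimizing T(K).

    layer_types: [l_1, ..., l_l], basic-layer type of each of the l layers
    f(layer_type, resource): predicted per-layer calculation time delay
    D: indexable by K (1-based); D[K] = data volume output by layer K
    B: network bandwidth between the edge end and the cloud end
    """
    l = len(layer_types)
    prefix_edge = [0.0] * (l + 1)
    prefix_cloud = [0.0] * (l + 1)
    for p in range(1, l + 1):  # step S22: build the prefix-sum arrays
        prefix_edge[p] = prefix_edge[p - 1] + f(layer_types[p - 1], "edge")
        prefix_cloud[p] = prefix_cloud[p - 1] + f(layer_types[p - 1], "cloud")

    best_K, best_T = None, float("inf")
    for K in range(1, l):  # step S24: traverse K in 1 .. l-1
        T_e = prefix_edge[K] - prefix_edge[0]    # edge delay of layers 1..K
        T_c = prefix_cloud[l] - prefix_cloud[K]  # cloud delay of layers K+1..l
        T = T_e + T_c + D[K] / B                 # step S23: overall delay T(K)
        if T < best_T:
            best_K, best_T = K, T
    return best_K, best_T
```

Once the two prefix-sum arrays are built in O(l), each candidate T(K) is evaluated in constant time, so the exhaustive traversal of all division points stays linear in the network depth.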
CN202210137345.6A 2022-02-15 2022-02-15 Neural network edge-cloud collaborative computing segmentation deployment method Pending CN114528987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210137345.6A CN114528987A (en) 2022-02-15 2022-02-15 Neural network edge-cloud collaborative computing segmentation deployment method

Publications (1)

Publication Number Publication Date
CN114528987A true CN114528987A (en) 2022-05-24


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114663A (en) * 2022-07-01 2022-09-27 中铁第四勘察设计院集团有限公司 Face recognition method based on cloud edge-end cooperation
WO2024036579A1 (en) * 2022-08-18 2024-02-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Wireless communication method and device
WO2024099313A1 (en) * 2022-11-08 2024-05-16 South China University of Technology Cloud-edge-end collaborative intelligent infant care system and method
CN115569338A (en) * 2022-11-09 2023-01-06 国网安徽省电力有限公司蚌埠供电公司 Multi-class fire early warning data distributed training and local fire extinguishing method and system
CN115569338B (en) * 2022-11-09 2023-10-13 国网安徽省电力有限公司蚌埠供电公司 Multi-category fire early warning data distributed training and local extinguishing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination