CN114970824A - Edge cloud collaborative convolution neural network reasoning method and system - Google Patents

Edge cloud collaborative convolution neural network reasoning method and system

Info

Publication number
CN114970824A
CN114970824A (application CN202210611122.9A / CN202210611122A)
Authority
CN
China
Prior art keywords
model
compression
given
scheme
precision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210611122.9A
Other languages
Chinese (zh)
Other versions
CN114970824B (en)
Inventor
杨树森
段亚璐
赵聪
赵鹏
张展华
郭思言
栗海亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Cumulus Technology Co ltd
Original Assignee
Hangzhou Cumulus Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Cumulus Technology Co ltd filed Critical Hangzhou Cumulus Technology Co ltd
Priority to CN202210611122.9A priority Critical patent/CN114970824B/en
Priority claimed from CN202210611122.9A external-priority patent/CN114970824B/en
Publication of CN114970824A publication Critical patent/CN114970824A/en
Application granted granted Critical
Publication of CN114970824B publication Critical patent/CN114970824B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

A method and system for end-edge-cloud collaborative convolutional neural network (CNN) reasoning comprises: obtaining the delays of the model under all compression partitioning schemes based on the constructed model compression method; determining the performance upper and lower bounds of the joint compression partitioning scheme based on the obtained delays of all compression partitioning schemes; constructing a model accuracy upper-bound estimation method under a given compression rate at a given CNN partition layer; constructing a compression-rate decision method under a given accuracy requirement and CNN partition layer; searching for the delay-optimal joint model compression partitioning scheme; and running the system to perform model reasoning. The invention performs hierarchical computation offloading through compression and partitioning of the CNN model, jointly optimizes communication and computation bottlenecks, and realizes fast intelligent analysis of massive terminal data; through a congruent channel pruning method and a uniform affine quantization method, it compresses the communication traffic of the CNN model at any given layer reliably, controllably, and efficiently, significantly reducing the transmission delay of end-edge-cloud collaborative CNN reasoning.

Description

Edge cloud collaborative convolution neural network reasoning method and system
Technical Field
The invention belongs to the field of distributed intelligence, and particularly relates to a terminal edge cloud collaborative convolution neural network reasoning method and system.
Background
With the development of highly intelligent deep learning algorithms and the widespread adoption of Internet-of-Things technology, a large number of intelligent applications (such as traffic monitoring, defect detection, and power grid inspection) rely on inference with deep learning Convolutional Neural Network (CNN) models to perform high-accuracy, fast intelligent analysis of massive terminal data. Existing methods promote high-accuracy intelligent analysis by designing and optimizing deep learning CNN inference, and even surpass human performance on some visual tasks. However, high-accuracy intelligent analysis based on CNN inference is often accompanied by high computational overhead, and it is difficult to achieve fast intelligent analysis directly on terminals with limited computing resources, which hinders the deployment of many practical applications. Therefore, how to realize high-accuracy, fast intelligent analysis of terminal data under realistic device resource constraints is a key problem for supporting intelligent applications. The high computational overhead of high-accuracy deep learning CNN inference prevents fast intelligent analysis from being completed on devices with limited general-purpose computing resources. To eliminate the computing bottleneck brought by CNN inference, existing applications often upload data to computationally scalable cloud computing to complete intelligent analysis. However, considering the volume of massive terminal data, this approach cannot support a large number of intelligent applications under realistic bandwidth resources. The existing terminal-computing and cloud-computing modes are limited by computation and communication, respectively, and cannot support high-accuracy, fast intelligent analysis of massive terminal data.
Disclosure of Invention
The invention aims to provide an end-edge-cloud collaborative convolutional neural network reasoning method and system, to address the problems that the existing modes cannot support a large number of intelligent applications under realistic bandwidth resources, and that the existing terminal-computing and cloud-computing modes, limited by computation and communication respectively, cannot support high-accuracy, fast intelligent analysis of massive terminal data.
In order to achieve the purpose, the invention adopts the following technical scheme:
An end-edge-cloud collaborative convolutional neural network reasoning method comprises the following steps:
constructing a communication optimal model compression method, and compressing the communication traffic of the CNN model on any given layer through congruent channel pruning and uniform affine quantization;
based on the constructed model compression method, information collection is carried out on a given CNN model in a given end edge cloud system, and the time delay of the model under all compression division schemes is obtained;
determining the performance upper bound (T_max, A_max) and lower bound (T_min, A_min) of the joint compression partitioning scheme based on the obtained delays of all the compression partitioning schemes, wherein T_max and T_min are the upper and lower bounds of the reasoning delay and A_max and A_min are the upper and lower bounds of the reasoning accuracy; (T_max, A_max) is determined by the minimum-delay scheme when no compression is applied, and (T_min, A_min) by the minimum-delay scheme when compression is applied;
constructing a model precision upper bound estimation method under a given compression ratio on a given CNN division layer;
constructing a compression rate decision method when the precision requirement and CNN are given to divide layers;
under a given accuracy requirement A_0, searching for the delay-optimal joint model compression partitioning scheme based on the model accuracy upper-bound estimation method and the compression-rate decision method: if the given accuracy is greater than the upper bound A_max, directly provide the upper-bound scheme; if the given accuracy is less than the lower bound A_min, directly provide the lower-bound scheme; otherwise, based on the given accuracy requirement A_0, search for the delay-optimal joint model compression partitioning scheme (l*, r*) and output the optimal end-to-end reasoning delay T* of the model optimized by this scheme;
under a given delay requirement T_0, searching for the accuracy-optimal joint model compression partitioning scheme based on the model accuracy upper-bound estimation method and the compression-rate decision method: if the given delay is greater than the upper bound T_max, directly provide the upper-bound scheme; if the given delay is less than the lower bound T_min, directly provide the lower-bound scheme; otherwise, based on the given delay requirement T_0, search for the accuracy-optimal joint model compression partitioning scheme (l*, r*) and output the optimal reasoning accuracy A* of the model optimized by this scheme;
optimizing the model based on the output joint optimal model compression partitioning scheme (l*, r*), deploying it in the end-edge-cloud system, and running the system to perform model reasoning.
Further, constructing the communication-optimal model compression method comprises the following steps:
Step 1.1, congruent channel pruning: for a given CNN layer, solve

min_{β, W} Σ_{l=1}^{L} (1/(2S)) ‖Y_l − Σ_{k=1}^{K} β_k X_k W_{l,k}‖_F² + λ_1 ‖β‖_1,
s.t. ‖β‖_0 ≤ K′, ‖W_{l,k}‖_F = 1,

to prune unimportant convolution kernels, where ‖·‖_F denotes the Frobenius norm; S, L, K, and K′ denote, respectively, the number of test samples, the number of branches whose convolution kernels must be deleted simultaneously, the number of deleted convolution kernels, and the number of remaining convolution kernels; Y denotes the feature map output by the current convolution layer; X_k denotes the feature map of the channel corresponding to the k-th input; W_{l,k} denotes the k-th column of the l-th convolution kernel; β is a K-dimensional vector whose entries represent the importance of each convolution kernel; and λ_1 is a penalty coefficient. First, fix W_{l,k}, increase λ_1, and compute β; delete the minimum entry of the current β vector together with the convolution kernel corresponding to it; then fix the β with its minimum element deleted and update W_{l,k} by training; repeat the iteration until the number of components in β is less than K′;
Step 1.2, uniform affine quantization: affinely quantize the output of the given CNN layer compressed in step 1.1 to 8 bits.
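As a hedged illustration of the two compression primitives above, the sketch below implements (i) LASSO-based channel importance selection in the spirit of step 1.1, using a generic ISTA proximal-gradient solver rather than the patent's exact alternating update, and (ii) uniform affine quantization of a feature map to 8 bits as in step 1.2. All function names and the solver settings are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def lasso_ista(X, y, lam, iters=2000):
    """Minimize 0.5*||X b - y||^2 + lam*||b||_1 by ISTA (proximal gradient).

    Channels whose coefficient b_k shrinks toward 0 are candidates for pruning,
    mirroring the role of the importance vector beta in step 1.1."""
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)   # 1/Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        z = b - step * (X.T @ (X @ b - y))     # gradient step on the quadratic part
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

def uniform_affine_quantize(x, num_bits=8):
    """Uniform affine quantization: map floats onto [0, 2^b - 1] integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(-lo / scale)            # integer offset so lo maps near qmin
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float64) - zero_point) * scale
```

A feature map quantized this way travels as one byte per element plus the two calibration scalars; the receiver dequantizes before running the next CNN layer.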
Further, the delays of the model under all compression partitioning schemes are obtained as follows: information collection is performed on the given CNN model in the given end-edge-cloud system to obtain the delays of all compression partitioning schemes. An N-layer CNN model is deployed in a 3-tier end-edge-cloud system with partition setting l = (l_1, l_2) and compression setting r = (r_1, r_2), where layers 0 to l_1 of the CNN model run on the end device, layer l_1 is compressed at rate r_1, layers l_1+1 to l_2 run on the edge device, layer l_2 is compressed at rate r_2, and layers l_2+1 to N run on the cloud device; the compressed model reaches the corresponding compression rates based on the compression method. Under a compression partitioning scheme (l, r), the end-to-end delay of CNN reasoning is

T(l, r) = T_c + T_t,

where l_0 ≡ 0, l_3 ≡ N, T_c is the sum of the computation delays on all devices of the end-edge-cloud system, and T_t is the sum of the communication delays between the end, edge, and cloud.
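Under the setting above (a 3-tier system, transfers at the two cut points, compression reducing the transferred volume), a minimal sketch of the delay model and of profiling all schemes might look as follows. The per-layer latency and volume tables are invented toy inputs, and the linear volume scaling by (1 − r) is an illustrative assumption, not the patent's measured model.

```python
from itertools import product

def end_to_end_delay(lat_end, lat_edge, lat_cloud, vol, bw_ee, bw_ec, l, r):
    """T(l, r) = T_c + T_t for an N-layer model split at l = (l1, l2).

    Layers before cut l1 run on the end device, layers up to cut l2 on the
    edge, the rest on the cloud (0-indexed slices). vol[k] is the data volume
    at cut point k (vol[0] = raw input); compression rate r_i scales it by
    (1 - r_i). bw_ee / bw_ec are end-edge and edge-cloud bandwidths."""
    (l1, l2), (r1, r2) = l, r
    t_c = sum(lat_end[:l1]) + sum(lat_edge[l1:l2]) + sum(lat_cloud[l2:])
    t_t = vol[l1] * (1 - r1) / bw_ee + vol[l2] * (1 - r2) / bw_ec
    return t_c + t_t

def profile_all_schemes(lat_end, lat_edge, lat_cloud, vol, bw_ee, bw_ec, rates):
    """Collect T(l, r) for every partition pair and compression-rate pair."""
    n = len(lat_end)
    table = {}
    for l1 in range(n + 1):
        for l2 in range(l1, n + 1):
            for r in product(rates, repeat=2):
                table[((l1, l2), r)] = end_to_end_delay(
                    lat_end, lat_edge, lat_cloud, vol, bw_ee, bw_ec, (l1, l2), r)
    return table
```

The resulting table is exactly the per-scheme delay information the later bound-determination and search steps consume.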
Further, the specific operation of constructing the model accuracy upper-bound estimation method under a given compression rate at a given CNN partition layer is as follows:
for a given partition layer l_g, the compression rate r vs. accuracy A function A(l_g, r) is monotone and concave; for a given partition layer and compression rate (l_g, r_g), based on two existing compression rate-accuracy data points ((l_g, r_1), A_1) and ((l_g, r_2), A_2), with r_1 ≤ r_2 < r_g or r_g < r_1 ≤ r_2, the accuracy upper bound of the scheme (l_g, r_g) is estimated as

ABE(l_g, r_g) = A_1 + (A_2 − A_1)(r_g − r_1)/(r_2 − r_1).

The closer the selected existing data points are to (l_g, r_g), the more accurate the estimate, so the two points with the smallest sum of distances to (l_g, r_g) are chosen among the existing data.
Further, the compression-rate decision method under a given accuracy requirement and CNN partition layer is constructed as follows:
using the monotone concave property of the accuracy A vs. compression rate R function R(l_g, A) of the CNN model after compression at a given CNN layer l_g, the highest compression rate CRD(A_g | l_g) = R*(l_g, A_g) satisfying the accuracy requirement A_g at layer l_g is quickly determined.
The method comprises the following steps:
Step 5.1, from the two existing data points ((l_g, A_1), r_1) and ((l_g, A_2), r_2) with the smallest sum of distances to (l_g, A_g), compute an estimate r′ of the compression rate;
Step 5.2, obtain the data point ((l_g, A′), r′) by actually compressing the model;
Step 5.3, repeat steps 5.1 and 5.2 until r′ no longer increases, yielding the maximum compression rate R*(l_g, A_g); if the estimate r′ goes out of range during the loop iteration, determine a new r′ by bisection within the feasible value range.
Further, under a given accuracy requirement A_0, searching for the delay-optimal joint model compression partitioning scheme based on the model accuracy upper-bound estimation method and the compression-rate decision method specifically comprises:
dynamically compressing the scheme search space through the joint optimal model compression partitioning scheme search algorithm, and determining the delay-optimal model optimization scheme (l*, r*) satisfying the given accuracy requirement A_0.
Further, the method comprises the following steps:
Step 6.1, set the local optimal delay T* = T_max and let l_1 ← 1;
Step 6.2, set l_2 ← l_1;
Step 6.3, set the scheme l = (l_1, l_2) and, based on the local optimal delay T*, reduce the solution search space to the set R of compression-rate pairs from R_{l_1} × R_{l_2} whose schemes can still beat T*, where R_{l_1} and R_{l_2} are the sets of selectable compression rates for layers l_1 and l_2;
Step 6.4, based on R, set the candidate compression rate r̂_1 of layer l_1; if the model accuracy of the corresponding scheme does not meet the requirement, update the candidate compression rate r̂_1 and update R;
Step 6.5, if R is non-empty, set the candidate compression rate r̂_2 of layer l_2 and update R; based on step 4, if the estimated model accuracy upper bounds of the candidate schemes are all greater than or equal to A_0, update the candidates, set the delay-optimal joint model compression partitioning scheme (l*, r*) ← (l, (r̂_1, r̂_2)), and set the optimal delay T* ← T(l*, r*);
Step 6.6, update the candidate compression rates and the search space R;
Step 6.7, repeat steps 6.5 and 6.6 until R is exhausted;
Step 6.8, update l_2 ← l_2 + 1;
Step 6.9, if l_2 ≤ N − 1, repeat steps 6.3 to 6.8;
Step 6.10, update l_1 ← l_1 + 1;
Step 6.11, if l_1 ≤ N − 1, repeat steps 6.2 to 6.10;
Step 6.12, output (l*, r*) and T*.
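A simplified, self-contained sketch of the delay-optimal search: it keeps the structure of steps 6.1 to 6.12 (outer loops over partition points, an incumbent T* that prunes schemes unable to beat it, and a cheap accuracy upper bound that prunes schemes unable to meet A_0), but enumerates compression-rate pairs directly instead of maintaining the patent's candidate-rate sets. The toy delay/accuracy models in the usage are invented for illustration.

```python
from itertools import product

def search_delay_optimal(n_layers, rates, delay, accuracy, acc_bound, a_req):
    """Delay-optimal scheme (l, r) subject to accuracy(l, r) >= a_req.

    delay/accuracy are 'expensive' measurements; acc_bound(l, r) is a cheap
    optimistic estimate with acc_bound >= accuracy, used to skip hopeless
    schemes before measuring them (the role of steps 6.3-6.5)."""
    best, best_t = None, float("inf")
    for l1 in range(1, n_layers):              # steps 6.1/6.10: outer partition loop
        for l2 in range(l1, n_layers):         # steps 6.2/6.8: inner partition loop
            for r in product(rates, repeat=2):
                l = (l1, l2)
                t = delay(l, r)
                if t >= best_t:                # step 6.3: cannot beat the incumbent
                    continue
                if acc_bound(l, r) < a_req:    # step 6.5: optimistic bound fails
                    continue
                if accuracy(l, r) >= a_req:    # verify, then accept
                    best, best_t = (l, r), t
    return best, best_t
```

Both pruning rules are safe: a scheme skipped for delay cannot improve the incumbent, and a scheme skipped by the bound cannot meet A_0 since the bound overestimates accuracy, so the result matches an exhaustive search.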
Further, under a given delay requirement T_0, searching for the accuracy-optimal joint model compression partitioning scheme based on the model accuracy upper-bound estimation method and the compression-rate decision method specifically comprises:
dynamically compressing the scheme search space and determining the accuracy-optimal model optimization scheme (l*, r*) satisfying the given delay requirement T_0.
Further, the method comprises the following steps:
Step 7.1, set the local optimal accuracy A* = A_max and let l_1 ← 1;
Step 7.2, set l_2 ← l_1;
Step 7.3, set the scheme l = (l_1, l_2) and, based on the local optimal accuracy A*, reduce the solution search space to the set R of compression-rate pairs from R_{l_1} × R_{l_2}, where R_{l_1} and R_{l_2} are the sets of selectable compression rates for layers l_1 and l_2;
Step 7.4, based on R, set the candidate compression rate r̂_1 of layer l_1 and update R;
Step 7.5, if R is non-empty, set the candidate compression rate r̂_2 of layer l_2 and update R; if the model accuracies of the candidate schemes are all greater than A*, set the joint optimal model compression partitioning scheme (l*, r*) ← (l, (r̂_1, r̂_2)) and set the optimal accuracy A* ← A(l*, r*);
Step 7.6, based on step 4, update the candidate compression rates and the search space R;
Step 7.7, repeat steps 7.5 and 7.6 until R is exhausted;
Step 7.8, update l_2 ← l_2 + 1;
Step 7.9, if l_2 ≤ N − 1, repeat steps 7.3 to 7.8;
Step 7.10, update l_1 ← l_1 + 1;
Step 7.11, if l_1 ≤ N − 1, repeat steps 7.2 to 7.10;
Step 7.12, output (l*, r*) and A*.
Further, an edge cloud convolution neural network inference system based on joint compression partitioning comprises:
the model compression method construction module is used for constructing a communication optimal model compression method and compressing the communication traffic of the CNN model on any given layer through congruent channel pruning and uniform affine quantization;
the model delay obtaining module is used for carrying out information collection on a given CNN model in a given end edge cloud system based on a constructed model compression method to obtain the delay of the model under all compression division schemes;
a performance upper and lower bound determining module, used to determine the performance upper bound (T_max, A_max) and lower bound (T_min, A_min) of the joint compression partitioning scheme based on the obtained delays of all compression partitioning schemes, where T_max and T_min are the upper and lower bounds of the reasoning delay and A_max and A_min are the upper and lower bounds of the reasoning accuracy; (T_max, A_max) is determined by the minimum-delay scheme when no compression is applied, and (T_min, A_min) by the minimum-delay scheme when compression is applied;
the estimation method building module is used for building a model precision upper bound estimation method under a given compression ratio on a given CNN division layer;
the decision method construction module is used for constructing a compression rate decision method when the precision requirement and CNN are given to divide layers;
an optimal model compression partitioning scheme obtaining module, used, under a given accuracy requirement A_0, to search for the delay-optimal joint model compression partitioning scheme based on the model accuracy upper-bound estimation method and the compression-rate decision method: if the given accuracy is greater than the upper bound A_max, directly provide the upper-bound scheme; if the given accuracy is less than the lower bound A_min, directly provide the lower-bound scheme; otherwise, based on the given accuracy requirement A_0, search for the delay-optimal joint model compression partitioning scheme (l*, r*) and output the optimal end-to-end reasoning delay T* of the model optimized by this scheme; and, under a given delay requirement T_0, to search for the accuracy-optimal joint model compression partitioning scheme based on the model accuracy upper-bound estimation method and the compression-rate decision method: if the given delay is greater than the upper bound T_max, directly provide the upper-bound scheme; if the given delay is less than the lower bound T_min, directly provide the lower-bound scheme; otherwise, based on the given delay requirement T_0, search for the accuracy-optimal joint model compression partitioning scheme (l*, r*) and output the optimal reasoning accuracy A* of the model optimized by this scheme;
an output module, used to optimize the model based on the output joint optimal model compression partitioning scheme (l*, r*), deploy it in the end-edge-cloud system, and run the system to perform model reasoning.
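As a hedged sketch of how the modules above might be composed, the following wires the profiling, bound-determination, search, and deployment modules behind plain callables and implements the three-way branch on the accuracy requirement (above A_max, below A_min, or in between). The class name, field names, `info` dictionary keys, and stub behavior are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Scheme = Tuple[Tuple[int, int], Tuple[float, float]]   # (partition pair, rate pair)

@dataclass
class EdgeCloudCNNSystem:
    profile: Callable[[], Dict[str, Any]]               # model-delay obtaining module
    bounds: Callable[[Dict[str, Any]], tuple]           # performance bound module
    search: Callable[[Dict[str, Any], float], Scheme]   # optimal-scheme module
    deploy: Callable[[Scheme], None]                    # output/deployment module

    def optimize_for_accuracy(self, a_req: float) -> Scheme:
        info = self.profile()
        (t_max, a_max), (t_min, a_min) = self.bounds(info)
        if a_req > a_max:                 # requirement unreachable: best-accuracy scheme
            scheme = info["upper_scheme"]
        elif a_req < a_min:               # trivially met: most-compressed (fastest) scheme
            scheme = info["lower_scheme"]
        else:                             # otherwise run the joint search
            scheme = self.search(info, a_req)
        self.deploy(scheme)
        return scheme
```

The delay-constrained path would mirror this branch structure with T_max/T_min in place of the accuracy bounds.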
Compared with the prior art, the invention has the following technical effects:
According to the method, an end-edge-cloud computing architecture is adopted; hierarchical computation offloading is performed through compression and partitioning of the CNN model, communication and computation bottlenecks are jointly optimized, and fast intelligent analysis of massive terminal data is realized while the accuracy requirement is guaranteed; through a congruent channel pruning method and a uniform affine quantization method, the communication traffic of the CNN model at any given layer is compressed reliably, controllably, and efficiently, significantly reducing the transmission delay of end-edge-cloud collaborative CNN reasoning.
Further, by exploiting the monotone concave property of the accuracy vs. compression rate (and accuracy vs. reasoning delay) curve of the CNN model after compression at a given CNN layer, the optimal joint model compression partitioning scheme is quickly determined through the accuracy upper-bound estimation method and the compression-rate decision method, significantly reducing the computational overhead of CNN model optimization.
Further, based on the accuracy upper-bound estimation method and the compression-rate decision method, under a given reasoning accuracy or delay requirement, the joint optimal model compression partitioning scheme search algorithm dynamically compresses the scheme search space and efficiently determines the model optimization scheme with the shortest delay satisfying the given accuracy requirement, or with the highest accuracy satisfying the given delay requirement, supporting low-delay, high-accuracy collaborative reasoning of a given CNN model in a given end-edge-cloud system.
Drawings
FIG. 1 is a schematic representation of an implementation of the process herein;
FIG. 2 is a logic flow diagram of the method herein;
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
Referring to fig. 1, the present invention provides an edge cloud collaborative CNN inference method based on joint compression partitioning, including the following steps:
Step 1, communication-optimal model compression: reduce the communication traffic between CNN intermediate layers by the CNN model compression method; congruent channel pruning (Identical Channel Pruning) is adopted to delete unimportant convolution kernels in a given CNN layer and thereby reduce the number of transmitted feature maps, and uniform affine quantization (Uniform Affine Quantization) is adopted to reduce the number of bits of the data in the feature maps output by the given CNN layer. Step 1 comprises the following steps:
Step 1.1, congruent channel pruning: for a given CNN layer, solve

min_{β, W} Σ_{l=1}^{L} (1/(2S)) ‖Y_l − Σ_{k=1}^{K} β_k X_k W_{l,k}‖_F² + λ_1 ‖β‖_1,
s.t. ‖β‖_0 ≤ K′, ‖W_{l,k}‖_F = 1,

to prune unimportant convolution kernels, where ‖·‖_F denotes the Frobenius norm; S, L, K, and K′ denote, respectively, the number of test samples, the number of branches whose convolution kernels must be deleted simultaneously, the number of deleted convolution kernels, and the number of remaining convolution kernels; Y denotes the feature map output by the current convolution layer; X_k denotes the feature map of the channel corresponding to the k-th input; W_{l,k} denotes the k-th column of the l-th convolution kernel; β is a K-dimensional vector whose entries represent the importance of each convolution kernel; and λ_1 is a penalty coefficient. First, fix W_{l,k}, increase λ_1, and compute β; delete the minimum entry of the current β vector together with the convolution kernel corresponding to it; then fix the β with its minimum element deleted and update W_{l,k} by training; repeat the iteration until the number of components in β is less than K′;
Step 1.2, uniform affine quantization: affinely quantize the output of the given CNN layer compressed in step 1.1 to 8 bits;
Step 2, system information collection (System Profiling): information collection is performed on the given CNN model in the given end-edge-cloud system to obtain the delays of all compression partitioning schemes. An N-layer CNN model is deployed in a 3-tier end-edge-cloud system with partition setting l = (l_1, l_2) and compression setting r = (r_1, r_2), where layers 0 to l_1 of the CNN model run on the end device, layer l_1 is compressed at rate r_1, layers l_1+1 to l_2 run on the edge device, layer l_2 is compressed at rate r_2, and layers l_2+1 to N run on the cloud device; the compressed model reaches the corresponding compression rates based on step 1. Under a compression partitioning scheme (l, r), the end-to-end delay of CNN reasoning is

T(l, r) = T_c + T_t,

where l_0 ≡ 0, l_3 ≡ N, T_c is the sum of the computation delays on all devices of the end-edge-cloud system, and T_t is the sum of the communication delays between the end, edge, and cloud;
Step 3, determine the performance upper bound (T_max, A_max) and lower bound (T_min, A_min) of the joint compression partitioning scheme, where T_max and T_min are the upper and lower bounds of the reasoning delay and A_max and A_min are the upper and lower bounds of the reasoning accuracy; (T_max, A_max) is determined by the minimum-delay scheme when no compression is applied, and (T_min, A_min) by the minimum-delay scheme when compression is applied;
Step 4, model accuracy upper-bound estimation (ABE) under a given compression rate at a given CNN partition layer: for a given partition layer l_g, the compression rate r vs. accuracy A function A(l_g, r) is monotone and concave; for a given partition layer and compression rate (l_g, r_g), based on two existing compression rate-accuracy data points ((l_g, r_1), A_1) and ((l_g, r_2), A_2) (r_1 ≤ r_2 < r_g or r_g < r_1 ≤ r_2), the accuracy upper bound of the scheme (l_g, r_g) is estimated as

ABE(l_g, r_g) = A_1 + (A_2 − A_1)(r_g − r_1)/(r_2 − r_1).

The closer the selected existing data points are to (l_g, r_g), the more accurate the estimate, so the two points with the smallest sum of distances to (l_g, r_g) are chosen among the existing data;
Step 5, compression rate decision (CRD) under a given accuracy requirement and CNN partition layer: for a given partition layer l_g, the accuracy A vs. compression rate R function R(l_g, A) is monotone and concave; using this property, the highest compression rate CRD(A_g | l_g) = R*(l_g, A_g) satisfying the accuracy requirement A_g at the given partition layer l_g is quickly determined. Step 5 comprises the following steps:
Step 5.1, from the two existing data points ((l_g, A_1), r_1) and ((l_g, A_2), r_2) with the smallest sum of distances to (l_g, A_g), compute an estimate r′ of the compression rate based on step 4;
Step 5.2, obtain the data point ((l_g, A′), r′) by actually compressing the model;
Step 5.3, repeat steps 5.1 and 5.2 until r′ no longer increases, yielding the maximum compression rate R*(l_g, A_g); if the estimate r′ goes out of range during the loop iteration, determine a new r′ by bisection within the feasible value range;
Step 6, under a given accuracy requirement A_0, search for the delay-optimal joint model compression partitioning scheme: if the given accuracy is greater than the upper bound A_max, directly provide the upper-bound scheme; if the given accuracy is less than the lower bound A_min, directly provide the lower-bound scheme; otherwise, based on the given accuracy requirement A_0, search for the delay-optimal joint model compression partitioning scheme (l*, r*) and output the optimal end-to-end reasoning delay T* of the model optimized by this scheme. Step 6 comprises the following steps:
Step 6.1, set the local optimal delay T* = T_max and let l_1 ← 1;
Step 6.2, set l_2 ← l_1;
Step 6.3, set the scheme l = (l_1, l_2) and, based on the local optimal delay T*, reduce the solution search space to the set R of compression-rate pairs from R_{l_1} × R_{l_2} whose schemes can still beat T*, where R_{l_1} and R_{l_2} are the sets of selectable compression rates for layers l_1 and l_2;
Step 6.4, based on R, set the candidate compression rate r̂_1 of layer l_1; if the model accuracy of the corresponding scheme does not meet the requirement, update the candidate compression rate r̂_1 based on step 5 and update R;
Step 6.5, if R is non-empty, set the candidate compression rate r̂_2 of layer l_2 and update R; based on step 4, if the estimated model accuracy upper bounds of the candidate schemes are all greater than or equal to A_0, update the candidates based on step 5, set the delay-optimal joint model compression partitioning scheme (l*, r*) ← (l, (r̂_1, r̂_2)), and set the optimal delay T* ← T(l*, r*);
Step 6.6, update the candidate compression rates and the search space R;
Step 6.7, repeat steps 6.5 and 6.6 until R is exhausted;
Step 6.8, update l_2 ← l_2 + 1;
Step 6.9, if l_2 ≤ N − 1, repeat steps 6.3 to 6.8;
Step 6.10, update l_1 ← l_1 + 1;
Step 6.11, if l_1 ≤ N − 1, repeat steps 6.2 to 6.10;
Step 6.12, output (l*, r*) and T*.
Step 7, giving a delay requirement T 0 Searching the combined optimal model compression division scheme with optimal lower precision and optimal precision, and if the given delay is greater than the upper bound T max Directly provideAn upper bound scheme; given delay less than lower bound T min Directly providing a lower bound scheme; otherwise, based on given delay requirement T 0 Search precision optimized joint model compression partitioning scheme (l) * ,r * ) And outputting the optimal inference precision A of the model based on the scheme optimization * Step 7 comprises the following steps:
step 7.1, set the local optimal accuracy A* = Amax, and let l1 ← 1;
step 7.2, set l2 ← l1;
step 7.3, set the scheme l = (l1, l2), and reduce the scheme search space based on the local optimal accuracy A*: [formula], where [formula] denotes the sets of selectable compression rates for layers l1 and l2;
step 7.4, based on R, set the candidate compression rate of layer l1 to [formula], and update [formula];
step 7.5, if [formula], set the candidate compression rate of layer l2 to [formula] and update [formula]; based on step 4, if [formula], [formula] and [formula] are all greater than A*, set the joint optimal model compression-partition scheme [formula] and set the optimal accuracy A* ← A(l*, r*);
step 7.6, update [formula] based on step 4;
step 7.7, repeat steps 7.5 and 7.6 until [formula];
step 7.8, update l2 ← l2 + 1;
step 7.9, if l2 ≤ N-1, repeat steps 7.3 to 7.8;
step 7.10, update l1 ← l1 + 1;
step 7.11, if l1 ≤ N-1, repeat steps 7.2 to 7.10;
step 7.12, output (l*, r*) and A*.
Step 8, based on the joint optimal model compression-partition scheme (l*, r*) output in step 6 or step 7, optimize the model and deploy it in the end-edge-cloud system;
Step 9, run the system to perform model inference.
Referring to fig. 2, the invention provides an end-edge-cloud collaborative inference method based on joint model compression and partitioning. Its logical architecture comprises three parts, namely system information collection, model optimization, and model deployment and inference, with model optimization as the main body. To reduce the transmission delay of end-edge-cloud collaborative CNN inference, communication-optimal model compression is applied to the given CNN model. To reduce the computation overhead of CNN model optimization, the optimal joint model compression-partition scheme is rapidly determined by the accuracy upper-bound estimation method and the compression rate decision method. To support low-delay, high-accuracy collaborative inference of a given CNN model in a given end-edge-cloud system, under a given inference accuracy or delay requirement, the joint optimal model compression-partition scheme search algorithm efficiently determines the model optimization scheme with the shortest delay that meets the given accuracy requirement, or the scheme with the highest accuracy that meets the given delay requirement.
In another embodiment of the present invention, a joint compression division-based edge cloud convolutional neural network inference system is provided, which can be used to implement the above mentioned edge cloud convolutional neural network inference method based on joint compression division, and specifically, the system includes:
the model compression method construction module is used for constructing a communication optimal model compression method and compressing the communication traffic of the CNN model on any given layer through congruent channel pruning and uniform affine quantization;
the model delay obtaining module is used for carrying out information collection on a given CNN model in a given end edge cloud system based on a constructed model compression method to obtain the delay of the model under all compression division schemes;
a performance upper and lower bound determining module, for determining the performance upper bound (Tmax, Amax) and lower bound (Tmin, Amin) of the joint compression-partition scheme based on the obtained delays of all compression-partition schemes, wherein Tmax and Tmin are the upper and lower bounds of the inference delay, Amax and Amin are the upper and lower bounds of the inference accuracy, (Tmax, Amax) is determined by the minimum-delay scheme without compression, and (Tmin, Amin) is determined by the minimum-delay scheme under compression;
the estimation method construction module is used for constructing a model precision upper bound estimation method under a given compression ratio on a given CNN division layer;
the decision method building module is used for building a compression rate decision method when the accuracy requirement and CNN are given to be layered;
an optimal model compression-partition scheme obtaining module, for: under a given accuracy requirement A0, searching for the delay-optimal joint model compression-partition scheme based on the model accuracy upper-bound estimation method and the compression rate decision method, wherein if the given accuracy is greater than the upper bound Amax, the upper-bound scheme is directly provided; if the given accuracy is less than the lower bound Amin, the lower-bound scheme is directly provided; otherwise, the delay-optimal joint model compression-partition scheme (l*, r*) is searched based on the given accuracy requirement A0, and the optimal end-to-end inference delay T* of the model optimized by the scheme is output; and, under a given delay requirement T0, searching for the accuracy-optimal joint model compression-partition scheme based on the model accuracy upper-bound estimation method and the compression rate decision method, wherein if the given delay is greater than the upper bound Tmax, the upper-bound scheme is directly provided; if the given delay is less than the lower bound Tmin, the lower-bound scheme is directly provided; otherwise, the accuracy-optimal joint model compression-partition scheme (l*, r*) is searched based on the given delay requirement T0, and the optimal inference accuracy A* of the model optimized by the scheme is output;
an output module, for optimizing the model based on the output joint optimal model compression-partition scheme (l*, r*), deploying it in the end-edge-cloud system, running the system, and performing model inference.
The invention solves the problem that the prior art cannot realize low-delay, high-accuracy collaborative CNN inference in an end-edge-cloud system. The method adopts an end-edge-cloud computing architecture, performs hierarchical computation offloading through compression and partitioning of the CNN model, jointly optimizes communication and computation bottlenecks, and realizes rapid intelligent analysis of massive terminal data while guaranteeing the accuracy requirement. The invention can reduce the inference delay of the CNN model while ensuring its inference accuracy.
Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. An end edge cloud collaborative convolution neural network reasoning method is characterized by comprising the following steps:
constructing a communication optimal model compression method, and compressing the communication traffic of the CNN model on any given layer through congruent channel pruning and uniform affine quantization;
based on the constructed model compression method, information collection is carried out on a given CNN model in a given end edge cloud system, and the time delay of the model under all compression division schemes is obtained;
determining the performance upper bound (Tmax, Amax) and lower bound (Tmin, Amin) of the joint compression-partition scheme based on the obtained delays of all compression-partition schemes, wherein Tmax and Tmin are the upper and lower bounds of the inference delay, Amax and Amin are the upper and lower bounds of the inference accuracy, (Tmax, Amax) is determined by the minimum-delay scheme without compression, and (Tmin, Amin) is determined by the minimum-delay scheme under compression;
constructing a model precision upper bound estimation method under a given compression ratio on a given CNN division layer;
constructing a compression rate decision method when the precision requirement and CNN are given to divide layers;
under a given accuracy requirement A0, searching for the delay-optimal joint model compression-partition scheme based on the model accuracy upper-bound estimation method and the compression rate decision method, wherein if the given accuracy is greater than the upper bound Amax, the upper-bound scheme is directly provided; if the given accuracy is less than the lower bound Amin, the lower-bound scheme is directly provided; otherwise, the delay-optimal joint model compression-partition scheme (l*, r*) is searched based on the given accuracy requirement A0, and the optimal end-to-end inference delay T* of the model optimized by the scheme is output;
under a given delay requirement T0, searching for the accuracy-optimal joint model compression-partition scheme based on the model accuracy upper-bound estimation method and the compression rate decision method, wherein if the given delay is greater than the upper bound Tmax, the upper-bound scheme is directly provided; if the given delay is less than the lower bound Tmin, the lower-bound scheme is directly provided; otherwise, the accuracy-optimal joint model compression-partition scheme (l*, r*) is searched based on the given delay requirement T0, and the optimal inference accuracy A* of the model optimized by the scheme is output;
optimizing the model based on the output joint optimal model compression-partition scheme (l*, r*), deploying it in the end-edge-cloud system, running the system, and performing model inference.
2. The end edge cloud collaborative convolution neural network reasoning method as claimed in claim 1, wherein constructing the communication-optimal model compression method comprises the following steps:
step 1.1, congruent channel pruning: for a given CNN layer, solve [formula] to prune insignificant convolution kernels, where ||·||F denotes the Frobenius norm; S, L, K and K' denote the number of test samples, the number of branches whose convolution kernels must be deleted simultaneously, the number of deleted convolution kernels and the number of remaining convolution kernels, respectively; Y denotes the output feature map of the current convolution layer; X_k denotes the feature map of the channel corresponding to the k-th input; W_{l,k} denotes the k-th column of the l-th convolution kernel; β is a K-dimensional vector whose components indicate the importance of the corresponding convolution kernels; and λ1 is a penalty coefficient. First fix W_{l,k}, increase λ1 and solve for β, then delete the smallest component of β and its corresponding convolution kernel; after deleting the smallest element, fix β and update W_{l,k} by training; iterate until the number of components in β is no more than K';
step 1.2, uniform affine quantization: perform affine quantization of the given CNN layer output compressed in step 1.1 to 8 bits.
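As an illustration of step 1.2, a minimal uniform affine quantizer can look like the following. The min/max scale-and-zero-point scheme is an assumption, since the claim only specifies that the layer output is affine-quantized to 8 bits.

```python
def affine_quantize_u8(values):
    """Quantize a list of floats to integers in [0, 255] (step 1.2 sketch)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0      # guard against a constant tensor
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo                   # lo acts as the affine zero point

def affine_dequantize(q, scale, zero_point):
    """Recover approximate float values from the 8-bit representation."""
    return [qi * scale + zero_point for qi in q]
```

The reconstruction error of each value is bounded by half a quantization step (scale / 2), which is what makes 8-bit transmission of intermediate feature maps tolerable for inference accuracy.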
3. The method as claimed in claim 1, wherein the specific operation of obtaining the model delay under all compression-partition schemes is: collecting information on a given CNN model in a given end-edge-cloud system to obtain the delays of all compression-partition schemes. An N-layer CNN model is deployed in a 3-tier end-edge-cloud system; the partition layers are set as l = (l1, l2) and the compression setting as r = (r1, r2), wherein layers 0 to l1 of the CNN model run on the end device with the compression rate of layer l1 being r1, layers l1+1 to l2 run on the edge device with the compression rate of layer l2 being r2, and layers l2+1 to N run on the cloud device; the corresponding compression rates are achieved based on the compression model. Under the compression-partition scheme (l, r), the end-to-end delay of CNN inference is [formula], where l0 ≡ 0, l3 ≡ N, Tc is the sum of the computation delays on all devices of the end-edge-cloud, and Tt is the sum of the communication delays between end, edge and cloud.
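The delay model of this claim, T(l, r) = Tc + Tt with l0 = 0 and l3 = N, can be sketched as follows. The per-layer compute profile `comp`, the output-size table `out_bytes`, and the linear bytes-over-bandwidth transmission model (with compression rate taken as the fraction of traffic removed) are assumed inputs for illustration, not taken from the patent.

```python
def end_to_end_delay(l1, l2, N, comp, out_bytes, r, bandwidth):
    """comp[d][i]: compute delay of layer i on device d (0=end, 1=edge, 2=cloud);
    out_bytes[i]: uncompressed output size of layer i;
    r = (r1, r2): compression rates at the two partition layers;
    bandwidth = (end-edge link, edge-cloud link)."""
    bounds = (0, l1, l2, N)                      # l0 = 0 and l3 = N, as in the claim
    # Tc: computation delay summed over the layer ranges assigned to each device
    T_c = sum(comp[d][i]
              for d in range(3)
              for i in range(bounds[d] + 1, bounds[d + 1] + 1))
    # Tt: transmission delay of the two (compressed) intermediate feature maps
    T_t = (out_bytes[l1] * (1 - r[0]) / bandwidth[0]
           + out_bytes[l2] * (1 - r[1]) / bandwidth[1])
    return T_c + T_t
```

Because only the outputs of layers l1 and l2 cross device boundaries, compressing exactly those two layers attacks the whole communication term Tt.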
4. The method as claimed in claim 1, wherein the specific operation of constructing the model accuracy upper-bound estimation method at a given compression rate on a given CNN partition layer is: for a given partition layer l_g, the compression rate r - accuracy A function A(l_g, r) is monotone and concave; at a given partition layer and compression rate (l_g, r_g), based on two existing compression rate - accuracy data points ((l_g, r1), A1) and ((l_g, r2), A2), with r1 ≤ r2 < r_g or r_g < r1 ≤ r2, the accuracy upper bound of scheme (l_g, r_g) is estimated as [formula]. The closer the selected existing data are to (l_g, r_g), the more accurate the estimate; therefore, the two points with the smallest sum of distances to (l_g, r_g) are chosen from the existing data.
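The concrete estimator is in a formula image not reproduced here, but the stated concavity implies a simple secant-extrapolation bound: for a concave accuracy curve A(r), the line through two known points lies above the curve outside their interval, so extrapolating that line to r_g yields an upper bound. A sketch under that assumption:

```python
def accuracy_upper_bound(r1, a1, r2, a2, r_g):
    """Secant extrapolation through (r1, a1) and (r2, a2), evaluated at r_g.
    Assumes r1 < r2 < r_g or r_g < r1 < r2, i.e. r_g lies outside [r1, r2],
    and that accuracy is a concave function of the compression rate."""
    assert r1 < r2 < r_g or r_g < r1 < r2
    slope = (a2 - a1) / (r2 - r1)
    return a1 + slope * (r_g - r1)
```

Such a bound lets the search algorithm discard a candidate scheme without actually compressing and evaluating the model: if even the upper bound falls below the accuracy requirement, the true accuracy must as well.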
5. The method as claimed in claim 1, wherein the compression rate decision method under a given accuracy requirement and CNN partition layer is constructed as follows: given the concavity of the accuracy A - compression rate R function R(l_g, A) of the CNN model after compressing layer l_g, quickly determine the highest compression rate CRD(A_g | l_g) = R*(l_g, A_g) of layer l_g that satisfies the accuracy requirement A_g;
the method comprises the following steps:
step 5.1, based on the two existing data points ((l_g, A1), r1) and ((l_g, A2), r2) with the smallest sum of distances to (l_g, A_g), compute an estimate r' of the compression rate;
step 5.2, obtain the data ((l_g, A'), r') by actual model compression;
step 5.3, repeat steps 5.1 and 5.2 until r' no longer increases; the maximum compression rate is R*(l_g, A_g). If the estimate of r' goes out of range during the loop iterations, determine a new r' by bisection within the feasible value range.
6. The method according to claim 1, wherein, under a given accuracy requirement A0, searching for the delay-optimal joint model compression-partition scheme based on the model accuracy upper-bound estimation method and the compression rate decision method specifically comprises: dynamically compressing the scheme search space through the joint optimal model compression-partition scheme search algorithm, and determining the delay-optimal model optimization scheme (l*, r*) that satisfies the given accuracy requirement A0.
7. The method as claimed in claim 6, characterized by comprising the following steps:
step 6.1, set the local optimal delay T* = Tmax, and let l1 ← 1;
step 6.2, set l2 ← l1;
step 6.3, set the scheme l = (l1, l2), and reduce the scheme search space based on the local optimal delay T*: [formula], where [formula] denotes the sets of selectable compression rates for layers l1 and l2;
step 6.4, based on R, set the candidate compression rate of layer l1 to [formula]; if the model accuracy under scheme [formula] satisfies [formula], update the candidate compression rate of layer l1 to [formula], and update [formula];
step 6.5, if [formula], set the candidate compression rate of layer l2 to [formula] and update [formula]; based on step 4, if [formula] and [formula] are all greater than or equal to A0, where [formula] is the estimated upper bound of the model accuracy of scheme [formula], then update [formula], set the delay-optimal joint model compression-partition scheme [formula], and set the optimal delay T* ← T(l*, r*);
step 6.6, update [formula];
step 6.7, repeat steps 6.5 and 6.6 until [formula];
step 6.8, update l2 ← l2 + 1;
step 6.9, if l2 ≤ N-1, repeat steps 6.3 to 6.8;
step 6.10, update l1 ← l1 + 1;
step 6.11, if l1 ≤ N-1, repeat steps 6.2 to 6.10;
step 6.12, output (l*, r*) and T*.
8. The method as claimed in claim 1, wherein, under a given delay requirement T0, searching for the accuracy-optimal joint model compression-partition scheme based on the model accuracy upper-bound estimation method and the compression rate decision method specifically comprises:
dynamically compressing the scheme search space, and determining the accuracy-optimal model optimization scheme (l*, r*) that satisfies the given delay requirement T0.
9. The method as claimed in claim 8, characterized by comprising the following steps:
step 7.1, set the local optimal accuracy A* = Amax, and let l1 ← 1;
step 7.2, set l2 ← l1;
step 7.3, set the scheme l = (l1, l2), and reduce the scheme search space based on the local optimal accuracy A*: [formula], where [formula] denotes the sets of selectable compression rates for layers l1 and l2;
step 7.4, based on R, set the candidate compression rate of layer l1 to [formula], and update [formula];
step 7.5, if [formula], set the candidate compression rate of layer l2 to [formula] and update [formula]; if [formula] and [formula] are all greater than A*, set the joint optimal model compression-partition scheme [formula] and set the optimal accuracy A* ← A(l*, r*);
step 7.6, update [formula] based on step 4;
step 7.7, repeat steps 7.5 and 7.6 until [formula];
step 7.8, update l2 ← l2 + 1;
step 7.9, if l2 ≤ N-1, repeat steps 7.3 to 7.8;
step 7.10, update l1 ← l1 + 1;
step 7.11, if l1 ≤ N-1, repeat steps 7.2 to 7.10;
step 7.12, output (l*, r*) and A*.
10. An edge cloud convolution neural network inference system based on joint compression partitioning is characterized by comprising:
the model compression method construction module is used for constructing a communication optimal model compression method and compressing the communication traffic of the CNN model on any given layer through congruent channel pruning and uniform affine quantization;
the model delay obtaining module is used for carrying out information collection on a given CNN model in a given end edge cloud system based on a constructed model compression method to obtain the delay of the model under all compression division schemes;
a performance upper and lower bound determining module, for determining the performance upper bound (Tmax, Amax) and lower bound (Tmin, Amin) of the joint compression-partition scheme based on the obtained delays of all compression-partition schemes, wherein Tmax and Tmin are the upper and lower bounds of the inference delay, Amax and Amin are the upper and lower bounds of the inference accuracy, (Tmax, Amax) is determined by the minimum-delay scheme without compression, and (Tmin, Amin) is determined by the minimum-delay scheme under compression;
the estimation method construction module is used for constructing a model precision upper bound estimation method under a given compression ratio on a given CNN division layer;
the decision method construction module is used for constructing a compression rate decision method when the precision requirement and CNN are given to divide layers;
an optimal model compression-partition scheme obtaining module, for: under a given accuracy requirement A0, searching for the delay-optimal joint model compression-partition scheme based on the model accuracy upper-bound estimation method and the compression rate decision method, wherein if the given accuracy is greater than the upper bound Amax, the upper-bound scheme is directly provided; if the given accuracy is less than the lower bound Amin, the lower-bound scheme is directly provided; otherwise, the delay-optimal joint model compression-partition scheme (l*, r*) is searched based on the given accuracy requirement A0, and the optimal end-to-end inference delay T* of the model optimized by the scheme is output; and, under a given delay requirement T0, searching for the accuracy-optimal joint model compression-partition scheme based on the model accuracy upper-bound estimation method and the compression rate decision method, wherein if the given delay is greater than the upper bound Tmax, the upper-bound scheme is directly provided; if the given delay is less than the lower bound Tmin, the lower-bound scheme is directly provided; otherwise, the accuracy-optimal joint model compression-partition scheme (l*, r*) is searched based on the given delay requirement T0, and the optimal inference accuracy A* of the model optimized by the scheme is output;
an output module, for optimizing the model based on the output joint optimal model compression-partition scheme (l*, r*), deploying it in the end-edge-cloud system, running the system, and performing model inference.
CN202210611122.9A 2022-05-31 Terminal edge cloud collaborative convolutional neural network reasoning method and system Active CN114970824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210611122.9A CN114970824B (en) 2022-05-31 Terminal edge cloud collaborative convolutional neural network reasoning method and system


Publications (2)

Publication Number Publication Date
CN114970824A true CN114970824A (en) 2022-08-30
CN114970824B CN114970824B (en) 2024-05-10


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
US20210295165A1 (en) * 2020-03-18 2021-09-23 Donghua University Method for constructing efficient product surface defect detection model based on network collaborative pruning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
* XUE Feng; FANG Weiwei: "EdgeMI: Multi-Device Collaborative Inference for Deep Learning under Resource-Constrained Conditions", Modern Computer, no. 20, 15 July 2020 (2020-07-15) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant