CN114723058A - Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis - Google Patents

Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Info

Publication number
CN114723058A
Authority
CN
China
Prior art keywords: strategy, division, sigma, division strategy, frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210369401.9A
Other languages
Chinese (zh)
Inventor
姬晨晨
于佳耕
侯朋朋
邰阳
苗玉霞
佟晓宇
张丽敏
全雨
武延军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS filed Critical Institute of Software of CAS
Priority to CN202210369401.9A priority Critical patent/CN114723058A/en
Publication of CN114723058A publication Critical patent/CN114723058A/en
Pending legal-status Critical Current

Classifications

    • G06N 5/045: Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
    • G06N 3/04: Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; Learning methods
    • G06N 3/105: Neural networks; Shells for specifying net layout
    • H04L 43/0852: Monitoring or testing based on specific metrics, e.g. QoS; Delays
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network


Abstract

The invention discloses a neural network edge-cloud collaborative inference method and device for high-sampling-rate video stream analysis. In a realistic high-sampling-rate video stream scenario, the deep neural network is modeled and analyzed so that the inference task is divided into stages executed on the edge side and in the cloud, while a periodic cyclic partition strategy for the deep neural network model is derived, breaking through the current bottleneck of high-sampling-rate video streams. With the periodic cyclic partition strategy, the inference waiting time of the edge/cloud in the system is reduced for a given deep neural network model, the limit throughput of the system is reached, and the computing capabilities of both the edge and the cloud are fully utilized.

Description

Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis
Technical Field
The invention belongs to the technical field of accelerating and optimizing deep neural network edge-cloud collaborative inference, and in particular relates to a neural network edge-cloud collaborative inference method for high-sampling-rate video stream analysis.
Background
Recent advances in Deep Neural Networks (DNNs) have greatly increased the accuracy and speed of computer vision and video analysis, opening an avenue for a new generation of intelligent applications. Cloud computing, equipped with powerful hardware such as TPUs and GPUs, has become the typical platform for such compute-intensive DNN tasks. For example, in an automotive application a camera constantly monitors the surrounding scene and transmits it to a server, which performs video analysis and feeds control signals back to the pedals and steering wheel. In an augmented reality application, smart glasses track the current view and stream it to a cloud server, which performs object recognition and sends back contextual augmented tags for seamless display on the actual scene.
A DNN consists of multiple network layers, and a video is inferred by running the DNN on each frame separately with a feed-forward algorithm. The algorithm starts at the input layer and proceeds layer by layer: each layer receives the output of the previous layer as input, performs a series of computations on it, and passes its output to the next layer; the process terminates once the output layer finishes. In practice, data is generated at the edge and data frames enter the DNN as raw input. The computation of each layer of the DNN may be performed either at the edge or in the cloud. Computing a layer on the edge device avoids transmitting data to the cloud, but limited device resources increase the amount of computation time; computing a layer in the cloud reduces computation time, but transmitting data from the edge device to the cloud incurs transmission delay. How to split DNN inference between the edge and the cloud to achieve the shortest latency or the maximum throughput is the central problem at present.
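To make this trade-off concrete, the following minimal sketch (not part of the patent text; the 5-layer profile, all timings, sizes and function names are hypothetical) enumerates every cut point of a chain-structured DNN and scores it by edge compute, transfer and cloud compute time:

```python
def evaluate_cuts(edge_ms, cloud_ms, out_mb, bandwidth_mb_s):
    """For each partition edge t = 1..m-1: layers 1..t run on the edge,
    the output of layer t is shipped to the cloud, layers t+1..m run there.
    Returns a list of (t, T_e, T_t, T_c) in milliseconds."""
    m = len(edge_ms)
    cuts = []
    for t in range(1, m):
        T_e = sum(edge_ms[:t])                        # edge compute time
        T_t = out_mb[t - 1] / bandwidth_mb_s * 1000   # transfer time, ms
        T_c = sum(cloud_ms[t:])                       # cloud compute time
        cuts.append((t, T_e, T_t, T_c))
    return cuts

# Hypothetical 5-layer profile: per-layer edge ms, cloud ms, output size (MB).
cuts = evaluate_cuts(edge_ms=[12, 8, 20, 15, 5],
                     cloud_ms=[3, 2, 5, 4, 1],
                     out_mb=[0.95, 0.40, 0.08, 0.05, 0.01],
                     bandwidth_mb_s=1.0)
best = min(cuts, key=lambda c: c[1] + c[2] + c[3])    # min(T_e + T_t + T_c)
print("best single-frame cut:", best)
```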
Because the output of some intermediate DNN layers is significantly smaller than the raw input data, most existing neural network edge-cloud collaborative inference methods partition the deep neural network model, build a mathematical model of the total partitioned time, and solve for its optimum. These optimal solutions hold only under low system load, i.e. when the input rate of the data stream meets certain system requirements; if the input rate is too high, it must be reduced. This restriction on the sampling rate clearly does not match practical application scenarios. For example, at expressway entrances and toll stations it must be checked whether vehicles are legal, and their entry and exit times must be recorded; license plate recognition systems greatly improve toll efficiency and reduce communication cost. Under heavy traffic, if the system is not adjusted, vehicles may pass the station without their license plates being recognized, throwing the system into disorder; if the entry and exit rate is limited instead, vehicles pile up at the gates and traffic accidents and congestion become likely. The proposed neural network edge-cloud collaborative inference method for high-sampling-rate video stream analysis maximizes system throughput without reducing the sampling rate, satisfying both real-time inference requirements and real-world applications.
Disclosure of Invention
The invention aims to provide a neural network edge-cloud collaborative inference method and device for high-sampling-rate video stream analysis that addresses the defects of the prior art.
The neural network edge-cloud collaborative inference method for high-sampling-rate video stream analysis of the invention comprises the following steps:
Step 1: in a neural-network architecture scenario for high-sampling-rate video stream analysis, the deep learning model is modeled as a chain graph, where each vertex represents a model layer of the neural network and each arrow represents data transmission between model layers.
Step 2: a partition is made between any two adjacent nodes in the chain graph; the partition edge is denoted t. Nodes to the left of t are inferred on the edge side, and nodes to the right of t are inferred in the cloud. The partition edge t corresponds to the time required to transmit the data of its left endpoint to its right endpoint, i.e. the time to transmit the data from the edge to the cloud.
Step 3: for each partition edge t, the per-layer latency and energy consumption of the neural network are evaluated under the hardware environments and data sets of the edge and the cloud; the latency and data volume of the tasks processed on the edge and in the cloud under the corresponding model partition (i.e. the partition of Step 2) are recorded; and the latency required for data transmission between the edge and the cloud is recorded according to the network bandwidth and the size of the transmitted data.
Step 4: according to the current sampling rate Q, the bandwidth B and the deep learning network model, the edge and the cloud derive an optimal periodic cyclic partition strategy via the model partition algorithm; that is, following the periodic cyclic partition strategy, the edge and the cloud switch to the corresponding partition edge in sequence as each frame of the video stream arrives, and the system reaches the limit throughput of the neural network model under the high-sampling-rate condition.
Step 5: the edge receives the first video frame, infers the task before the partition edge on the input first frame according to the first partition strategy in the periodic cyclic strategy to obtain a first intermediate result, and sends it to the cloud; the edge then switches to the second partition strategy, infers the second input frame, and so on.
Step 6: the cloud takes the first intermediate result sent by the edge as input, completes the remaining inference according to the first partition strategy, and returns the inference result of the first frame; it then switches to the second partition strategy, waits for the second intermediate result from the edge, and so on.
Step 7: once all k frames of the video stream have been input, the system outputs all inference results and inference ends.
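A minimal sketch of the per-frame dispatch of Steps 5 to 7 follows; the cyclic strategy is assumed to be computed already, and edge_infer/cloud_infer are hypothetical stand-ins for the two sub-models, not the patent's implementation:

```python
from itertools import cycle

def edge_infer(frame, sigma):
    # Placeholder for running the layers before partition edge sigma.
    return ("intermediate", frame, sigma)

def cloud_infer(intermediate, sigma):
    # Placeholder for running the layers after partition edge sigma.
    return ("result", intermediate[1], sigma)

def run_stream(frames, cyclic_strategies):
    """Both ends step through the cyclic strategy in lockstep, switching to
    the next partition edge as each frame arrives (Steps 5 and 6)."""
    results = []
    for frame, sigma in zip(frames, cycle(cyclic_strategies)):
        s = edge_infer(frame, sigma)            # edge: task before the cut
        results.append(cloud_infer(s, sigma))   # cloud: remaining task
    return results                              # Step 7: all results output

print(run_stream(frames=range(5), cyclic_strategies=["s1", "s2"]))
```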
Further, in Step 2, the partition edge t traverses the edges between all model layers. If the deep learning model has m layers, there are m-1 partition edges, t ∈ {t_1, …, t_{m-1}}, t ≥ 1. Under each partition edge t, n model layers are inferred on the edge side and the size of the data output by the edge is s; under a given network bandwidth B this data is transmitted to the cloud with transmission time T_t = s/B. The cloud receives input data of size s and infers the remaining m-n model layers.
Further, in Step 3, the processing times T_e and T_c of the deep learning model layers on the edge and in the cloud are obtained by actual measurement or predicted by a simulation model; the transmission time T_t of inter-layer data between the edge and the cloud is obtained by actual measurement, or by measuring the network bandwidth between the edge and the cloud and computing the ratio of the size of the edge's output data to the bandwidth.
Further, the total time of the subtasks inferred on the edge is
T_e = Σ_{i=1}^{n} T_i,
and the total time of the subtasks inferred in the cloud is
T_c = Σ_{i=n+1}^{m} T_i,
where T_i is the inference time of the i-th layer of the deep neural network, and the time of data transmission between the edge and the cloud is T_t = s/B.
Further, when the sampling rate satisfies 1/Q > min max{T_e, T_t, T_c}, the partition strategy is determined by the shortest total single-frame time T, i.e. according to min(T_e + T_t + T_c), which maximizes system throughput; when in high-load mode, i.e. 1/Q < min max{T_e, T_t, T_c}, the optimal periodic cyclic partition strategy is used to partition the deep learning model.
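As a sketch of this decision rule (an illustration under stated assumptions, not the patent's implementation; timings in ms, Q in frames per second, the strategy list assumed to come from prior profiling):

```python
def choose_partitioning(Q, strategies):
    """strategies: list of (T_e, T_t, T_c) in ms, one per partition edge."""
    interval_ms = 1000.0 / Q
    bottleneck = min(max(s) for s in strategies)  # min max{T_e, T_t, T_c}
    if interval_ms > bottleneck:                  # low load: no pipeline stall
        return min(strategies, key=sum)           # min(T_e + T_t + T_c)
    return "high load: use the periodic cyclic partition strategy"

print(choose_partitioning(Q=50, strategies=[(30, 20, 10), (20, 30, 15)]))
```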
Further, in Step 4, the input of the model partition algorithm is an (m-1) x 3 vector together with the sampling rate Q and the bandwidth B, and the output is the partition strategy of the neural network model. In the input (m-1) x 3 vector, each row vector represents {T_e, T_t, T_c} under one partition edge t; a deep learning network model with m layers has m-1 partition edges, so Input = {T_e1, T_t1, T_c1; T_e2, T_t2, T_c2; …; T_e(m-1), T_t(m-1), T_c(m-1)}. The sampling rate Q represents the maximum theoretical throughput, i.e. the number of pictures that can arrive in 1 second. For an n-frame video stream processed in a parallel (pipelined) manner, the system is in low-load mode when 1/Q > min max{T_e, T_t, T_c}, where min max{T_e, T_t, T_c} minimizes, over all partition strategies, the largest of the three times T_e, T_t, T_c. In this case parallel operation in the system causes no waiting, and the partition strategy σ is determined by the shortest total single-frame time T = T_e + T_t + T_c, i.e. according to min(T_e + T_t + T_c). The maximum limit throughput the system can reach then exceeds the theoretical maximum throughput given by the sampling rate Q, so the partition strategy σ already achieves the maximum system throughput. When the system is in high-load mode, 1/Q < min max{T_e, T_t, T_c}, and the maximum limit throughput of the whole system under a single-frame partition strategy σ is approximately n/(n * min max{T_e, T_t, T_c} + T_1 + T_2), where T_1 and T_2 are the two smaller of {T_e, T_t, T_c}; e.g. if max{T_e, T_t, T_c} = T_e, then T_1, T_2 are T_t, T_c. When n is large, this expression approaches 1/min max{T_e, T_t, T_c}, so the maximum limit throughput under a single-frame partition strategy is below the theoretical maximum throughput given by the sampling rate Q, and can never exceed it.
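This throughput bound can be checked numerically with a short sketch; the stage times below are hypothetical:

```python
def single_cut_throughput(n, stage_ms):
    """Limit throughput (frames/s) of an n-frame pipeline reusing one cut:
    n / (n * max-stage + the two smaller stages)."""
    T1, T2 = sorted(stage_ms)[:2]           # the two non-bottleneck stages
    total_ms = n * max(stage_ms) + T1 + T2
    return n / (total_ms / 1000.0)

# As n grows the bound approaches 1000 / max(stage_ms) frames per second.
for n in (10, 100, 1000):
    print(n, round(single_cut_throughput(n, (30, 20, 10)), 2))
```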
Further, the partition strategy with the shortest total single-frame time T is not necessarily the optimal partition strategy in the multi-frame case. In the prior art, all operations are performed at the same partition point and the best pipelined partition node is sought, but waiting time still remains, so there is still room for optimization. The model partition algorithm of the invention derives a periodic cyclic model partition strategy: a different model partition strategy σ_i is adopted for each frame, obtained by the periodic cyclic partition strategy algorithm.
Further, the optimal cyclic partition strategy is σ' = {σ_1, σ_2, …, σ_q}, where σ_1, σ_2, …, σ_q denote the 1st, 2nd, …, q-th partition strategies respectively. The edge and the cloud infer according to this cyclic strategy. For an input video stream of k frames: the edge infers the 1st frame under the 1st partition strategy σ_1, transmits the 1st intermediate result s_1 to the cloud, and switches its own strategy to the 2nd partition strategy σ_2; the cloud, under σ_1, takes s_1 as input, infers the remaining task, returns the inference result of the 1st frame, and switches its own strategy to σ_2; the above process repeats. The edge infers the q-th frame under the q-th partition strategy σ_q, transmits the q-th intermediate result s_q to the cloud, and switches its own strategy back to the 1st partition strategy σ_1; the cloud, under σ_q, takes s_q as input, infers the remaining task, returns the inference result of the q-th frame, and switches back to σ_1, completing the cycle. Over the input of k video frames, the edge and the cloud each switch partition strategies N = k % q - 1 times in total, where % denotes the remainder operation.
Further, for DNNs, the size of some intermediate results (outputs of intermediate layers) is significantly smaller than that of the raw input data. For example, the input data size of mini YOLOv2 is 0.95 MB, while the output of the intermediate layer max5 is 0.08 MB, a 93% reduction. This offers the opportunity to exploit both the powerful computing capability of the cloud and the proximity of edge computing: a portion of the DNN is computed on the edge side, the small intermediate result is transmitted to the cloud, and the remaining portion is computed on the cloud side. Partitioning the DNN thus constitutes a trade-off between computation and transmission: cutting at different layers yields different computation and transmission times, so an optimal partition is desirable.
Based on the same inventive concept, the present invention further provides a neural network-side cloud collaborative inference apparatus oriented to high-sampling-rate video stream analysis, which is an electronic apparatus including a memory and a processor, wherein the memory stores a computer program configured to be executed by the processor, and the computer program includes instructions for executing the above-described method of the present invention.
Compared with the prior art, the invention has the following positive effects:
(1) In a realistic high-sampling-rate video stream scenario, the proposed neural network edge-cloud collaborative inference method models and analyzes the deep neural network so that the inference task is divided into stages executed on the edge and in the cloud, while a periodic cyclic partition strategy for the deep neural network model is derived, breaking through the current bottleneck of high-sampling-rate video streams;
(2) with the periodic cyclic partition strategy, the inference waiting time of the edge/cloud in the system is reduced for a given deep neural network model, the limit throughput of the system is reached, and the computing capabilities of the edge and the cloud are fully utilized.
Drawings
FIG. 1 is a diagram illustrating the total inference time of AlexNet under different partition edges;
FIG. 2 is a schematic diagram of a high sample rate case;
FIG. 3 is a schematic diagram of a loop partitioning strategy in an embodiment;
FIG. 4 is a diagram illustrating a single partitioning policy in an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in more detail with reference to the accompanying drawings and corresponding embodiments.
As shown in FIG. 1, taking the AlexNet network as an example, the partition edge t traverses the edges between all model layers; the deep learning model has 24 layers, so there are 23 partition edges, t ∈ {1, 2, 3, …, 23}, t ≥ 1. Under each partition, n model layers are inferred on the edge side and the size of the data output by the edge is s_n; under a given network bandwidth B, s_n is transmitted to the cloud as the transmission data, with transmission time T_t = s_n/B. The cloud receives input data of size s_n and infers the remaining m-n model layers. The processing times T_e and T_c of the deep learning model layers on the edge and in the cloud are obtained by actual measurement or predicted by a simulation model; the transmission time T_t of inter-layer data between the edge and the cloud is obtained by actual measurement, or by measuring the network bandwidth between the edge and the cloud and computing the ratio of the size of the edge's output data to the bandwidth.
By actual system measurement, the total time of the subtasks inferred on the edge is
T_e = Σ_{i=1}^{n} T_i,
the total time of the subtasks inferred in the cloud is
T_c = Σ_{i=n+1}^{m} T_i,
and the total data transmission time between the edge and the cloud is T_t = s_n/B. For the partition strategy with the shortest total single-frame time T, the total time of deep neural network inference is T_one-partition = T_e + T_t + T_c.
Experiments were performed on the built system, where the edge server processor is an Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.30GHz and the cloud server processor is an Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz, with Ubuntu 18.04 as the operating system. The total deep neural network inference time under every partition edge t in the single-frame case is computed and presented as in FIG. 1, where the abscissa is the partition edge t at a given network layer and the ordinate is the total time T_one-partition = T_e + T_t + T_c computed under that partition edge; this yields m-1 = 23 partition strategies as part of the input to the model partition algorithm.
First, the definition of a high sampling rate is given. As shown in FIG. 2, frame A, frame B and frame C represent three video frames in a video stream; a high sampling rate necessarily concerns a continuous video stream. In FIG. 2, t_0, t_1 and t_2 mark the arrivals of video frames: at sampling rate Q, a picture is captured at each such instant and fed into the deep neural network for inference. The situation depicted in FIG. 2 is representative (assuming the edge inference time exceeds both the transmission time and the cloud inference time). Suppose the initial partition strategy is the one with the shortest total single-frame time and the edge and the cloud always use it. When the next frame reaches the edge, if the edge has already finished the subtask of the previous frame, it can infer the new frame immediately; this sampling rate is called a low sampling rate in the invention, as shown in the lower part of FIG. 2. If the edge has not finished the previous frame's subtask, it cannot infer the incoming frame immediately: the new frame's inference necessarily incurs a waiting time, starting only after the edge completes the previous frame's inference task; this sampling rate is called a high sampling rate in the invention, as shown in the upper part of FIG. 2. In the high-sampling-rate state, the total edge waiting time over an n-frame video stream is T_wait = n * (T_e - 1/Q); the longer the video stream, the longer the wait, and during this waiting time the transmission stage and the cloud inference stage sit idle, wasting part of the resources.
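The waiting-time formula, as a sketch with hypothetical numbers:

```python
def edge_wait_time_ms(n, T_e_ms, Q):
    """Total edge stall over n frames when one fixed cut is reused and the
    edge stage is the bottleneck: T_wait = n * (T_e - 1/Q), clamped at 0."""
    interval_ms = 1000.0 / Q
    return max(0.0, n * (T_e_ms - interval_ms))

print(edge_wait_time_ms(n=100, T_e_ms=30, Q=100))  # 100 * (30 - 10) = 2000 ms
```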
The steps of the proposed periodic cyclic partition algorithm for high-sampling-rate video streams are as follows:
1) Compute the full set of partition strategies over the m-1 partition edges, σ = {σ_1, σ_2, …, σ_{m-1}}. Each partition strategy σ corresponds to three times: the edge processing time T_e (i.e. the end-side processing time in FIG. 2), the transmission time T_t of data from the edge to the cloud (the data transmission time in FIG. 2), and the cloud processing time T_c (the cloud processing time in FIG. 2). The total inference time is T = T_e + T_t + T_c. The single-frame optimum among the partition strategies is recorded as
σ_1 = argmin_σ (T_e + T_t + T_c),
the partition strategy with the shortest total single-frame inference time T; it is set as the 1st partition strategy of the periodic cyclic partition strategy.
2) Classify all partition strategies into three categories, i.e. three sets: ① T_e < T_t and T_e < T_c; ② T_t < T_e and T_t < T_c; ③ T_c < T_e and T_c < T_t. The subsequent algorithm is inferred according to which set σ_1 belongs to. If that set contains only the single element σ_1, the result of the periodic cyclic partition strategy is the constant σ' = {σ_1}, where σ' denotes the finally obtained periodic cyclic partition strategy. The three cases for σ_1 are symmetric to one another; step 3) below illustrates the subsequent algorithm with the third case.
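A sketch of this classification, with each strategy given as a (T_e, T_t, T_c) tuple:

```python
def strategy_class(s):
    """Return 1, 2 or 3 according to which stage time is strictly smallest."""
    T_e, T_t, T_c = s
    if T_e < T_t and T_e < T_c:
        return 1
    if T_t < T_e and T_t < T_c:
        return 2
    return 3   # T_c < T_e and T_c < T_t

print([strategy_class(s) for s in [(30, 20, 10), (20, 30, 15), (10, 15, 40)]])
# -> [3, 3, 1]
```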
3) For the set in which T_c is smallest, i.e. ③, denote its partition strategies σ_③ = {σ_1, σ_2, …, σ_r}; in this case the sampling rate satisfies 1/Q < min max{T_e, T_t, T_c}, with T_e, T_t, T_c ∈ σ_③. For convenience of the formulas, suppose T_e > T_t, i.e. 1/Q < T_e: the next frame k+1 arrives while the current frame k (k starting from 1) has not finished inference, and the difference between T_e and T_t is the waiting time T_p = |T_e - T_t|; the waiting time of the first frame is zero, i.e. no waiting is needed. For k > 1, owing to pipelined parallel processing, once the edge finishes the first stage of the current frame k it can process the first stage of the next frame k+1; after the current frame k finishes its first stage it enters the data transmission stage, and likewise for the next frame k+1. However, frame k+1 may enter the data transmission stage only once the data transmission stage of the current frame k has completed, i.e. the network is in an available state; if the data of the current frame k has not finished transmitting, the data transmission of the next frame k+1 must wait, and the waiting time between the next frame and the current frame is T_p(k+1) = |T_e(k+1) - T_t(k)|.
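The inter-frame waiting-time recurrence, as a sketch:

```python
def waiting_times(per_frame_strategies):
    """T_p(1) = 0; T_p(k+1) = |T_e(k+1) - T_t(k)| for a given sequence of
    (T_e, T_t, T_c) tuples, one per frame."""
    waits = [0.0]                  # the first frame never waits
    for k in range(1, len(per_frame_strategies)):
        T_e_next = per_frame_strategies[k][0]
        T_t_cur = per_frame_strategies[k - 1][1]
        waits.append(abs(T_e_next - T_t_cur))
    return waits

print(waiting_times([(30, 20, 10), (20, 30, 15), (30, 20, 10)]))  # [0, 0, 0]
```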
4) To obtain the optimal periodic cyclic partition strategy under the current conditions, the search proceeds as follows (a code sketch follows step iv):
i. The known set of partition strategies is σ_③ = {σ_i | i = 1, …, r}; the set of the optimal periodic cyclic partition strategy is {σ_j | j = 1, …, q}, where q ≤ r. The partition strategy obtained from the shortest total single-frame inference time is σ_one-partition, which lies in the total set of partition strategies. Take σ_one-partition as the initial partition strategy, denote it σ_tmp, and add it to the optimal cyclic partition strategy.
ii. Infer the first incoming frame according to the partition strategy of step i, and record under this strategy the edge inference time T_e(σ_tmp) and the data transmission time T_t(σ_tmp).
iii. When the next frame arrives, search the partition strategy set σ_③ for the strategy whose edge inference time T_e is closest to T_t(σ_tmp) and take it as the partition strategy of the current frame:
σ_next = argmin_{σ ∈ σ_③} |T_e(σ) - T_t(σ_tmp)|.
If T_e(σ_next) equals the edge time of the strategy that starts the cyclic set (the cycle closes), exit the loop to obtain the final cyclic partition strategy; otherwise add σ_next to the optimal cyclic partition strategy and execute step iv.
iv. Update σ_tmp ← σ_next and repeat step iii for the next frame.
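Steps i to iv amount to the following greedy search. This is a sketch under stated assumptions (the strategies already profiled as (T_e, T_t, T_c) triples, and the cycle-closing test of step iii implemented as an equality check on T_e), not a verbatim transcription of the patent:

```python
def cyclic_partition_search(strategies, max_len=None):
    """Greedy search of steps i-iv. strategies: list of (T_e, T_t, T_c)
    triples. Returns the indices of the cycle {sigma_1, ..., sigma_q}."""
    max_len = max_len or len(strategies)
    # step i: start from the single-frame optimum min(T_e + T_t + T_c)
    start = min(range(len(strategies)), key=lambda j: sum(strategies[j]))
    cycle = [start]
    current = start
    while len(cycle) <= max_len:
        T_t_cur = strategies[current][1]  # step ii: this frame's transfer time
        # step iii: next frame takes the strategy whose T_e is closest to T_t
        nxt = min(range(len(strategies)),
                  key=lambda j: abs(strategies[j][0] - T_t_cur))
        if strategies[nxt][0] == strategies[cycle[0]][0]:
            break                         # cycle closes: back to the start
        cycle.append(nxt)                 # step iv: update and repeat
        current = nxt
    return cycle

# Set ③ of Table 1 in the worked example below; expect the cycle [0, 1],
# i.e. {sigma_1, sigma_2}.
print(cyclic_partition_search([(30, 20, 10), (20, 30, 15), (40, 25, 8)]))
```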
5) The loop yields the optimal periodic cyclic partition strategy σ' = {σ_1, σ_2, …, σ_q}. The total time of the video stream from frame 1 to frame k is
T = Σ_{i=1}^{k} T_e(σ_{((i-1) mod q)+1}) + T_t(σ_{((k-1) mod q)+1}) + T_c(σ_{((k-1) mod q)+1}),
i.e. the sum of the edge times of all k frames plus the transmission and cloud times of the last frame, since the pipeline is kept saturated by the edge stage. For the single-frame shortest partition case, the total inference time of k frames is:
T_one-partition = k*T_e + T_t + T_c.
The final partition strategy is the periodic cyclic strategy σ' = {σ_1, σ_2, …, σ_q}.
A specific example is given for illustration. The sampling interval in this example is 1/Q = 10 ms. The running times of the partition results for this neural network are shown in Table 1:
Table 1. Times of the three stages under the 4 partition strategies

  Partition strategy   T_e (ms)   T_t (ms)   T_c (ms)
  σ_1                  30         20         10
  σ_2                  20         30         15
  σ_3                  40         25         8
  σ_4                  10         15         40
The following describes the algorithm in detail according to the contents of table 1:
1) Following algorithm step 1, compute for the 4 partition strategies the edge processing time T_e, the transmission time T_t of data from the edge to the cloud, and the cloud processing time T_c;
2) Partition strategies σ_1, σ_2 and σ_3 satisfy T_c < T_e and T_c < T_t, i.e. they form the set in which T_c is smallest; the search algorithm looks for the optimal periodic cyclic partition strategy within this set;
3) The sampling interval satisfies 1/Q = 10 ms < 20 ms < 30 ms, i.e. at any moment the next frame arrives while the current frame is still being processed;
4) Traverse to find the minimum total time, min T = min(T_e + T_t + T_c) = 60 ms, attained by σ_1, so the first frame's partition strategy is σ_1.
T_p(2) = min_j T_p(j) = min_j {|T_e(j) - T_t(1)|} = |T_e(2) - T_t(1)| = 0 ms, so the second frame's partition strategy is σ_2;
T_p(3) = min_j T_p(j) = min_j {|T_e(j) - T_t(2)|} = |T_e(1) - T_t(2)| = 0 ms, so the third frame's partition strategy is σ_1 again; since T_e(a) == T_e(b) for a = 1, b = 3 (the cycle closes), the loop exits.
5) σ' = {σ_1, σ_2}: the optimal periodic cyclic partition strategy contains q = 2 partition methods. The number of partition-strategy switches performed by the system during computation is N = k % q - 1.
A schematic of the total time under the cyclic partition strategy is shown in FIG. 3; in this example the total time is:
T = Σ_{i=1}^{k} T_e(i) + T_t(k) + T_c(k) = 25k + 35 for odd k (25k + 45 for even k).
A schematic of the total time under the single-partition-point shortest strategy is shown in FIG. 4; the total time is:
T_one-partition = k*T_e + T_t + T_c = 30k + 30.
When k = 99, the total time inferred under the periodic cyclic partition strategy is 2510 ms versus 3000 ms under the single-frame shortest-time partition strategy; when k = 100, the totals are 2545 ms versus 3030 ms. The periodic cyclic partition algorithm of the invention therefore outperforms the single-frame shortest-time partition algorithm.
In general, when k is odd, T ≤ T_one-partition holds over the range k ≥ 1; when k is even, T ≤ T_one-partition holds over the range k ≥ 3.
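These totals can be reproduced with a short self-contained sketch; it assumes, as in the example, that the pipeline is saturated by the edge stage, so the makespan is the sum of the edge times plus the transmission and cloud tail of the last frame:

```python
def total_cyclic_ms(k, cycle, strategies):
    """Makespan of k frames under a periodic cyclic strategy."""
    edge_sum = sum(strategies[cycle[i % len(cycle)]][0] for i in range(k))
    last = strategies[cycle[(k - 1) % len(cycle)]]
    return edge_sum + last[1] + last[2]   # + T_t + T_c of the final frame

def total_single_ms(k, s):
    return k * s[0] + s[1] + s[2]         # k*T_e + T_t + T_c

table1 = [(30, 20, 10), (20, 30, 15), (40, 25, 8), (10, 15, 40)]
for k in (99, 100):
    print(k, total_cyclic_ms(k, [0, 1], table1), total_single_ms(k, table1[0]))
# -> 99 2510 3000 and 100 2545 3030, matching the figures above
```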
In the scenario of high-sampling-rate video stream analysis, the proposed periodic-cyclic-partition neural network edge-cloud collaborative inference method improves inference speed.
Based on the same inventive concept, another embodiment of the present invention provides a neural network-side cloud collaborative inference apparatus oriented to high-sampling-rate video stream analysis, which is an electronic apparatus (computer, server, smartphone, etc.) including a memory and a processor, wherein the memory stores a computer program configured to be executed by the processor, and the computer program includes instructions for executing steps in the method of the present invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
It should be noted that the above-described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without any creative effort, shall fall within the protection scope of the present invention.

Claims (8)

1. A neural network edge-cloud collaborative inference method for high-sampling-rate video stream analysis, characterized by comprising the following steps:
in a neural-network architecture scenario for high-sampling-rate video stream analysis, modeling the deep learning model as a chain graph, where each vertex represents a model layer of the neural network and each arrow represents data transmission between model layers;
making a partition between any two adjacent nodes in the chain graph, the partition edge being denoted t, nodes to the left of t being inferred on the edge side and nodes to the right of t being inferred in the cloud;
for each partition edge t, recording the latency and data volume of the tasks processed on the edge and in the cloud, and recording the latency required for data transmission between the edge and the cloud according to the network bandwidth and the size of the transmitted data;
deriving, by the edge and the cloud, an optimal periodic cyclic partition strategy through a model partition algorithm according to the current sampling rate, the bandwidth and the deep learning model;
receiving the first video frame at the edge, inferring the task before the partition edge on the input first frame according to the first partition strategy in the periodic cyclic partition strategy to obtain a first intermediate result, sending it to the cloud, then switching the edge to the second partition strategy, inferring the second input frame, and so on;
taking, by the cloud, the first intermediate result sent by the edge as input, completing the subsequent inference according to the first partition strategy, and returning the inference result of the first frame;
and switching the cloud to the second partition strategy, waiting for the second intermediate result from the edge, and so on, until the input video stream ends and all inference results are output.
2. The method according to claim 1, wherein the deep learning model has m layers and there are m-1 partition edges t; the input of the model partition algorithm is an (m-1) x 3 vector, the sampling rate Q and the bandwidth B, and the output is the partition strategy of the neural network model; in the input (m-1) x 3 vector, each row vector represents {T_e, T_t, T_c} under one partition edge t, where T_e is the processing time of the deep learning model layers on the edge, T_c is the processing time of the deep learning model layers in the cloud, and T_t is the transmission time of inter-layer data between the edge and the cloud.
3. The method of claim 2, wherein when the sampling rate satisfies 1/Q > min max{T_e, T_t, T_c}, the partition strategy is determined by the shortest total single-frame time T, i.e. according to min(T_e + T_t + T_c), maximizing system throughput; and when in high-load mode, i.e. 1/Q < min max{T_e, T_t, T_c}, the optimal periodic cyclic partition strategy is used to partition the deep learning model.
4. The method of claim 3, wherein the model partition algorithm comprises the following steps:
computing the full set of partition strategies over the m-1 partition edges, σ = {σ_1, σ_2, …, σ_{m-1}}, each partition strategy corresponding to three times T_e, T_t, T_c with total inference time T = T_e + T_t + T_c; recording the single-frame optimum among all partition strategies as
σ_1 = argmin_σ (T_e + T_t + T_c),
the partition strategy with the shortest total single-frame inference time T, and setting it as the 1st partition strategy of the periodic cyclic partition strategy;
classifying all partition strategies into three categories, i.e. three sets: ① T_e < T_t and T_e < T_c; ② T_t < T_e and T_t < T_c; ③ T_c < T_e and T_c < T_t; and inferring the subsequent algorithm according to the set to which σ_1 belongs, wherein if that set contains only the single element σ_1, the result of the periodic cyclic partition strategy is the constant σ' = {σ_1}, σ' denoting the finally obtained periodic cyclic partition strategy.
5. The method according to claim 4, wherein for set ③ the optimal periodic cyclic partition strategy is obtained by the following steps:
i. the known set of partition strategies is σ_③ = {σ_i | i = 1, …, r}, and the set of the optimal periodic cyclic partition strategy is {σ_j | j = 1, …, q}, where q ≤ r; the partition strategy obtained from the shortest total single-frame inference time is σ_one-partition; taking σ_one-partition as the initial partition strategy, denoted σ_tmp, and adding it to the optimal periodic cyclic partition strategy;
ii. inferring the first incoming frame according to the partition strategy σ_tmp of step i, and determining under this strategy the edge inference time T_e(σ_tmp) and the data transmission time T_t(σ_tmp);
iii. when the next frame arrives, finding in the partition strategy set σ_③ the strategy whose edge inference time T_e is closest to T_t(σ_tmp) and taking it as the partition strategy of the current frame, σ_next = argmin_{σ ∈ σ_③} |T_e(σ) - T_t(σ_tmp)|; if the cycle closes on a strategy already selected, exiting the loop to obtain the final cyclic partition strategy; otherwise adding the strategy to the optimal cyclic partition strategy and executing step iv;
iv. updating σ_tmp ← σ_next and repeating step iii for the next frame, finally obtaining the optimal periodic cyclic partition strategy σ' = {σ_1, σ_2, …, σ_q}.
6. The method according to claim 1, wherein for the high-sampling-rate video stream, the first partition strategy of the optimal periodic cyclic partition strategy is used to infer the first incoming frame, the second partition strategy to infer the second incoming frame, and the frame stream is processed by this rule; when the last partition strategy of the optimal periodic cyclic partition strategy has been traversed, the cycle restarts from the first partition strategy, and so on.
7. A neural network-side cloud collaborative reasoning apparatus for high-sampling-rate video stream analysis, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 6.
CN202210369401.9A 2022-04-08 2022-04-08 Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis Pending CN114723058A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210369401.9A CN114723058A (en) 2022-04-08 2022-04-08 Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210369401.9A CN114723058A (en) 2022-04-08 2022-04-08 Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Publications (1)

Publication Number Publication Date
CN114723058A true CN114723058A (en) 2022-07-08

Family

ID=82241818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210369401.9A Pending CN114723058A (en) 2022-04-08 2022-04-08 Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Country Status (1)

Country Link
CN (1) CN114723058A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894469A (en) * 2023-09-11 2023-10-17 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment
CN116894469B (en) * 2023-09-11 2023-12-15 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination