CN114723058A - Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis - Google Patents

Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Info

Publication number
CN114723058A
Authority
CN
China
Prior art keywords: strategy, division, sigma, division strategy, frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210369401.9A
Other languages
Chinese (zh)
Inventor
姬晨晨
于佳耕
侯朋朋
邰阳
苗玉霞
佟晓宇
张丽敏
全雨
武延军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS filed Critical Institute of Software of CAS
Priority to CN202210369401.9A priority Critical patent/CN114723058A/en
Publication of CN114723058A publication Critical patent/CN114723058A/en
Pending legal-status Critical Current

Classifications

    • G06N 5/045: Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
    • G06N 3/04: Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; Learning methods
    • G06N 3/105: Neural networks; Shells for specifying net layout
    • H04L 43/0852: Monitoring or testing based on specific metrics, e.g. QoS; Delays
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network


Abstract

The invention discloses a neural network edge-cloud collaborative inference method and device for high-sampling-rate video stream analysis. In a realistic high-sampling-rate video stream scenario, the deep neural network is modeled and analyzed so that the inference task is divided into stages executed on the edge side and in the cloud, while a periodic cyclic partition strategy for the deep neural network model is derived, breaking through the current bottleneck of high-sampling-rate video streams. With the periodic cyclic partition strategy, the inference waiting time of the edge/cloud in the system is reduced for a given deep neural network model, the limit throughput of the system is reached, and the computing capabilities of both the edge and the cloud are fully utilized.

Description

Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis
Technical Field
The invention belongs to the technical field of accelerating and optimizing deep neural network edge-cloud collaborative inference, and in particular relates to a neural network edge-cloud collaborative inference method for high-sampling-rate video stream analysis.
Background
Recent advances in Deep Neural Networks (DNNs) have greatly increased the accuracy and speed of computer vision and video analysis, opening an avenue for a new generation of intelligent applications. Cloud computing, equipped with powerful hardware such as TPUs and GPUs, has become the typical platform for such compute-intensive DNN tasks. For example, in an automotive application a camera constantly monitors the surrounding scene and transmits it to a server, which performs video analysis and feeds control signals back to the pedals and steering wheel. In an augmented reality application, smart glasses track the current view and stream it to a cloud server, which performs object recognition and sends back contextual augmented tags for seamless display on the actual scene.
A DNN consists of multiple network layers, and a video is inferred by running the DNN on each frame separately with a feed-forward algorithm. The algorithm starts at the input layer and proceeds layer by layer: each layer receives the output of the previous layer as input, performs a series of computations on it, and passes its output to the next layer; the process terminates once the output layer finishes. In practice, data is generated at the edge and data frames enter the DNN as raw input. The computation of each layer of the DNN may be performed either at the edge or in the cloud. Computing a layer on the edge device avoids transmitting data to the cloud, but limited device resources increase the amount of computation time; computing a layer in the cloud reduces computation time, but transmitting data from the edge device to the cloud incurs transmission delay. How to split DNN inference between the edge and the cloud to achieve the shortest latency or the maximum throughput is the central problem at present.
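To make this trade-off concrete, the following minimal sketch (not part of the patent text; the 5-layer profile, all timings, sizes and function names are hypothetical) enumerates every cut point of a chain-structured DNN and scores it by edge compute, transfer and cloud compute time:

```python
def evaluate_cuts(edge_ms, cloud_ms, out_mb, bandwidth_mb_s):
    """For each partition edge t = 1..m-1: layers 1..t run on the edge,
    the output of layer t is shipped to the cloud, layers t+1..m run there.
    Returns a list of (t, T_e, T_t, T_c) in milliseconds."""
    m = len(edge_ms)
    cuts = []
    for t in range(1, m):
        T_e = sum(edge_ms[:t])                        # edge compute time
        T_t = out_mb[t - 1] / bandwidth_mb_s * 1000   # transfer time, ms
        T_c = sum(cloud_ms[t:])                       # cloud compute time
        cuts.append((t, T_e, T_t, T_c))
    return cuts

# Hypothetical 5-layer profile: per-layer edge ms, cloud ms, output size (MB).
cuts = evaluate_cuts(edge_ms=[12, 8, 20, 15, 5],
                     cloud_ms=[3, 2, 5, 4, 1],
                     out_mb=[0.95, 0.40, 0.08, 0.05, 0.01],
                     bandwidth_mb_s=1.0)
best = min(cuts, key=lambda c: c[1] + c[2] + c[3])    # min(T_e + T_t + T_c)
print("best single-frame cut:", best)
```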
Because the output of some intermediate DNN layers is significantly smaller than the raw input data, most existing neural network edge-cloud collaborative inference methods partition the deep neural network model, build a mathematical model of the total partitioned time, and solve for its optimum. These optimal solutions hold only under low system load, i.e. when the input rate of the data stream meets certain system requirements; if the input rate is too high, it must be reduced. This restriction on the sampling rate clearly does not match practical application scenarios. For example, at expressway entrances and toll stations it must be checked whether vehicles are legal, and their entry and exit times must be recorded; license plate recognition systems greatly improve toll efficiency and reduce communication cost. Under heavy traffic, if the system is not adjusted, vehicles may pass the station without their license plates being recognized, throwing the system into disorder; if the entry and exit rate is limited instead, vehicles pile up at the gates and traffic accidents and congestion become likely. The proposed neural network edge-cloud collaborative inference method for high-sampling-rate video stream analysis maximizes system throughput without reducing the sampling rate, satisfying both real-time inference requirements and real-world applications.
Disclosure of Invention
The invention aims to provide a neural network edge-cloud collaborative inference method and device for high-sampling-rate video stream analysis that addresses the defects of the prior art.
The neural network edge-cloud collaborative inference method for high-sampling-rate video stream analysis of the invention comprises the following steps:
Step 1: in a neural-network architecture scenario for high-sampling-rate video stream analysis, the deep learning model is modeled as a chain graph, where each vertex represents a model layer of the neural network and each arrow represents data transmission between model layers.
Step 2: a partition is made between any two adjacent nodes in the chain graph; the partition edge is denoted t. Nodes to the left of t are inferred on the edge side, and nodes to the right of t are inferred in the cloud. The partition edge t corresponds to the time required to transmit the data of its left endpoint to its right endpoint, i.e. the time to transmit the data from the edge to the cloud.
Step 3: for each partition edge t, the per-layer latency and energy consumption of the neural network are evaluated under the hardware environments and data sets of the edge and the cloud; the latency and data volume of the tasks processed on the edge and in the cloud under the corresponding model partition (i.e. the partition of Step 2) are recorded; and the latency required for data transmission between the edge and the cloud is recorded according to the network bandwidth and the size of the transmitted data.
Step 4: according to the current sampling rate Q, the bandwidth B and the deep learning network model, the edge and the cloud derive an optimal periodic cyclic partition strategy via the model partition algorithm; that is, following the periodic cyclic partition strategy, the edge and the cloud switch to the corresponding partition edge in sequence as each frame of the video stream arrives, and the system reaches the limit throughput of the neural network model under the high-sampling-rate condition.
Step 5: the edge receives the first video frame, infers the task before the partition edge on the input first frame according to the first partition strategy in the periodic cyclic strategy to obtain a first intermediate result, and sends it to the cloud; the edge then switches to the second partition strategy, infers the second input frame, and so on.
Step 6: the cloud takes the first intermediate result sent by the edge as input, completes the remaining inference according to the first partition strategy, and returns the inference result of the first frame; it then switches to the second partition strategy, waits for the second intermediate result from the edge, and so on.
Step 7: once all k frames of the video stream have been input, the system outputs all inference results and inference ends.
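A minimal sketch of the per-frame dispatch of Steps 5 to 7 follows; the cyclic strategy is assumed to be computed already, and edge_infer/cloud_infer are hypothetical stand-ins for the two sub-models, not the patent's implementation:

```python
from itertools import cycle

def edge_infer(frame, sigma):
    # Placeholder for running the layers before partition edge sigma.
    return ("intermediate", frame, sigma)

def cloud_infer(intermediate, sigma):
    # Placeholder for running the layers after partition edge sigma.
    return ("result", intermediate[1], sigma)

def run_stream(frames, cyclic_strategies):
    """Both ends step through the cyclic strategy in lockstep, switching to
    the next partition edge as each frame arrives (Steps 5 and 6)."""
    results = []
    for frame, sigma in zip(frames, cycle(cyclic_strategies)):
        s = edge_infer(frame, sigma)            # edge: task before the cut
        results.append(cloud_infer(s, sigma))   # cloud: remaining task
    return results                              # Step 7: all results output

print(run_stream(frames=range(5), cyclic_strategies=["s1", "s2"]))
```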
Further, in Step 2, the partition edge t traverses the edges between all model layers. If the deep learning model has m layers, there are m-1 partition edges, t ∈ {t_1, …, t_{m-1}}, t ≥ 1. Under each partition edge t, n model layers are inferred on the edge side and the size of the data output by the edge is s; under a given network bandwidth B this data is transmitted to the cloud with transmission time T_t = s/B. The cloud receives input data of size s and infers the remaining m-n model layers.
Further, in Step 3, the processing times T_e and T_c of the deep learning model layers on the edge and in the cloud are obtained by actual measurement or predicted by a simulation model; the transmission time T_t of inter-layer data between the edge and the cloud is obtained by actual measurement, or by measuring the network bandwidth between the edge and the cloud and computing the ratio of the size of the edge's output data to the bandwidth.
Further, the total time of the subtasks inferred on the edge is
T_e = Σ_{i=1}^{n} T_i,
and the total time of the subtasks inferred in the cloud is
T_c = Σ_{i=n+1}^{m} T_i,
where T_i is the inference time of the i-th layer of the deep neural network, and the time of data transmission between the edge and the cloud is T_t = s/B.
Further, when the sampling rate satisfies 1/Q > min max{T_e, T_t, T_c}, the partition strategy is determined by the shortest total single-frame time T, i.e. according to min(T_e + T_t + T_c), which maximizes system throughput; when in high-load mode, i.e. 1/Q < min max{T_e, T_t, T_c}, the optimal periodic cyclic partition strategy is used to partition the deep learning model.
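As a sketch of this decision rule (an illustration under stated assumptions, not the patent's implementation; timings in ms, Q in frames per second, the strategy list assumed to come from prior profiling):

```python
def choose_partitioning(Q, strategies):
    """strategies: list of (T_e, T_t, T_c) in ms, one per partition edge."""
    interval_ms = 1000.0 / Q
    bottleneck = min(max(s) for s in strategies)  # min max{T_e, T_t, T_c}
    if interval_ms > bottleneck:                  # low load: no pipeline stall
        return min(strategies, key=sum)           # min(T_e + T_t + T_c)
    return "high load: use the periodic cyclic partition strategy"

print(choose_partitioning(Q=50, strategies=[(30, 20, 10), (20, 30, 15)]))
```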
Further, in Step 4, the input of the model partition algorithm is an (m-1) x 3 vector together with the sampling rate Q and the bandwidth B, and the output is the partition strategy of the neural network model. In the input (m-1) x 3 vector, each row vector represents {T_e, T_t, T_c} under one partition edge t; a deep learning network model with m layers has m-1 partition edges, so Input = {T_e1, T_t1, T_c1; T_e2, T_t2, T_c2; …; T_e(m-1), T_t(m-1), T_c(m-1)}. The sampling rate Q represents the maximum theoretical throughput, i.e. the number of pictures that can arrive in 1 second. For an n-frame video stream processed in a parallel (pipelined) manner, the system is in low-load mode when 1/Q > min max{T_e, T_t, T_c}, where min max{T_e, T_t, T_c} minimizes, over all partition strategies, the largest of the three times T_e, T_t, T_c. In this case parallel operation in the system causes no waiting, and the partition strategy σ is determined by the shortest total single-frame time T = T_e + T_t + T_c, i.e. according to min(T_e + T_t + T_c). The maximum limit throughput the system can reach then exceeds the theoretical maximum throughput given by the sampling rate Q, so the partition strategy σ already achieves the maximum system throughput. When the system is in high-load mode, 1/Q < min max{T_e, T_t, T_c}, and the maximum limit throughput of the whole system under a single-frame partition strategy σ is approximately n/(n * min max{T_e, T_t, T_c} + T_1 + T_2), where T_1 and T_2 are the two smaller of {T_e, T_t, T_c}; e.g. if max{T_e, T_t, T_c} = T_e, then T_1, T_2 are T_t, T_c. When n is large, this expression approaches 1/min max{T_e, T_t, T_c}, so the maximum limit throughput under a single-frame partition strategy is below the theoretical maximum throughput given by the sampling rate Q, and can never exceed it.
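This throughput bound can be checked numerically with a short sketch; the stage times below are hypothetical:

```python
def single_cut_throughput(n, stage_ms):
    """Limit throughput (frames/s) of an n-frame pipeline reusing one cut:
    n / (n * max-stage + the two smaller stages)."""
    T1, T2 = sorted(stage_ms)[:2]           # the two non-bottleneck stages
    total_ms = n * max(stage_ms) + T1 + T2
    return n / (total_ms / 1000.0)

# As n grows the bound approaches 1000 / max(stage_ms) frames per second.
for n in (10, 100, 1000):
    print(n, round(single_cut_throughput(n, (30, 20, 10)), 2))
```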
Further, the partition strategy with the shortest total single-frame time T is not necessarily the optimal partition strategy in the multi-frame case. In the prior art, all operations are performed at the same partition point and the best pipelined partition node is sought, but waiting time still remains, so there is still room for optimization. The model partition algorithm of the invention derives a periodic cyclic model partition strategy: a different model partition strategy σ_i is adopted for each frame, obtained by the periodic cyclic partition strategy algorithm.
Further, the optimal cyclic partition strategy is σ' = {σ_1, σ_2, …, σ_q}, where σ_1, σ_2, …, σ_q denote the 1st, 2nd, …, q-th partition strategies respectively. The edge and the cloud infer according to this cyclic strategy. For an input video stream of k frames: the edge infers the 1st frame under the 1st partition strategy σ_1, transmits the 1st intermediate result s_1 to the cloud, and switches its own strategy to the 2nd partition strategy σ_2; the cloud, under σ_1, takes s_1 as input, infers the remaining task, returns the inference result of the 1st frame, and switches its own strategy to σ_2; the above process repeats. The edge infers the q-th frame under the q-th partition strategy σ_q, transmits the q-th intermediate result s_q to the cloud, and switches its own strategy back to the 1st partition strategy σ_1; the cloud, under σ_q, takes s_q as input, infers the remaining task, returns the inference result of the q-th frame, and switches back to σ_1, completing the cycle. Over the input of k video frames, the edge and the cloud each switch partition strategies N = k % q - 1 times in total, where % denotes the remainder operation.
Further, for DNNs, the size of some intermediate results (outputs of intermediate layers) is significantly smaller than that of the raw input data. For example, the input data size of mini YOLOv2 is 0.95 MB, while the output of the intermediate layer max5 is 0.08 MB, a 93% reduction. This offers the opportunity to exploit both the powerful computing capability of the cloud and the proximity of edge computing: a portion of the DNN is computed on the edge side, the small intermediate result is transmitted to the cloud, and the remaining portion is computed on the cloud side. Partitioning the DNN thus constitutes a trade-off between computation and transmission: cutting at different layers yields different computation and transmission times, so an optimal partition is desirable.
Based on the same inventive concept, the present invention further provides a neural network-side cloud collaborative inference apparatus oriented to high-sampling-rate video stream analysis, which is an electronic apparatus including a memory and a processor, wherein the memory stores a computer program configured to be executed by the processor, and the computer program includes instructions for executing the above-described method of the present invention.
Compared with the prior art, the invention has the following positive effects:
(1) In a realistic high-sampling-rate video stream scenario, the proposed neural network edge-cloud collaborative inference method models and analyzes the deep neural network so that the inference task is divided into stages executed on the edge and in the cloud, while a periodic cyclic partition strategy for the deep neural network model is derived, breaking through the current bottleneck of high-sampling-rate video streams;
(2) with the periodic cyclic partition strategy, the inference waiting time of the edge/cloud in the system is reduced for a given deep neural network model, the limit throughput of the system is reached, and the computing capabilities of the edge and the cloud are fully utilized.
Drawings
FIG. 1 is a diagram illustrating the total inference time of AlexNet under different partition edges;
FIG. 2 is a schematic diagram of a high sample rate case;
FIG. 3 is a schematic diagram of a loop partitioning strategy in an embodiment;
FIG. 4 is a diagram illustrating a single partitioning policy in an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in more detail with reference to the accompanying drawings and corresponding embodiments.
As shown in FIG. 1, taking the AlexNet network as an example, the partition edge t traverses the edges between all model layers; the deep learning model has 24 layers, so there are 23 partition edges, t ∈ {1, 2, 3, …, 23}, t ≥ 1. Under each partition, n model layers are inferred on the edge side and the size of the data output by the edge is s_n; under a given network bandwidth B, s_n is transmitted to the cloud as the transmission data, with transmission time T_t = s_n/B. The cloud receives input data of size s_n and infers the remaining m-n model layers. The processing times T_e and T_c of the deep learning model layers on the edge and in the cloud are obtained by actual measurement or predicted by a simulation model; the transmission time T_t of inter-layer data between the edge and the cloud is obtained by actual measurement, or by measuring the network bandwidth between the edge and the cloud and computing the ratio of the size of the edge's output data to the bandwidth.
By actual system measurement, the total time of the subtasks inferred on the edge is
T_e = Σ_{i=1}^{n} T_i,
the total time of the subtasks inferred in the cloud is
T_c = Σ_{i=n+1}^{m} T_i,
and the total data transmission time between the edge and the cloud is T_t = s_n/B. For the partition strategy with the shortest total single-frame time T, the total time of deep neural network inference is T_one-partition = T_e + T_t + T_c.
Experiments were performed on the built system, where the edge server processor is an Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.30GHz and the cloud server processor is an Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz, with Ubuntu 18.04 as the operating system. The total deep neural network inference time under every partition edge t in the single-frame case is computed and presented as in FIG. 1, where the abscissa is the partition edge t at a given network layer and the ordinate is the total time T_one-partition = T_e + T_t + T_c computed under that partition edge; this yields m-1 = 23 partition strategies as part of the input to the model partition algorithm.
First, the definition of a high sampling rate is given. As shown in FIG. 2, frame A, frame B and frame C represent three video frames in a video stream; a high sampling rate necessarily concerns a continuous video stream. In FIG. 2, t_0, t_1 and t_2 mark the arrivals of video frames: at sampling rate Q, a picture is captured at each such instant and fed into the deep neural network for inference. The situation depicted in FIG. 2 is representative (assuming the edge inference time exceeds both the transmission time and the cloud inference time). Suppose the initial partition strategy is the one with the shortest total single-frame time and the edge and the cloud always use it. When the next frame reaches the edge, if the edge has already finished the subtask of the previous frame, it can infer the new frame immediately; this sampling rate is called a low sampling rate in the invention, as shown in the lower part of FIG. 2. If the edge has not finished the previous frame's subtask, it cannot infer the incoming frame immediately: the new frame's inference necessarily incurs a waiting time, starting only after the edge completes the previous frame's inference task; this sampling rate is called a high sampling rate in the invention, as shown in the upper part of FIG. 2. In the high-sampling-rate state, the total edge waiting time over an n-frame video stream is T_wait = n * (T_e - 1/Q); the longer the video stream, the longer the wait, and during this waiting time the transmission stage and the cloud inference stage sit idle, wasting part of the resources.
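The waiting-time formula, as a sketch with hypothetical numbers:

```python
def edge_wait_time_ms(n, T_e_ms, Q):
    """Total edge stall over n frames when one fixed cut is reused and the
    edge stage is the bottleneck: T_wait = n * (T_e - 1/Q), clamped at 0."""
    interval_ms = 1000.0 / Q
    return max(0.0, n * (T_e_ms - interval_ms))

print(edge_wait_time_ms(n=100, T_e_ms=30, Q=100))  # 100 * (30 - 10) = 2000 ms
```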
The steps of the proposed periodic cyclic partition algorithm for high-sampling-rate video streams are as follows:
1) Compute the full set of partition strategies over the m-1 partition edges, σ = {σ_1, σ_2, …, σ_{m-1}}. Each partition strategy σ corresponds to three times: the edge processing time T_e (i.e. the end-side processing time in FIG. 2), the transmission time T_t of data from the edge to the cloud (the data transmission time in FIG. 2), and the cloud processing time T_c (the cloud processing time in FIG. 2). The total inference time is T = T_e + T_t + T_c. The single-frame optimum among the partition strategies is recorded as
σ_1 = argmin_σ (T_e + T_t + T_c),
the partition strategy with the shortest total single-frame inference time T; it is set as the 1st partition strategy of the periodic cyclic partition strategy.
2) Classify all partition strategies into three categories, i.e. three sets: ① T_e < T_t and T_e < T_c; ② T_t < T_e and T_t < T_c; ③ T_c < T_e and T_c < T_t. The subsequent algorithm is inferred according to which set σ_1 belongs to. If that set contains only the single element σ_1, the result of the periodic cyclic partition strategy is the constant σ' = {σ_1}, where σ' denotes the finally obtained periodic cyclic partition strategy. The three cases for σ_1 are symmetric to one another; step 3) below illustrates the subsequent algorithm with the third case.
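A sketch of this classification, with each strategy given as a (T_e, T_t, T_c) tuple:

```python
def strategy_class(s):
    """Return 1, 2 or 3 according to which stage time is strictly smallest."""
    T_e, T_t, T_c = s
    if T_e < T_t and T_e < T_c:
        return 1
    if T_t < T_e and T_t < T_c:
        return 2
    return 3   # T_c < T_e and T_c < T_t

print([strategy_class(s) for s in [(30, 20, 10), (20, 30, 15), (10, 15, 40)]])
# -> [3, 3, 1]
```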
3) For the set in which T_c is smallest, i.e. ③, denote its partition strategies σ_③ = {σ_1, σ_2, …, σ_r}; in this case the sampling rate satisfies 1/Q < min max{T_e, T_t, T_c}, with T_e, T_t, T_c ∈ σ_③. For convenience of the formulas, suppose T_e > T_t, i.e. 1/Q < T_e: the next frame k+1 arrives while the current frame k (k starting from 1) has not finished inference, and the difference between T_e and T_t is the waiting time T_p = |T_e - T_t|; the waiting time of the first frame is zero, i.e. no waiting is needed. For k > 1, owing to pipelined parallel processing, once the edge finishes the first stage of the current frame k it can process the first stage of the next frame k+1; after the current frame k finishes its first stage it enters the data transmission stage, and likewise for the next frame k+1. However, frame k+1 may enter the data transmission stage only once the data transmission stage of the current frame k has completed, i.e. the network is in an available state; if the data of the current frame k has not finished transmitting, the data transmission of the next frame k+1 must wait, and the waiting time between the next frame and the current frame is T_p(k+1) = |T_e(k+1) - T_t(k)|.
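The inter-frame waiting-time recurrence, as a sketch:

```python
def waiting_times(per_frame_strategies):
    """T_p(1) = 0; T_p(k+1) = |T_e(k+1) - T_t(k)| for a given sequence of
    (T_e, T_t, T_c) tuples, one per frame."""
    waits = [0.0]                  # the first frame never waits
    for k in range(1, len(per_frame_strategies)):
        T_e_next = per_frame_strategies[k][0]
        T_t_cur = per_frame_strategies[k - 1][1]
        waits.append(abs(T_e_next - T_t_cur))
    return waits

print(waiting_times([(30, 20, 10), (20, 30, 15), (30, 20, 10)]))  # [0, 0, 0]
```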
4) To obtain the optimal periodic cyclic partition strategy under the current conditions, the search proceeds as follows (a code sketch follows step iv):
i. The known set of partition strategies is σ_③ = {σ_i | i = 1, …, r}; the set of the optimal periodic cyclic partition strategy is {σ_j | j = 1, …, q}, where q ≤ r. The partition strategy obtained from the shortest total single-frame inference time is σ_one-partition, which lies in the total set of partition strategies. Take σ_one-partition as the initial partition strategy, denote it σ_tmp, and add it to the optimal cyclic partition strategy.
ii. Infer the first incoming frame according to the partition strategy of step i, and record under this strategy the edge inference time T_e(σ_tmp) and the data transmission time T_t(σ_tmp).
iii. When the next frame arrives, search the partition strategy set σ_③ for the strategy whose edge inference time T_e is closest to T_t(σ_tmp) and take it as the partition strategy of the current frame:
σ_next = argmin_{σ ∈ σ_③} |T_e(σ) - T_t(σ_tmp)|.
If T_e(σ_next) equals the edge time of the strategy that starts the cyclic set (the cycle closes), exit the loop to obtain the final cyclic partition strategy; otherwise add σ_next to the optimal cyclic partition strategy and execute step iv.
iv. Update σ_tmp ← σ_next and repeat step iii for the next frame.
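Steps i to iv amount to the following greedy search. This is a sketch under stated assumptions (the strategies already profiled as (T_e, T_t, T_c) triples, and the cycle-closing test of step iii implemented as an equality check on T_e), not a verbatim transcription of the patent:

```python
def cyclic_partition_search(strategies, max_len=None):
    """Greedy search of steps i-iv. strategies: list of (T_e, T_t, T_c)
    triples. Returns the indices of the cycle {sigma_1, ..., sigma_q}."""
    max_len = max_len or len(strategies)
    # step i: start from the single-frame optimum min(T_e + T_t + T_c)
    start = min(range(len(strategies)), key=lambda j: sum(strategies[j]))
    cycle = [start]
    current = start
    while len(cycle) <= max_len:
        T_t_cur = strategies[current][1]  # step ii: this frame's transfer time
        # step iii: next frame takes the strategy whose T_e is closest to T_t
        nxt = min(range(len(strategies)),
                  key=lambda j: abs(strategies[j][0] - T_t_cur))
        if strategies[nxt][0] == strategies[cycle[0]][0]:
            break                         # cycle closes: back to the start
        cycle.append(nxt)                 # step iv: update and repeat
        current = nxt
    return cycle

# Set ③ of Table 1 in the worked example below; expect the cycle [0, 1],
# i.e. {sigma_1, sigma_2}.
print(cyclic_partition_search([(30, 20, 10), (20, 30, 15), (40, 25, 8)]))
```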
5) The loop yields the optimal periodic cyclic partition strategy σ' = {σ_1, σ_2, …, σ_q}. The total time of the video stream from frame 1 to frame k is
T = Σ_{i=1}^{k} T_e(σ_{((i-1) mod q)+1}) + T_t(σ_{((k-1) mod q)+1}) + T_c(σ_{((k-1) mod q)+1}),
i.e. the sum of the edge times of all k frames plus the transmission and cloud times of the last frame, since the pipeline is kept saturated by the edge stage. For the single-frame shortest partition case, the total inference time of k frames is:
T_one-partition = k*T_e + T_t + T_c.
The final partition strategy is the periodic cyclic strategy σ' = {σ_1, σ_2, …, σ_q}.
A specific example is given for illustration. The sampling interval in this example is 1/Q = 10 ms. The running times of the partition results for this neural network are shown in Table 1:
Table 1. Times of the three stages under the 4 partition strategies

  Partition strategy   T_e (ms)   T_t (ms)   T_c (ms)
  σ_1                  30         20         10
  σ_2                  20         30         15
  σ_3                  40         25         8
  σ_4                  10         15         40
The following describes the algorithm in detail according to the contents of table 1:
1) Following algorithm step 1, compute for the 4 partition strategies the edge processing time T_e, the transmission time T_t of data from the edge to the cloud, and the cloud processing time T_c;
2) Partition strategies σ_1, σ_2 and σ_3 satisfy T_c < T_e and T_c < T_t, i.e. they form the set in which T_c is smallest; the search algorithm looks for the optimal periodic cyclic partition strategy within this set;
3) The sampling interval satisfies 1/Q = 10 ms < 20 ms < 30 ms, i.e. at any moment the next frame arrives while the current frame is still being processed;
4) Traverse to find the minimum total time, min T = min(T_e + T_t + T_c) = 60 ms, attained by σ_1, so the first frame's partition strategy is σ_1.
T_p(2) = min_j T_p(j) = min_j {|T_e(j) - T_t(1)|} = |T_e(2) - T_t(1)| = 0 ms, so the second frame's partition strategy is σ_2;
T_p(3) = min_j T_p(j) = min_j {|T_e(j) - T_t(2)|} = |T_e(1) - T_t(2)| = 0 ms, so the third frame's partition strategy is σ_1 again; since T_e(a) == T_e(b) for a = 1, b = 3 (the cycle closes), the loop exits.
5) σ' = {σ_1, σ_2}: the optimal periodic cyclic partition strategy contains q = 2 partition methods. The number of partition-strategy switches performed by the system during computation is N = k % q - 1.
A schematic of the total time under the cyclic partition strategy is shown in FIG. 3; in this example the total time is:
T = Σ_{i=1}^{k} T_e(i) + T_t(k) + T_c(k) = 25k + 35 for odd k (25k + 45 for even k).
A schematic of the total time under the single-partition-point shortest strategy is shown in FIG. 4; the total time is:
T_one-partition = k*T_e + T_t + T_c = 30k + 30.
When k = 99, the total time inferred under the periodic cyclic partition strategy is 2510 ms versus 3000 ms under the single-frame shortest-time partition strategy; when k = 100, the totals are 2545 ms versus 3030 ms. The periodic cyclic partition algorithm of the invention therefore outperforms the single-frame shortest-time partition algorithm.
In general, when k is odd, T ≤ T_one-partition holds over the range k ≥ 1; when k is even, T ≤ T_one-partition holds over the range k ≥ 3.
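These totals can be reproduced with a short self-contained sketch; it assumes, as in the example, that the pipeline is saturated by the edge stage, so the makespan is the sum of the edge times plus the transmission and cloud tail of the last frame:

```python
def total_cyclic_ms(k, cycle, strategies):
    """Makespan of k frames under a periodic cyclic strategy."""
    edge_sum = sum(strategies[cycle[i % len(cycle)]][0] for i in range(k))
    last = strategies[cycle[(k - 1) % len(cycle)]]
    return edge_sum + last[1] + last[2]   # + T_t + T_c of the final frame

def total_single_ms(k, s):
    return k * s[0] + s[1] + s[2]         # k*T_e + T_t + T_c

table1 = [(30, 20, 10), (20, 30, 15), (40, 25, 8), (10, 15, 40)]
for k in (99, 100):
    print(k, total_cyclic_ms(k, [0, 1], table1), total_single_ms(k, table1[0]))
# -> 99 2510 3000 and 100 2545 3030, matching the figures above
```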
In the scenario of high-sampling-rate video stream analysis, the proposed periodic-cyclic-partition neural network edge-cloud collaborative inference method improves inference speed.
Based on the same inventive concept, another embodiment of the present invention provides a neural network-side cloud collaborative inference apparatus oriented to high-sampling-rate video stream analysis, which is an electronic apparatus (computer, server, smartphone, etc.) including a memory and a processor, wherein the memory stores a computer program configured to be executed by the processor, and the computer program includes instructions for executing steps in the method of the present invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
It should be noted that the above-described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without any creative effort, shall fall within the protection scope of the present invention.

Claims (8)

1. A neural network edge-cloud collaborative inference method for high-sampling-rate video stream analysis, characterized by comprising the following steps:
in a neural-network architecture scenario for high-sampling-rate video stream analysis, modeling the deep learning model as a chain graph, where each vertex represents a model layer of the neural network and each arrow represents data transmission between model layers;
making a partition between any two adjacent nodes in the chain graph, the partition edge being denoted t, nodes to the left of t being inferred on the edge side and nodes to the right of t being inferred in the cloud;
for each partition edge t, recording the latency and data volume of the tasks processed on the edge and in the cloud, and recording the latency required for data transmission between the edge and the cloud according to the network bandwidth and the size of the transmitted data;
deriving, by the edge and the cloud, an optimal periodic cyclic partition strategy through a model partition algorithm according to the current sampling rate, the bandwidth and the deep learning model;
receiving the first video frame at the edge, inferring the task before the partition edge on the input first frame according to the first partition strategy in the periodic cyclic partition strategy to obtain a first intermediate result, sending it to the cloud, then switching the edge to the second partition strategy, inferring the second input frame, and so on;
taking, by the cloud, the first intermediate result sent by the edge as input, completing the subsequent inference according to the first partition strategy, and returning the inference result of the first frame;
and switching the cloud to the second partition strategy, waiting for the second intermediate result from the edge, and so on, until the input video stream ends and all inference results are output.
2. The method according to claim 1, wherein the deep learning model has m layers and there are m-1 partition edges t; the input of the model partition algorithm is an (m-1) x 3 vector, the sampling rate Q and the bandwidth B, and the output is the partition strategy of the neural network model; in the input (m-1) x 3 vector, each row vector represents {T_e, T_t, T_c} under one partition edge t, where T_e is the processing time of the deep learning model layers on the edge, T_c is the processing time of the deep learning model layers in the cloud, and T_t is the transmission time of inter-layer data between the edge and the cloud.
3. The method of claim 2, wherein when the sampling rate satisfies 1/Q > min max{T_e, T_t, T_c}, the partition strategy is determined by the shortest total single-frame time T, i.e. according to min(T_e + T_t + T_c), maximizing system throughput; and when in high-load mode, i.e. 1/Q < min max{T_e, T_t, T_c}, the optimal periodic cyclic partition strategy is used to partition the deep learning model.
4. The method of claim 3, wherein the model partition algorithm comprises the following steps:
computing the full set of partition strategies over the m-1 partition edges, σ = {σ_1, σ_2, …, σ_{m-1}}, each partition strategy corresponding to three times T_e, T_t, T_c with total inference time T = T_e + T_t + T_c; recording the single-frame optimum among all partition strategies as
σ_1 = argmin_σ (T_e + T_t + T_c),
the partition strategy with the shortest total single-frame inference time T, and setting it as the 1st partition strategy of the periodic cyclic partition strategy;
classifying all partition strategies into three categories, i.e. three sets: ① T_e < T_t and T_e < T_c; ② T_t < T_e and T_t < T_c; ③ T_c < T_e and T_c < T_t; and inferring the subsequent algorithm according to the set to which σ_1 belongs, wherein if that set contains only the single element σ_1, the result of the periodic cyclic partition strategy is the constant σ' = {σ_1}, σ' denoting the finally obtained periodic cyclic partition strategy.
5. The method according to claim 4, wherein for set ③ the optimal periodic cyclic partition strategy is obtained by the following steps:
i. the known set of partition strategies is σ_③ = {σ_i | i = 1, …, r}, and the set of the optimal periodic cyclic partition strategy is {σ_j | j = 1, …, q}, where q ≤ r; the partition strategy obtained from the shortest total single-frame inference time is σ_one-partition; taking σ_one-partition as the initial partition strategy, denoted σ_tmp, and adding it to the optimal periodic cyclic partition strategy;
ii. inferring the first incoming frame according to the partition strategy σ_tmp of step i, and determining under this strategy the edge inference time T_e(σ_tmp) and the data transmission time T_t(σ_tmp);
iii. when the next frame arrives, finding in the partition strategy set σ_③ the strategy whose edge inference time T_e is closest to T_t(σ_tmp) and taking it as the partition strategy of the current frame, σ_next = argmin_{σ ∈ σ_③} |T_e(σ) - T_t(σ_tmp)|; if the cycle closes on a strategy already selected, exiting the loop to obtain the final cyclic partition strategy; otherwise adding the strategy to the optimal cyclic partition strategy and executing step iv;
iv. updating σ_tmp ← σ_next and repeating step iii for the next frame, finally obtaining the optimal periodic cyclic partition strategy σ' = {σ_1, σ_2, …, σ_q}.
6. The method according to claim 1, wherein for the high-sampling-rate video stream, the first partition strategy of the optimal periodic cyclic partition strategy is used to infer the first incoming frame, the second partition strategy to infer the second incoming frame, and the frame stream is processed by this rule; when the last partition strategy of the optimal periodic cyclic partition strategy has been traversed, the cycle restarts from the first partition strategy, and so on.
7. A neural network-side cloud collaborative reasoning apparatus for high-sampling-rate video stream analysis, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 6.
CN202210369401.9A 2022-04-08 2022-04-08 Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis Pending CN114723058A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210369401.9A CN114723058A (en) 2022-04-08 2022-04-08 Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210369401.9A CN114723058A (en) 2022-04-08 2022-04-08 Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Publications (1)

Publication Number Publication Date
CN114723058A true CN114723058A (en) 2022-07-08

Family

ID=82241818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210369401.9A Pending CN114723058A (en) 2022-04-08 2022-04-08 Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Country Status (1)

Country Link
CN (1) CN114723058A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894469A (en) * 2023-09-11 2023-10-17 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment
CN116894469B (en) * 2023-09-11 2023-12-15 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination