CN109272035A - A fast video inference method based on a recurrent residual module - Google Patents

A fast video inference method based on a recurrent residual module Download PDF

Info

Publication number
CN109272035A
CN109272035A
Authority
CN
China
Prior art keywords
video
frame
convolution
residual
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811059298.8A
Other languages
Chinese (zh)
Inventor
夏春秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vision Technology Co Ltd filed Critical Shenzhen Vision Technology Co Ltd
Priority to CN201811059298.8A priority Critical patent/CN109272035A/en
Publication of CN109272035A publication Critical patent/CN109272035A/en
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

A fast video inference method based on a recurrent residual module is proposed in the present invention. Its main contents include: a recurrent residual module, reducing neural-network computational complexity, increasing the sparsity of intermediate feature maps, and synthesizing an efficient inference engine. The process is as follows: first, the recurrent residual module exploits the similarity between frames to reduce redundant computation in frame-by-frame inference with video convolutional neural networks; then, the inference output is approximated by increasing the sparsity of the intermediate feature maps, which accelerates inference, while an error-control mechanism guarantees inference accuracy; finally, an efficient inference engine is synthesized, i.e., an accelerator exploiting dynamic sparsity in matrix-vector multiplication, so that the recurrent residual module runs with high performance. Compared with conventional methods, the present invention significantly improves the running speed of a vision processing system and enhances the network's ability to understand video clips that change in real time, thereby accelerating the video inference process while guaranteeing recognition accuracy.

Description

A fast video inference method based on a recurrent residual module
Technical field
The present invention relates to the field of video recognition, and in particular to a fast video inference method based on a recurrent residual module.
Background technique
As the field of artificial intelligence has made rapid progress, video recognition technology has also advanced greatly. Video recognition works roughly as follows: a front-end camera provides a stable and clear video signal; an embedded intelligent analysis module then recognizes, detects, and analyzes the video frames, filters out interference, and labels the targets and trajectories of abnormal situations in the picture. In security management, video recognition can serve as a monitoring system that effectively monitors and identifies people in a specific area, so that criminal behavior is discovered in time and suspects can be tracked. In human-computer interaction, it can recognize a person's posture, movements, and gestures to understand their intent. In the military field, it can recognize the dynamic changes of hostile targets to enable precision strikes. In maritime navigation, it can judge tide and ocean-current conditions to determine a reasonable course. In addition, video recognition is also widely used in abnormal-behavior detection, camera function inspection, virtual reality, and other fields. However, current video inference methods still suffer from slow inference speed and limited accuracy.
The fast video inference method based on a recurrent residual module proposed in the present invention first exploits frame similarity inside the recurrent residual module to reduce redundant computation in frame-by-frame inference with video convolutional neural networks. Then, the inference output is approximated by increasing the sparsity of the intermediate feature maps, which accelerates inference, while an error-control mechanism guarantees the accuracy of the inference. Finally, an efficient inference engine is synthesized, i.e., an accelerator exploiting dynamic sparsity in matrix-vector multiplication, so that the recurrent residual module runs efficiently. Compared with conventional methods, the present invention significantly improves the running speed of a vision processing system and enhances the network's ability to understand video clips that change in real time, thereby accelerating the video inference process while guaranteeing recognition accuracy.
Summary of the invention
Aiming at the slow inference speed and limited accuracy of current video inference methods, the purpose of the present invention is to provide a fast video inference method based on a recurrent residual module. It first exploits frame similarity inside the recurrent residual module to reduce redundant computation in frame-by-frame inference with video convolutional neural networks; then approximates the inference output by increasing the sparsity of the intermediate feature maps, accelerating inference while guaranteeing its accuracy through an error-control mechanism; finally, it synthesizes an efficient inference engine, i.e., an accelerator exploiting dynamic sparsity in matrix-vector multiplication, so that the recurrent residual module runs with high performance.
To solve the above problems, the present invention provides a fast video inference method based on a recurrent residual module, whose main contents include:
(1) a recurrent residual module;
(2) reducing neural-network computational complexity;
(3) increasing the sparsity of intermediate feature maps;
(4) synthesizing an efficient inference engine.
Wherein, the recurrent residual module accelerates video inference mainly by exploiting the similarity of consecutive video frames. Inference on consecutive frames is carried out with convolutional neural networks, whose primary limiting factor is the heavy computation inside the linear layers; the recurrent residual module shares the overlapping computation between the linear layers of successive frames, which greatly reduces computation time and thus increases video inference speed. Therefore, the sparsity of each linear layer's input tensor is first increased, and the forward pass is then further accelerated with a sparse matrix multiplication accelerator.
Further, to accelerate video inference, frame similarity is used to reduce the redundant computation in frame-by-frame inference with video convolutional neural networks; frame-by-frame inference is expressed by the following formula:

N(I_t) ≈ N(I_{t-1}) + G(I_t − I_{t-1})

where N denotes the deep convolutional neural network feature extractor; G is the network that processes the difference between consecutive frames; I denotes an input tensor; and I_t and I_{t-1} are the input tensors of the t-th and (t−1)-th frames.
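The frame-by-frame decomposition described above — inferring the current frame from the previous frame's result plus a network applied to the frame difference — can be sketched numerically. This is an illustrative toy, not the patent's implementation: the "network" here is a single random linear map, for which the decomposition is exact and the frame difference is visibly sparse.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_features(x, w):
    """Toy stand-in for a network feature extractor (purely linear)."""
    return w @ x.ravel()

w = rng.standard_normal((8, 64))     # placeholder "network" weights
frame_prev = rng.random((8, 8))      # frame t-1
frame_curr = frame_prev.copy()
frame_curr[2, 3] += 0.5              # small local change at frame t

delta = frame_curr - frame_prev      # mostly zeros: high sparsity
full = linear_features(frame_curr, w)
incremental = linear_features(frame_prev, w) + linear_features(delta, w)

assert np.allclose(full, incremental)   # decomposition holds exactly here
print(f"density of delta: {np.count_nonzero(delta) / delta.size:.3f}")  # 0.016
```

Because real networks contain nonlinearities, the decomposition in the patent is approximate and is paired with the error control described later; the sketch only shows why the difference tensor is cheap to process.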
Further, the input tensor transferred into each linear layer l and its corresponding projection P will be saved; the saved input tensor contains information about the previous frame and the network G, and this information is used in the inference phase of the next frame.
Further, regarding the overlapping computation shared between linear layers: the linear layers include convolutional layers and fully connected layers, and the principle is expressed by the following formulas:

f(F_l * I_tl) = f(P_(t−1)l + F_l * ΔI_tl), with P_(t−1)l = F_l * I_(t−1)l, for a convolutional layer;
f(W_l I_tl) = f(P_(t−1)l + W_l ΔI_tl), with P_(t−1)l = W_l I_(t−1)l, for a fully connected (FC) layer,

where l indexes the linear layers; t indexes the frames; ΔI_tl denotes the difference between the input tensor I_tl of the l-th linear layer at frame t and the input tensor I_(t−1)l of the same layer at frame t−1, i.e., ΔI_tl = I_tl − I_(t−1)l; P denotes the projection; f denotes the rectified linear function; F is the weight filter of a convolutional layer; W is the weight tensor of a fully connected layer; and * denotes convolution. Since P_(t−1)l was already obtained in the inference phase of the previous frame, the main computation of the formulas reduces to F_l * ΔI_tl and W_l ΔI_tl; and because consecutive frames are similar, ΔI_tl is highly sparse.
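The cached-projection rule for a fully connected layer can be sketched as follows. The class name, caching scheme, and dimensions are illustrative assumptions, not taken from the patent; the nonlinearity is omitted so the reuse of P_(t−1) is easy to verify.

```python
import numpy as np

class RecurrentResidualFC:
    """Illustrative FC layer that reuses the cached projection P_{t-1}:
    W @ I_t = P_{t-1} + W @ (I_t - I_{t-1}),  P_{t-1} = W @ I_{t-1}."""

    def __init__(self, weight):
        self.weight = weight
        self.prev_input = None
        self.cached_proj = None   # P_{t-1}, saved from the previous frame

    def forward(self, x):
        if self.prev_input is None:
            out = self.weight @ x                   # first frame: dense pass
        else:
            delta = x - self.prev_input             # highly sparse difference
            out = self.cached_proj + self.weight @ delta
        self.prev_input, self.cached_proj = x, out  # cache for the next frame
        return out

rng = np.random.default_rng(1)
layer = RecurrentResidualFC(rng.standard_normal((4, 16)))
x0 = rng.random(16)
x1 = x0.copy()
x1[5] += 0.1                      # next frame differs in one element only

out0 = layer.forward(x0)
out1 = layer.forward(x1)          # computed from the cache plus W @ delta
assert np.allclose(out1, layer.weight @ x1)   # matches the direct dense pass
```

In the patent's setting the saving comes from the sparsity of `delta`: a sparse accelerator can skip the zero entries of `W @ delta`, whereas this dense numpy sketch computes them all.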
Further, convolution here includes dense convolution and sparse convolution: dense convolution is standard convolution, while sparse convolution is convolution executed with the sparse matrix multiplication accelerator. Sparse convolution carries no bias term and shares the same weight filter with dense convolution.
Wherein, regarding the reduction of neural-network computational complexity: even with the overlapping computation shared between linear layers in the recurrent residual module, the computational complexity is still high, so the sparsity of the input is used to reduce network execution time and operation cost, thereby reducing computational complexity. Sparsity can accelerate convolutional networks in both training and testing, chiefly by skipping zero-element operations with the sparse matrix multiplication accelerator during computation, which increases inference speed. The computational complexity of the recurrent residual module is dominated by the multiplications, whose complexity is calculated by the following formula:

O(c_i) = ρ_in · W · H · w · h · c_in · c_out for the i-th convolutional layer c_i, and O(f_j) = ρ_in · c_in · c_out for the j-th fully connected layer f_j,

where O denotes computational complexity; W and H are the size of the convolutional layer output; w and h are the size of the convolution kernel; c denotes the number of convolution channels; ρ denotes the density of the input tensor; the subscripts in and out refer to the input and output; c_i denotes the i-th convolutional layer; and f_j denotes the j-th fully connected layer.
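The role of the density factor ρ in the cost model above can be made concrete with a small calculation. The formula and layer sizes below are illustrative assumptions (a VGG-like 56×56, 3×3, 64→64 convolution), not figures from the patent:

```python
def conv_mult_cost(W, H, w, h, c_in, c_out, rho=1.0):
    """Multiplication count for one conv layer; rho (input density)
    scales down the cost, since zero inputs are skipped."""
    return rho * W * H * w * h * c_in * c_out

# Dense pass vs. a sparse pass where only 10% of inputs are non-zero.
dense = conv_mult_cost(56, 56, 3, 3, 64, 64)
sparse = conv_mult_cost(56, 56, 3, 3, 64, 64, rho=0.1)
print(f"speedup from sparsity: {dense / sparse:.1f}x")  # 10.0x
```

The point of the model is that cost scales linearly with ρ, so the sparser the difference tensor ΔI, the closer the recurrent residual module gets to free inference on near-static frames.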
Wherein, regarding increasing the sparsity of intermediate feature maps: the inference output is approximated by increasing the sparsity of the intermediate feature maps, which further accelerates inference. This process accumulates error in the output, so the accuracy of fast video inference must be guaranteed; the accumulated error is estimated with the following formula:

ê_c = φ(u)

where e_c denotes the exact accumulated error; φ denotes a fourth-order polynomial regression function obtained by fitting a large amount of (truncation value, accumulated error) data; and u denotes the truncation map.
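The error-control mechanism — fit a fourth-order polynomial to (truncation value, accumulated error) pairs, then trigger a fresh inference when the estimate crosses a threshold — can be sketched as follows. The training pairs and the threshold value are synthetic placeholders, for illustration only:

```python
import numpy as np

# Synthetic (truncation value u, accumulated error e) training data.
u_samples = np.linspace(0.0, 1.0, 50)
e_samples = 0.3 * u_samples**2 + 0.05 * u_samples

# Fourth-order polynomial regression, as described in the patent.
coeffs = np.polyfit(u_samples, e_samples, deg=4)

def estimate_error(u):
    """Predicted accumulated error e_c for truncation value u."""
    return np.polyval(coeffs, u)

eps = 0.2   # illustrative accuracy threshold

def needs_fresh_inference(u):
    """True when the error estimate exceeds eps and a new dense
    inference should be run to reset the accumulated error."""
    return estimate_error(u) > eps

print(needs_fresh_inference(0.3), needs_fresh_inference(0.9))  # False True
```

The design choice mirrors the text: the polynomial is a cheap stand-in for the true error, so the expensive dense inference runs only when the estimate says accuracy is at risk.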
Further, the accuracy of fast video inference is guaranteed by the following main steps: for each video clip, the VGG-16 feature vector of every frame is first extracted, VGG-16 being a convolutional neural network; then average pooling is performed over these feature vectors to obtain a 4096-dimensional video-level feature vector representing the clip; finally, a two-layer perceptron trained on these video features recognizes the action in the video and evaluates its precision in real time. When the accumulated error exceeds a preset threshold ∈, the network performs a fresh inference to remove the error, thereby accelerating the video inference process while guaranteeing recognition accuracy. The threshold ∈, against which the accumulated error is measured, is determined by the required inference precision.
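The video-level pipeline in the steps above — per-frame features, average pooling to one 4096-dimensional vector, then a two-layer perceptron — can be sketched as follows. The random matrices stand in for VGG-16 fc features and for trained perceptron weights; the class count and hidden size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for per-frame VGG-16 feature vectors (30 frames, 4096-d each).
frame_feats = rng.random((30, 4096))

# Average pooling over frames -> one video-level feature vector.
video_feat = frame_feats.mean(axis=0)            # shape (4096,)

# Two-layer perceptron with placeholder (untrained) weights.
w1, b1 = rng.standard_normal((256, 4096)) * 0.01, np.zeros(256)
w2, b2 = rng.standard_normal((10, 256)) * 0.01, np.zeros(10)

hidden = np.maximum(w1 @ video_feat + b1, 0.0)   # ReLU hidden layer
logits = w2 @ hidden + b2                        # 10 hypothetical action classes
pred = int(np.argmax(logits))
print(video_feat.shape, pred)
```

Average pooling makes the video-level vector length-independent of the clip, which is why the same perceptron can score clips of any duration.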
Wherein, regarding the synthesis of an efficient inference engine: the efficient inference engine is an accelerator that exploits dynamic sparsity in matrix-vector multiplication and is mainly used to run the recurrent residual module efficiently. Its working principle is as follows: when a multiplication is executed between a matrix W and a sparse vector a, a non-zero detection node recursively searches the vector a for the next non-zero element a_j; once it is found, the engine broadcasts a_j and its corresponding index j to the processing elements; next, in every processing element, the weight column W_j indexed by j is multiplied by a_j, and the result is added to the corresponding row accumulator; these accumulators finally output the vector b. A matrix-matrix multiplication can be decomposed into several matrix-vector multiplications, so by decomposing the input tensor into dynamically sparse vectors, the efficient inference engine can conveniently be embedded into the recurrent residual module.
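The engine's dataflow can be sketched in software: scan the sparse vector for non-zero elements, broadcast each (a_j, j) pair to the "processing elements", which scale the weight column W_j and accumulate into per-row accumulators that produce b. This is a minimal serial model of the hardware behavior, not the accelerator itself:

```python
import numpy as np

def sparse_mv(W, a):
    """Matrix-vector product that skips zero elements of a,
    mimicking the non-zero detection node + column accumulation."""
    b = np.zeros(W.shape[0])            # row accumulators
    for j in np.flatnonzero(a):         # non-zero detection: next a_j != 0
        b += W[:, j] * a[j]             # PEs: column W_j scaled by a_j
    return b

rng = np.random.default_rng(3)
W = rng.standard_normal((6, 20))
a = np.zeros(20)
a[[2, 7, 15]] = rng.random(3)           # dynamically sparse input vector

assert np.allclose(sparse_mv(W, a), W @ a)   # matches the dense product
```

With only 3 of 20 entries non-zero, the loop touches 3 columns instead of 20 — the same ρ-proportional saving the complexity formula above attributes to the accelerator.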
Detailed description of the invention
Fig. 1 is the system framework diagram of the fast video inference method based on a recurrent residual module of the present invention.
Fig. 2 is the working diagram of the recurrent residual module in the fast video inference method based on a recurrent residual module of the present invention.
Fig. 3 is the processing-effect diagram of the fast video inference method based on a recurrent residual module of the present invention.
Specific embodiment
It should be noted that, in the absence of conflict, the embodiments of the present application and the features therein may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is the system framework diagram of the fast video inference method based on a recurrent residual module of the present invention. It mainly comprises the recurrent residual module, reducing neural-network computational complexity, increasing the sparsity of intermediate feature maps, and synthesizing an efficient inference engine.
Reducing neural-network computational complexity: in the recurrent residual module, even with the overlapping computation shared between linear layers, the computational complexity is still high, so the sparsity of the input is used to reduce network execution time and operation cost, thereby reducing computational complexity. Sparsity can accelerate convolutional networks in both training and testing, chiefly by skipping zero-element operations with the sparse matrix multiplication accelerator during computation, which increases inference speed. The computational complexity of the recurrent residual module is dominated by the multiplications, whose complexity is calculated by the following formula:

O(c_i) = ρ_in · W · H · w · h · c_in · c_out for the i-th convolutional layer c_i, and O(f_j) = ρ_in · c_in · c_out for the j-th fully connected layer f_j,

where O denotes computational complexity; W and H are the size of the convolutional layer output; w and h are the size of the convolution kernel; c denotes the number of convolution channels; ρ denotes the density of the input tensor; the subscripts in and out refer to the input and output; c_i denotes the i-th convolutional layer; and f_j denotes the j-th fully connected layer.
Increasing the sparsity of intermediate feature maps: the inference output is approximated by increasing the sparsity of the intermediate feature maps, which further accelerates inference. This process accumulates error in the output, so the accuracy of fast video inference must be guaranteed; the accumulated error is estimated with the formula ê_c = φ(u), where φ is a fourth-order polynomial regression function fitted on (truncation value, accumulated error) data and u is the truncation map.
The accuracy of fast video inference is guaranteed by the following main steps: for each video clip, the VGG-16 feature vector of every frame is first extracted, VGG-16 being a convolutional neural network; then average pooling is performed over these feature vectors to obtain a 4096-dimensional video-level feature vector representing the clip; finally, a two-layer perceptron trained on these video features recognizes the action in the video and evaluates its precision in real time. When the accumulated error exceeds a preset threshold ∈, the network performs a fresh inference to remove the error, thereby accelerating the video inference process while guaranteeing recognition accuracy. The threshold ∈, against which the accumulated error is measured, is determined by the required inference precision.
Synthesizing an efficient inference engine: the efficient inference engine is an accelerator that exploits dynamic sparsity in matrix-vector multiplication and is mainly used to run the recurrent residual module efficiently. Its working principle is as follows: when a multiplication is executed between a matrix W and a sparse vector a, a non-zero detection node recursively searches the vector a for the next non-zero element a_j; once it is found, the engine broadcasts a_j and its corresponding index j to the processing elements; next, in every processing element, the weight column W_j indexed by j is multiplied by a_j, and the result is added to the corresponding row accumulator; these accumulators finally output the vector b. A matrix-matrix multiplication can be decomposed into several matrix-vector multiplications, so by decomposing the input tensor into dynamically sparse vectors, the efficient inference engine can conveniently be embedded into the recurrent residual module.
Fig. 2 is the working diagram of the recurrent residual module in the fast video inference method based on a recurrent residual module of the present invention. The figure mainly shows the working principle of the recurrent residual module: video inference is accelerated by exploiting the similarity of consecutive video frames, i.e., the sparsity of each linear layer's input tensor is first increased, and the forward pass is then further accelerated with a sparse matrix multiplication accelerator.
The recurrent residual module accelerates video inference mainly by exploiting the similarity of consecutive video frames. Inference on consecutive frames is carried out with convolutional neural networks, whose primary limiting factor is the heavy computation inside the linear layers; the recurrent residual module shares the overlapping computation between the linear layers of successive frames, which greatly reduces computation time and thus increases video inference speed. Therefore, the sparsity of each linear layer's input tensor is first increased, and the forward pass is then accelerated with a sparse matrix multiplication accelerator.
To accelerate video inference, frame similarity is used to reduce the redundant computation in frame-by-frame inference with video convolutional neural networks; frame-by-frame inference is expressed by the following formula:

N(I_t) ≈ N(I_{t-1}) + G(I_t − I_{t-1})

where N denotes the deep convolutional neural network feature extractor; G is the network that processes the difference between consecutive frames; I denotes an input tensor; and I_t and I_{t-1} are the input tensors of the t-th and (t−1)-th frames.
The input tensor transferred into each linear layer l and its corresponding projection P will be saved; the saved input tensor contains information about the previous frame and the network G, and this information is used in the inference phase of the next frame.
Regarding the overlapping computation shared between linear layers: the linear layers include convolutional layers and fully connected layers, and the principle is expressed by the following formulas:

f(F_l * I_tl) = f(P_(t−1)l + F_l * ΔI_tl), with P_(t−1)l = F_l * I_(t−1)l, for a convolutional layer;
f(W_l I_tl) = f(P_(t−1)l + W_l ΔI_tl), with P_(t−1)l = W_l I_(t−1)l, for a fully connected (FC) layer,

where l indexes the linear layers; t indexes the frames; ΔI_tl denotes the difference between the input tensor I_tl of the l-th linear layer at frame t and the input tensor I_(t−1)l of the same layer at frame t−1, i.e., ΔI_tl = I_tl − I_(t−1)l; P denotes the projection; f denotes the rectified linear function; F is the weight filter of a convolutional layer; W is the weight tensor of a fully connected layer; and * denotes convolution. Since P_(t−1)l was already obtained in the inference phase of the previous frame, the main computation of the formulas reduces to F_l * ΔI_tl and W_l ΔI_tl; and because consecutive frames are similar, ΔI_tl is highly sparse.
Convolution here includes dense convolution and sparse convolution: dense convolution is standard convolution, while sparse convolution is convolution executed with the sparse matrix multiplication accelerator. Sparse convolution carries no bias term and shares the same weight filter with dense convolution.
Fig. 3 is the processing-effect diagram of the fast video inference method based on a recurrent residual module of the present invention. Compared with conventional methods, the present invention significantly improves the running speed of a vision processing system and enhances the network's ability to understand video clips that change in real time, thereby accelerating the video inference process while guaranteeing recognition accuracy.
For those skilled in the art, the present invention is not limited to the details of the above embodiments and can be realized in other specific forms without departing from the spirit or scope of the invention. Furthermore, those skilled in the art may make various modifications and variations to the invention without departing from its spirit and scope, and these improvements and modifications should also be regarded as within the protection scope of the invention. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the invention.

Claims (10)

1. A fast video inference method based on a recurrent residual module, characterized by mainly comprising: a recurrent residual module (1); reducing neural-network computational complexity (2); increasing the sparsity of intermediate feature maps (3); and synthesizing an efficient inference engine (4).
2. The recurrent residual module (1) according to claim 1, characterized in that it accelerates video inference mainly by exploiting the similarity of consecutive video frames; inference on consecutive frames is carried out with convolutional neural networks, whose primary limiting factor is the heavy computation inside the linear layers; the recurrent residual module shares the overlapping computation between linear layers, which greatly reduces computation time and thus increases video inference speed; therefore, the sparsity of each linear layer's input tensor is first increased, and the forward pass is then further accelerated with a sparse matrix multiplication accelerator.
3. The acceleration of video inference according to claim 2, characterized in that frame similarity is used to reduce the redundant computation in frame-by-frame inference with video convolutional neural networks; frame-by-frame inference is expressed by the following formula:

N(I_t) ≈ N(I_{t-1}) + G(I_t − I_{t-1})

where N denotes the deep convolutional neural network feature extractor; G is the network that processes the difference between consecutive frames; I denotes an input tensor; and I_t and I_{t-1} are the input tensors of the t-th and (t−1)-th frames.
4. The input tensor according to claim 3, characterized in that the input tensor transferred into each linear layer l and its corresponding projection P is saved; the saved input tensor contains information about the previous frame and the network G, and this information is used in the inference phase of the next frame.
5. The overlapping computation shared between linear layers according to claim 2, characterized in that the linear layers include convolutional layers and fully connected layers, the principle being expressed by the following formulas:

f(F_l * I_tl) = f(P_(t−1)l + F_l * ΔI_tl), with P_(t−1)l = F_l * I_(t−1)l, for a convolutional layer;
f(W_l I_tl) = f(P_(t−1)l + W_l ΔI_tl), with P_(t−1)l = W_l I_(t−1)l, for a fully connected (FC) layer,

where l indexes the linear layers; t indexes the frames; ΔI_tl denotes the difference between the input tensor I_tl of the l-th linear layer at frame t and the input tensor I_(t−1)l of the same layer at frame t−1, i.e., ΔI_tl = I_tl − I_(t−1)l; P denotes the projection; f denotes the rectified linear function; F is the weight filter of a convolutional layer; W is the weight tensor of a fully connected layer; and * denotes convolution; since P_(t−1)l was already obtained in the inference phase of the previous frame, the main computation reduces to F_l * ΔI_tl and W_l ΔI_tl, and because consecutive frames are similar, ΔI_tl is highly sparse.
6. The convolution according to claim 5, characterized by including dense convolution and sparse convolution, dense convolution being standard convolution and sparse convolution being convolution executed with the sparse matrix multiplication accelerator; sparse convolution carries no bias term and shares the same weight filter with dense convolution.
7. The reduction of neural-network computational complexity (2) according to claim 1, characterized in that, in the recurrent residual module, even with the overlapping computation shared between linear layers, the computational complexity is still high, so the sparsity of the input is used to reduce network execution time and operation cost, thereby reducing computational complexity; sparsity can accelerate convolutional networks in both training and testing, chiefly by skipping zero-element operations with the sparse matrix multiplication accelerator during computation, which increases inference speed; the computational complexity of the recurrent residual module is dominated by the multiplications, whose complexity is calculated by the following formula:

O(c_i) = ρ_in · W · H · w · h · c_in · c_out for the i-th convolutional layer c_i, and O(f_j) = ρ_in · c_in · c_out for the j-th fully connected layer f_j,

where O denotes computational complexity; W and H are the size of the convolutional layer output; w and h are the size of the convolution kernel; c denotes the number of convolution channels; ρ denotes the density of the input tensor; the subscripts in and out refer to the input and output; c_i denotes the i-th convolutional layer; and f_j denotes the j-th fully connected layer.
8. The increase of the sparsity of intermediate feature maps (3) according to claim 1, characterized in that the inference output is approximated by increasing the sparsity of the intermediate feature maps, further accelerating inference; this process accumulates error in the output, so the accuracy of fast video inference must be guaranteed, the accumulated error being estimated with the following formula:

ê_c = φ(u)

where e_c denotes the exact accumulated error; φ denotes a fourth-order polynomial regression function obtained by fitting a large amount of (truncation value, accumulated error) data; and u denotes the truncation map.
9. The guarantee of the accuracy of fast video inference according to claim 8, characterized by the following main steps: for each video clip, the VGG-16 feature vector of every frame is first extracted, VGG-16 being a convolutional neural network; then average pooling is performed over these feature vectors to obtain a 4096-dimensional video-level feature vector representing the clip; finally, a two-layer perceptron trained on these video features recognizes the action in the video and evaluates its precision in real time; when the accumulated error exceeds a preset threshold ∈, the network performs a fresh inference to remove the error, thereby accelerating the video inference process while guaranteeing recognition accuracy; the threshold ∈, against which the accumulated error is measured, is determined by the required inference precision.
10. The synthesis of an efficient inference engine (4) according to claim 1, characterized in that the efficient inference engine is an accelerator exploiting dynamic sparsity in matrix-vector multiplication, mainly used to run the recurrent residual module efficiently, its working principle being: when a multiplication is executed between a matrix W and a sparse vector a, a non-zero detection node recursively searches the vector a for the next non-zero element a_j; once it is found, the engine broadcasts a_j and its corresponding index j to the processing elements; next, in every processing element, the weight column W_j indexed by j is multiplied by a_j, and the result is added to the corresponding row accumulator, these accumulators finally outputting the vector b; a matrix-matrix multiplication can be decomposed into several matrix-vector multiplications, so by decomposing the input tensor into dynamically sparse vectors, the efficient inference engine can conveniently be embedded into the recurrent residual module.
CN201811059298.8A 2018-09-12 2018-09-12 A fast video inference method based on a recurrent residual module Withdrawn CN109272035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811059298.8A CN109272035A (en) 2018-09-12 2018-09-12 A fast video inference method based on a recurrent residual module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811059298.8A CN109272035A (en) 2018-09-12 2018-09-12 A fast video inference method based on a recurrent residual module

Publications (1)

Publication Number Publication Date
CN109272035A true CN109272035A (en) 2019-01-25

Family

ID=65187819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811059298.8A Withdrawn CN109272035A (en) 2018-09-12 2018-09-12 A kind of video rapid inference method based on circulation residual error module

Country Status (1)

Country Link
CN (1) CN109272035A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106384092A (en) * 2016-09-11 2017-02-08 杭州电子科技大学 Online low-rank abnormal video event detection method for monitoring scene
CN106530327A (en) * 2016-11-07 2017-03-22 北京航空航天大学 Quick real-time discrimination type tracing method based on multi-local-feature learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bowen Pan et al., "Recurrent Residual Module for Fast Inference in Videos", arXiv:1802.09723v1 [cs.CV] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI814984B (en) * 2019-02-06 2023-09-11 Qualcomm Incorporated (US) Split network acceleration architecture
US11961007B2 (en) 2019-02-06 2024-04-16 Qualcomm Incorporated Split network acceleration architecture
WO2023168613A1 (en) * 2022-03-09 2023-09-14 Nvidia Corporation Robust vision transformers

Similar Documents

Publication Publication Date Title
Engelcke et al. Vote3deep: Fast object detection in 3d point clouds using efficient convolutional neural networks
Suzuki et al. Anticipating traffic accidents with adaptive loss and large-scale incident db
Brust et al. Convolutional patch networks with spatial prior for road detection and urban scene understanding
Liu et al. SAANet: Siamese action-units attention network for improving dynamic facial expression recognition
Benabbas et al. Motion pattern extraction and event detection for automatic visual surveillance
Pon et al. A hierarchical deep architecture and mini-batch selection method for joint traffic sign and light detection
WO2012139228A1 (en) Video-based detection of multiple object types under varying poses
Sagar et al. Semantic segmentation with multi scale spatial attention for self driving cars
Singh et al. Shunt connection: An intelligent skipping of contiguous blocks for optimizing MobileNet-V2
Tian et al. Video object detection for tractability with deep learning method
CN109272035A (en) A kind of video rapid inference method based on circulation residual error module
Panigrahi et al. MS-ML-SNYOLOv3: A robust lightweight modification of SqueezeNet based YOLOv3 for pedestrian detection
Zhao et al. Multiple vision architectures-based hybrid network for hyperspectral image classification
WO2022217434A1 (en) Cognitive network, method for training cognitive network, and object recognition method and apparatus
Aledhari et al. Multimodal machine learning for pedestrian detection
Krishnan et al. Vehicle detection and road scene segmentation using deep learning
Veluchamy et al. Detection and localization of abnormalities in surveillance video using timerider-based neural network
Kataoka et al. Joint pedestrian detection and risk-level prediction with motion-representation-by-detection
Fu et al. Foreground gated network for surveillance object detection
Konala et al. Analysis of live video object detection using YOLOv5 and YOLOv7
Nataprawira et al. Pedestrian Detection in Different Lighting Conditions Using Deep Neural Networks.
Zeng High efficiency pedestrian crossing prediction
Hui et al. WSA-YOLO: Weak-supervised and Adaptive object detection in the low-light environment for YOLOV7
Ngeni et al. Solving traffic data occlusion problems in computer vision algorithms using DeepSORT and quantum computing
Su et al. You Only Look at Interested Cells: Real-Time Object Detection Based on Cell-Wise Segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190125