CN113610209A - Neural network model reasoning acceleration method for monitoring video stream scene - Google Patents
- Publication number
- CN113610209A (application CN202110911956.7A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- module
- data
- video stream
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Abstract
The invention discloses a neural network model inference acceleration method for surveillance video stream scenes. Taking the neural network layer as the unit, the original neural network model is split along the direction of inference computation into several modules, with the output of each module serving as the input of the next. The split modules are processed in parallel using multi-process technology, one module per process: input data is computed independently and in parallel across the modules, and data is passed between module processes through message queues. The invention belongs to the technical field of neural network models and specifically provides a neural network model inference acceleration method for surveillance video stream scenes.
Description
Technical Field
The invention relates to the technical field of neural network models, and in particular to a neural network model inference acceleration method for surveillance video stream scenes.
Background
With the continued development of neural networks, they have achieved excellent performance in computer vision thanks to their strong fitting and learning capability, exceeding human-level performance on multiple tasks. Meanwhile, the arrival of the Internet-of-Things era has popularized network cameras, which provide data externally as video streams with high data density; the real-time requirements on neural network models that process such streams are therefore high. Model accuracy is improved mainly by optimizing the internal structure and increasing the number of network layers. However, inference latency also grows greatly as the network deepens, which is unacceptable for time-sensitive applications.
To address this problem, existing work is optimized mainly along two lines: improving the operating efficiency of neural network operators, and model compression. On the operator side, Intel and several research institutions provide their own computation libraries, MKL and OpenBLAS respectively, for the linear algebra widely involved in neural network computation, improving efficiency by reducing the complexity of operations such as matrix multiplication, inversion, and singular value computation.
In terms of model compression, methods fall roughly into three categories: model quantization, pruning, and distillation. Model quantization compresses the 32-bit floating-point variables commonly used in neural network computation into 16-bit floating-point or integer types according to certain rules; but because 32-bit floating-point matrix kernels are already highly optimized, quantization mainly shrinks model volume and cannot effectively increase run speed. Model pruning aims to reduce computation time by sparsifying the original network structure, but it is limited by the efficiency of sparse matrix operations, and at present this scheme cannot reduce model run time. Model distillation transfers the knowledge of a trained teacher model (large model) to a student model (small model), reducing the computation and run time of the neural network model; however, the accuracy and speed of the distilled model depend on the distillation algorithm, and different teacher models may require different distillation algorithms, so distillation as a compression method cannot be widely generalized.
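As background for the distillation approach just described, the following is a minimal sketch of a soft-target distillation loss in plain Python. The temperature value, the logit values, and the helper names are illustrative assumptions, not part of the patent or of any specific distillation algorithm:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer target distributions."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# hypothetical logits for one input
teacher = [8.0, 2.0, 1.0]   # trained large model
student = [6.0, 3.0, 2.0]   # small model being trained to imitate it
loss = kd_loss(student, teacher)
print(loss > 0)             # True: the student has not matched the teacher yet
```

Minimizing this loss pushes the student's output distribution toward the teacher's, which is the knowledge-transfer step the text refers to.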
Therefore, existing neural network inference acceleration methods still have great limitations and cannot meet the real-time detection requirements of surveillance video stream scenarios; hence the invention provides a neural network model inference acceleration method oriented to such scenes.
Disclosure of Invention
To overcome these defects, the invention provides a neural network model inference acceleration method oriented to surveillance video stream scenes, which starts from the feedforward and layer-cascade characteristics of neural networks and uses multi-process technology to increase the running speed of neural network inference computation.
The invention provides the following technical scheme: a neural network model inference acceleration method for surveillance video stream scenes, comprising the following steps:
Step 1: taking the neural network layer as the unit, split the original neural network model along the direction of inference computation into several modules, the output of each module serving as the input of the next; process the split modules in parallel using multi-process technology, one module per process;
Step 2: read the video stream of the surveillance camera and decode it into frame images at the frame rate, which serve as the model input data;
Step 3: the modules of the neural network compute on input data independently and in parallel in multi-process fashion; data is transferred between module processes through message queues: the preceding module (Module_pre) pushes its output into a message queue (Queue), and the following module (Module_next) pops from that queue as its own input, thereby realizing the full feedforward computation of the neural network model.
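The three steps above amount to a stage-per-worker pipeline connected by queues. For portability the sketch below uses threads and `queue.Queue`; the patent itself uses processes, for which `multiprocessing.Process` and `multiprocessing.Queue` are the direct analogues with the same wiring. The toy modules and frame values are illustrative assumptions:

```python
import queue
import threading

def stage(fn, q_in, q_out):
    """One pipeline stage: take from the inbound queue, compute, push downstream."""
    while True:
        item = q_in.get()
        if item is None:          # sentinel: propagate shutdown and stop
            q_out.put(None)
            break
        q_out.put(fn(item))

# toy stand-ins for the split network modules
modules = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

queues = [queue.Queue() for _ in range(len(modules) + 1)]
workers = [threading.Thread(target=stage, args=(m, queues[i], queues[i + 1]))
           for i, m in enumerate(modules)]
for w in workers:
    w.start()

for frame in range(5):            # stand-in for decoded video frames
    queues[0].put(frame)
queues[0].put(None)               # end of stream

results = []
while (out := queues[-1].get()) is not None:
    results.append(out)
for w in workers:
    w.join()
print(results)                    # [-1, 1, 3, 5, 7], i.e. ((x + 1) * 2) - 3
```

Because each queue is FIFO and each stage has a single worker, frame order is preserved end to end, matching the feedforward full-process computation described in Step 3.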
Further, when the frame image data is a single datum, it is input into the multi-process neural network model according to Steps 1 and 2. The overall run time from network input to output is the sum of the run times of all modules (Tmemory-sum) plus the time for queue data to pass in and out between processes (Tqueue-sum), i.e., Tmemory-sum + Tqueue-sum, and this total exceeds the original run time (Told) of the undivided network.
Further, when the frame image data comprises multiple data, they are input sequentially into the multi-process neural network model according to Steps 1 and 2. Although each datum still takes longer to finish than on the undivided network, the minimum interval between successive results drops from the original Told to the maximum, over modules, of the single-module run time (Tmemory) plus the enqueue/dequeue time of the corresponding queue (Tqueue), i.e., (Tmemory + Tqueue)|max. Whenever Tmemory + Tqueue is less than Told, the inter-datum interval shrinks; a smaller interval means the frame rate of the surveillance video stream can be raised, improving the real-time performance of neural network processing of the stream data.
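The latency/throughput trade-off stated above can be checked with a few lines of arithmetic. All timing values below are hypothetical assumptions chosen only to illustrate the inequalities:

```python
# Hypothetical per-module run times (seconds) and queue-hop overhead.
t_module = [0.04, 0.06, 0.05, 0.03]   # Tmemory for each of four split modules
t_queue = 0.005                        # Tqueue: enqueue + dequeue cost per hop

t_old = sum(t_module)                                  # run time of the undivided model
latency = sum(t_module) + len(t_module) * t_queue      # one frame's end-to-end time, pipelined
interval = max(t + t_queue for t in t_module)          # steady-state gap between outputs

print(latency > t_old, interval < t_old)               # True True
```

A single frame is slower through the pipeline (latency exceeds Told), but once the pipeline is full, one result emerges every (Tmemory + Tqueue)|max seconds, which is what allows a higher stream frame rate.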
Further, if the run times of the processes are inconsistent, the dequeue and enqueue rates of the message queues will also be inconsistent, and a frame sitting in a queue may be overwritten by the next frame before the neural network has processed it. For surveillance video stream scenes, video frames are quantized from an analog signal, not every frame needs to be processed in practice, and the effective video frame rate follows the operating speed of the neural network; moreover, this frame-missing situation is essentially the same as frames being dropped because inference is too slow.
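The overwrite behaviour described above can be emulated with a bounded, non-blocking queue that discards the oldest unprocessed frame when full. This is a single-threaded sketch; the helper name, queue size, and frame values are illustrative assumptions:

```python
import queue

def put_latest(q, frame):
    """Non-blocking put: if the queue is full, drop the oldest waiting frame first,
    mirroring a slow stage whose unprocessed frame is overwritten by the next one."""
    try:
        q.put_nowait(frame)
    except queue.Full:
        try:
            q.get_nowait()        # discard the stale frame
        except queue.Empty:
            pass
        q.put_nowait(frame)

q = queue.Queue(maxsize=2)
for frame in range(5):            # producer outruns the consumer
    put_latest(q, frame)
kept = [q.get_nowait() for _ in range(2)]
print(kept)                       # [3, 4]: only the freshest frames survive
```

In a real multi-process deployment the drop-then-put sequence would need to tolerate a concurrent consumer; here it only illustrates why missing a frame is equivalent to the stream's effective frame rate tracking the slowest stage.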
The invention, so structured, has the following beneficial effects:
(1) compared with model compression methods, it accelerates model inference without reducing model accuracy;
(2) it combines multi-process technology with the cascade and feedforward characteristics of neural networks, accelerating model inference in a fundamental way;
(3) compared with starting several copies of the neural network model simultaneously, the technical scheme of the invention occupies fewer hardware resources.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flow chart of an embodiment of a neural network model inference acceleration method for a surveillance video stream scene according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To make the technical solution of the present invention clearer, the neural network model vgg16 is taken as an example below, and each implementation step of the invention is described in detail with reference to the accompanying drawings.
Step 1: divide the vgg16 model with the layer as the unit, storing every four layers as an independent module, named neural network module 1 (module1), neural network module 2 (module2), neural network module 3 (module3), and neural network module 4 (module4);
Step 2: pair each module with a message queue: four message queues are used in total, namely message queue 1 (queue1) between the reading process and module1, and message queues 2 through 4 (queue2, queue3, queue4) between consecutive modules; set the queues to the non-blocking type and configure their buffer sizes;
Step 3: start process 1, which reads the video stream of the surveillance camera, decodes frame images at the frame rate as model input data, and sends the data to message queue 1 (queue1);
Step 4: start process 2, in which neural network module 1 (module1) takes data from queue1, computes, and sends the result to queue2; once the result is sent, the next frame is taken from queue1 and input into module1;
Step 5: start process 3, in which neural network module 2 (module2) takes data from queue2, computes, and sends the result to queue3; once the result is sent, the next frame is taken from queue2 and input into module2;
Step 6: start process 4, in which neural network module 3 (module3) takes data from queue3, computes, and sends the result to queue4; once the result is sent, the next datum is taken from queue3 and input into module3;
Step 7: start process 5, in which neural network module 4 (module4) takes data from queue4, computes, and outputs the result, which is the final result of the neural network model; once the result is sent, the next datum is taken from queue4 and input into module4.
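Step 1's grouping of every four layers into a module can be sketched in a framework-agnostic way. The 16 "layers" below are toy stand-ins for vgg16's layers, and the helper name and toy computation are assumptions for illustration only:

```python
from typing import Callable, List

def split_every(layers: List[Callable], k: int) -> List[Callable]:
    """Group every k consecutive layers into one composed module."""
    modules = []
    for i in range(0, len(layers), k):
        group = layers[i:i + k]
        def module(x, _group=group):
            for layer in _group:
                x = layer(x)        # layers run in feedforward order inside the module
            return x
        modules.append(module)
    return modules

# toy stand-in for vgg16: 16 "layers", each adding 1 to its input
layers = [(lambda x: x + 1) for _ in range(16)]
modules = split_every(layers, 4)    # -> module1 .. module4, four layers each

y = 0
for m in modules:                   # output of module i is the input of module i+1
    y = m(y)
print(len(modules), y)              # 4 16
```

With a real framework, the same idea applies: e.g. slicing a sequential container of layers into four sub-networks, each loaded by its own process.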
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (4)
1. A neural network model inference acceleration method for surveillance video stream scenes, characterized by comprising the following steps:
Step 1: taking the neural network layer as the unit, split the original neural network model along the direction of inference computation into several modules, the output of each module serving as the input of the next; process the split modules in parallel using multi-process technology, one module per process;
Step 2: read the video stream of the surveillance camera and decode it into frame images at the frame rate, which serve as the model input data;
Step 3: the modules of the neural network compute on input data independently and in parallel in multi-process fashion; data is transferred between module processes through message queues: the preceding module pushes its output into a message queue, and the following module takes data from the message queue as its own input, thereby realizing the full feedforward computation of the neural network model.
2. The neural network model inference acceleration method for surveillance video stream scenes according to claim 1, wherein, when the frame image data is a single datum, the frame image is input into the multi-process neural network model according to Steps 1, 2, and 3; the overall run time from network input to output is the sum of the run times of all modules plus the time required for queue data to pass between processes, and this sum exceeds the original run time of the undivided network.
3. The neural network model inference acceleration method for surveillance video stream scenes according to claim 1, wherein, when the frame image data comprises multiple data, the frame images are input into the multi-process neural network model according to Steps 1, 2, and 3, and the minimum interval between successive results is reduced to the maximum, over modules, of the single-module run time plus the enqueue/dequeue time of the corresponding queue.
4. The method according to claim 1, wherein, if the computation times of the processes are inconsistent, the dequeue and enqueue rates of the message queues are also inconsistent, and this inconsistency causes a frame in the queue to be overwritten by the next frame before being processed by the neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110911956.7A CN113610209A (en) | 2021-08-10 | 2021-08-10 | Neural network model reasoning acceleration method for monitoring video stream scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113610209A true CN113610209A (en) | 2021-11-05 |
Family
ID=78340094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110911956.7A Pending CN113610209A (en) | 2021-08-10 | 2021-08-10 | Neural network model reasoning acceleration method for monitoring video stream scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113610209A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019134987A1 (en) * | 2018-01-05 | 2019-07-11 | Deepmind Technologies Limited | Parallel video processing systems |
CN110796242A (en) * | 2019-11-01 | 2020-02-14 | 广东三维家信息科技有限公司 | Neural network model reasoning method and device, electronic equipment and readable medium |
CN111475313A (en) * | 2020-03-04 | 2020-07-31 | 江苏理工学院 | Message queue construction method and device suitable for convolutional neural network forward propagation |
CN112434804A (en) * | 2020-10-23 | 2021-03-02 | 东南数字经济发展研究院 | Compression algorithm for deep transform cascade neural network model |
US20210073170A1 (en) * | 2019-09-09 | 2021-03-11 | Shanghai Denglin Technologies Co., Ltd. | Configurable heterogeneous ai processor |
CN113128688A (en) * | 2021-04-14 | 2021-07-16 | 北京航空航天大学 | General AI parallel reasoning acceleration structure and reasoning equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |