CN113610209A - Neural network model reasoning acceleration method for monitoring video stream scene - Google Patents

Neural network model reasoning acceleration method for monitoring video stream scene

Info

Publication number
CN113610209A
CN113610209A (application CN202110911956.7A)
Authority
CN
China
Prior art keywords
neural network
module
data
video stream
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110911956.7A
Other languages
Chinese (zh)
Inventor
陈轶
张文
牛少彰
王茂森
崔浩亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast Digital Economic Development Research Institute
Original Assignee
Southeast Digital Economic Development Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast Digital Economic Development Research Institute filed Critical Southeast Digital Economic Development Research Institute
Priority to CN202110911956.7A priority Critical patent/CN113610209A/en
Publication of CN113610209A publication Critical patent/CN113610209A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention discloses a neural network model inference acceleration method oriented to surveillance video stream scenes. Taking the neural network layer as the unit, the original neural network model is split along the direction of inference computation into several modules, the output of each module serving as the input of the next; the split modules are processed in parallel with a multi-process technique, one module per process. The modules compute on input data independently and in parallel, and data is transferred between module processes through message queues. The invention relates to the technical field of neural network models and specifically provides an inference acceleration method for neural network models in surveillance video stream scenes.

Description

Neural network model reasoning acceleration method for monitoring video stream scene
Technical Field
The invention relates to the technical field of neural network models, in particular to a neural network model inference acceleration method for surveillance video stream scenes.
Background
With the continuous development of neural networks, they have achieved excellent performance in computer vision thanks to their strong fitting and learning capability, exceeding human-level performance on several tasks. Meanwhile, the arrival of the Internet of Things era has popularized network cameras, which provide data externally as video streams with high data density. Processing such video streams therefore places strict real-time requirements on neural network models. Model accuracy is improved mainly by optimizing the internal structure and increasing the number of network layers; however, as the network deepens, inference latency grows substantially, which is unacceptable for time-sensitive applications.
To address this problem, existing work optimizes mainly along two lines: improving the execution efficiency of neural network operators, and model compression. On the operator side, Intel and several research institutions provide the computation libraries MKL and OpenBLAS for the linear algebra that dominates neural network computation, improving efficiency by reducing the complexity of operations such as matrix multiplication, inversion, and singular value decomposition.
Model compression methods fall roughly into three categories: quantization, pruning, and distillation. Quantization compresses the 32-bit floating-point variables commonly used in neural network computation to 16-bit floating-point or integer types according to certain rules; but because 32-bit floating-point matrix routines are already highly optimized, quantization mainly shrinks model size and does not reliably speed up computation. Pruning aims to reduce computation time by sparsifying the original network structure, but it is limited by the efficiency of sparse matrix operations and at present does not reduce run time. Distillation transfers the knowledge of a trained teacher model (large model) to a student model (small model), reducing the computation and run time of the model; however, the accuracy and speed of the distilled model depend on the distillation algorithm, and different teacher models may require different algorithms, so distillation cannot be applied universally.
Existing neural network inference acceleration methods therefore remain seriously limited and cannot meet the real-time detection requirements of surveillance video stream scenes, which motivates the inference acceleration method oriented to surveillance video stream scenes proposed by the present invention.
Disclosure of Invention
To overcome these shortcomings, the invention provides a neural network model inference acceleration method oriented to surveillance video stream scenes, which starts from the feedforward and layer-cascade characteristics of neural networks and uses a multi-process technique to increase the running speed of inference computation.
The invention provides the following technical scheme: a neural network model inference acceleration method oriented to surveillance video stream scenes, which specifically comprises the following steps:
Step one: taking the neural network layer as the unit, split the original neural network model along the direction of inference computation into several modules, the output of each module serving as the input of the next; process the split modules in parallel with a multi-process technique, one module per process;
Step two: read the video stream of the surveillance camera and decode it into frame images at the frame rate, which serve as the model input data;
Step three: the modules of the neural network compute on input data independently and in parallel, one process each, and data is transferred between module processes through message queues: the preceding module (Modulepre) puts its output into a message queue (Queue), and the following module (Modulenext) takes it from that queue as its own input, thereby realizing the full feedforward computation of the neural network model.
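The pipeline of step three can be sketched with Python's standard library. Threads and `queue.Queue` stand in here for the patent's separate processes and inter-process message queues (the wiring with `multiprocessing.Process` and `multiprocessing.Queue` would be the same), and the two arithmetic functions are hypothetical stand-ins for the split network modules:

```python
import threading
import queue

def run_module(fn, q_in, q_out):
    # Each split module loops: take input from its queue, compute, pass on.
    while True:
        item = q_in.get()
        if item is None:          # sentinel: propagate shutdown downstream
            q_out.put(None)
            break
        q_out.put(fn(item))

# Hypothetical stand-ins for Modulepre and Modulenext (not real network layers).
module_pre  = lambda x: x + 1
module_next = lambda x: x * 2

q_src, q_mid, q_dst = queue.Queue(), queue.Queue(), queue.Queue()

workers = [
    threading.Thread(target=run_module, args=(module_pre,  q_src, q_mid)),
    threading.Thread(target=run_module, args=(module_next, q_mid, q_dst)),
]
for w in workers:
    w.start()

for frame in [1, 2, 3]:           # decoded frame images would enter here
    q_src.put(frame)
q_src.put(None)

results = []
while (out := q_dst.get()) is not None:
    results.append(out)
for w in workers:
    w.join()
print(results)                    # [4, 6, 8]
```

Because each stage has exactly one worker and the queues are FIFO, output order matches input order; the slowest stage simply becomes the pipeline bottleneck.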
Further, when the frame image data is a single datum, it is fed into the multi-process neural network model according to steps one and two; the overall running time from input to output of the neural network is the sum of the running times of all modules (Tmemory-sum) plus the time spent moving queue data between processes (Tqueue-sum), i.e. Tmemory-sum + Tqueue-sum, and this total exceeds the running time of the undivided network (Told).
Further, when there are multiple frame image data, they are fed into the multi-process neural network model in sequence according to steps one and two. Although each datum still takes longer to finish than on the undivided network, the minimum time interval between successive data drops from Told to the maximum, over modules, of the sum of one split module's running time (Tmemory) and the enqueue/dequeue time of its corresponding queue (Tqueue), i.e. (Tmemory + Tqueue)|max. Whenever (Tmemory + Tqueue)|max is smaller than Told, the interval between data shrinks; a smaller interval means the frame rate of the surveillance video stream can be raised, improving the real-time performance of neural network processing of the stream data.
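The latency and interval quantities above can be checked with hypothetical timings; all numbers below are invented for illustration, and Told is assumed to equal the sum of the module times (i.e. splitting adds no compute of its own):

```python
# Hypothetical per-module compute times and per-queue transfer times (seconds).
t_module = [0.04, 0.05, 0.03, 0.04]   # Tmemory for each of four split modules
t_queue  = [0.01, 0.01, 0.01, 0.01]   # Tqueue for each module's queue hops

t_old = sum(t_module)                  # Told: undivided model, one frame per 0.16 s
latency = sum(t_module) + sum(t_queue)                    # Tmemory-sum + Tqueue-sum
interval = max(m + q for m, q in zip(t_module, t_queue))  # (Tmemory + Tqueue)|max

# Per-frame latency rises (0.20 s > 0.16 s), but a new frame can enter the
# pipeline every 0.06 s instead of every 0.16 s, raising the sustainable frame rate.
print(latency, interval, t_old)
```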
Further, if the running times of the processes differ, the enqueue and dequeue frequencies of the message queues also differ, and a frame image waiting in a queue may be covered by the next frame before the neural network processes it. For a surveillance video stream scene this is acceptable: the video frames are quantized from an analog signal, not every frame needs to be processed in practice, and the effective frame rate simply follows the running speed of the neural network; moreover, this kind of frame loss is essentially the same as the frame loss caused by slow neural network inference.
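The frame-covering behavior can be sketched with a small bounded queue in which an unprocessed stale frame is discarded in favor of the newest one; `offer_latest` is a hypothetical helper written for this sketch, not something defined in the patent:

```python
import queue

def offer_latest(q, frame):
    # If the consumer (a neural network module) has not taken the previous
    # frame yet, drop that stale frame and keep only the newest one.
    try:
        q.put_nowait(frame)
    except queue.Full:
        try:
            q.get_nowait()        # discard the frame that was never processed
        except queue.Empty:
            pass                  # consumer took it in the meantime
        q.put_nowait(frame)

q = queue.Queue(maxsize=1)        # buffer of one pending frame
for frame in range(5):            # producer outruns the consumer in this toy run
    offer_latest(q, frame)

latest = q.get_nowait()
print(latest)                     # only frame 4, the newest, survived
```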
The method of the invention, so structured, has the following beneficial effects:
(1) compared with model compression methods, it accelerates model inference without reducing model accuracy;
(2) it combines a multi-process technique with the cascade and feedforward characteristics of neural networks to genuinely accelerate neural network model inference;
(3) compared with starting several copies of the neural network model at once, it occupies fewer hardware resources.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flow chart of an embodiment of a neural network model inference acceleration method for a surveillance video stream scene according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To make the technical solution of the present invention clearer, each implementation step is described in detail below, taking the neural network model vgg16 as an example and referring to the accompanying drawings.
Step 1: split the neural network model vgg16 with the layer as the unit, storing every four layers as an independent module, named neural network module 1 (module1), neural network module 2 (module2), neural network module 3 (module3) and neural network module 4 (module4);
Step 2: create a message queue in front of each module: message queue 1 (queue1) carries decoded frames to module 1, and message queues 2 to 4 (queue2, queue3, queue4) connect the four modules in sequence; set the queues to be non-blocking and set their buffer sizes;
Step 3: start process 1: read the video stream of the surveillance camera, decode frame images at the frame rate as model input data, and send the data to message queue 1 (queue1);
Step 4: start process 2: neural network module 1 (module1) takes data from message queue 1 (queue1), runs on it, and sends the result to message queue 2 (queue2); once the result is sent, the next frame data is read from message queue 1 (queue1) into neural network module 1 (module1);
Step 5: start process 3: neural network module 2 (module2) takes data from message queue 2 (queue2), runs on it, and sends the result to message queue 3 (queue3); once the result is sent, the next frame data is read from message queue 2 (queue2) into neural network module 2 (module2);
Step 6: start process 4: neural network module 3 (module3) takes data from message queue 3 (queue3), runs on it, and sends the result to message queue 4 (queue4); once the result is sent, the next data is read from message queue 3 (queue3) into neural network module 3 (module3);
Step 7: start process 5: neural network module 4 (module4) takes data from message queue 4 (queue4), runs on it, and outputs the result, which is the final result of the neural network model; once the result is output, the next data is read from message queue 4 (queue4) into neural network module 4 (module4).
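Step 1's four-layers-per-module split can be illustrated with a toy stand-in for vgg16's 16 weight layers: chaining the four modules must reproduce the output of the unsplit model. Each "layer" below is a hypothetical function x -> x + i rather than a real convolution:

```python
def make_layer(i):
    # Hypothetical stand-in for one network layer: a pure function on its input.
    return lambda x: x + i

layers = [make_layer(i) for i in range(16)]     # "vgg16": 16 weight layers

def run(fns, x):
    # Feedforward evaluation: apply each layer to the previous layer's output.
    for f in fns:
        x = f(x)
    return x

# Step 1: every four layers form one independent module (module1..module4).
modules = [layers[i:i + 4] for i in range(0, len(layers), 4)]

x = 10
whole = run(layers, x)        # unsplit model
piped = x
for m in modules:             # each module's output feeds the next module
    piped = run(m, piped)

print(len(modules), whole, piped)   # 4 130 130: the split preserves the output
```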
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A neural network model inference acceleration method oriented to surveillance video stream scenes, characterized by comprising the following steps:
step one, taking the neural network layer as the unit, splitting the original neural network model along the direction of inference computation into several modules, the output of each module serving as the input of the next, and processing the split modules in parallel with a multi-process technique, one module per process;
step two, reading the video stream of a surveillance camera and decoding it into frame images at the frame rate to serve as model input data;
step three, computing on the input data independently and in parallel across the modules of the neural network in a multi-process manner, data being transferred between module processes through message queues: the preceding module puts its output into a message queue and the following module takes it from the queue as its own input, thereby realizing the full feedforward computation of the neural network model.
2. The neural network model inference acceleration method oriented to surveillance video stream scenes as recited in claim 1, wherein, when the frame image data is a single datum, the frame image is fed into the multi-process neural network model according to steps one to three, the overall running time from input to output of the neural network is the sum of the running times of all modules plus the time spent moving queue data between processes, and this sum is greater than the running time of the undivided network.
3. The neural network model inference acceleration method oriented to surveillance video stream scenes as recited in claim 1, wherein, when the frame image data comprises multiple data, the frame images are fed into the multi-process neural network model according to steps one to three, and the minimum time interval between successive data drops to the maximum, over modules, of the sum of one split module's running time and the enqueue/dequeue time of its corresponding queue.
4. The neural network model inference acceleration method oriented to surveillance video stream scenes as recited in claim 1, wherein, if the running times of the processes differ, the enqueue and dequeue frequencies of the message queues also differ, and this inconsistency causes a frame image in a queue to be covered by the next frame before being processed by the neural network.
CN202110911956.7A 2021-08-10 2021-08-10 Neural network model reasoning acceleration method for monitoring video stream scene Pending CN113610209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110911956.7A CN113610209A (en) 2021-08-10 2021-08-10 Neural network model reasoning acceleration method for monitoring video stream scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110911956.7A CN113610209A (en) 2021-08-10 2021-08-10 Neural network model reasoning acceleration method for monitoring video stream scene

Publications (1)

Publication Number Publication Date
CN113610209A (en) 2021-11-05

Family

ID=78340094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110911956.7A Pending CN113610209A (en) 2021-08-10 2021-08-10 Neural network model reasoning acceleration method for monitoring video stream scene

Country Status (1)

Country Link
CN (1) CN113610209A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019134987A1 (en) * 2018-01-05 2019-07-11 Deepmind Technologies Limited Parallel video processing systems
CN110796242A (en) * 2019-11-01 2020-02-14 广东三维家信息科技有限公司 Neural network model reasoning method and device, electronic equipment and readable medium
CN111475313A (en) * 2020-03-04 2020-07-31 江苏理工学院 Message queue construction method and device suitable for convolutional neural network forward propagation
CN112434804A (en) * 2020-10-23 2021-03-02 东南数字经济发展研究院 Compression algorithm for deep transform cascade neural network model
US20210073170A1 (en) * 2019-09-09 2021-03-11 Shanghai Denglin Technologies Co., Ltd. Configurable heterogeneous ai processor
CN113128688A (en) * 2021-04-14 2021-07-16 北京航空航天大学 General AI parallel reasoning acceleration structure and reasoning equipment


Similar Documents

Publication Publication Date Title
CN108960340B (en) Convolutional neural network compression method and face detection method
CN109409513B (en) Task processing method based on neural network and related equipment
US20220083857A1 (en) Convolutional neural network operation method and device
CN111694643B (en) Task scheduling execution system and method for graph neural network application
US20220207356A1 (en) Neural network processing unit with network processor and convolution processor
CN113449839A (en) Distributed training method, gradient communication device and computing equipment
WO2022152104A1 (en) Action recognition model training method and device, and action recognition method and device
CN113012023A (en) Video analysis acceleration method and system based on many-core processor
US20220207327A1 (en) Method for dividing processing capabilities of artificial intelligence between devices and servers in network environment
CN111401532A (en) Convolutional neural network reasoning accelerator and acceleration method
CN115345285B (en) GPU-based timing chart neural network training method and system and electronic equipment
CN117574970A (en) Inference acceleration method, system, terminal and medium for large-scale language model
CN117223005A (en) Accelerator, computer system and method
WO2024160219A1 (en) Model quantization method and apparatus
CN115130649A (en) Deep learning model partitioning method and device for pipeline distributed end cloud collaborative reasoning
Cheng et al. Edge-assisted lightweight region-of-interest extraction and transmission for vehicle perception
Pei et al. A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera
CN113610209A (en) Neural network model reasoning acceleration method for monitoring video stream scene
CN112200310A (en) Intelligent processor, data processing method and storage medium
CN112487911A (en) Real-time pedestrian detection method and device based on improved yolov3 in intelligent monitoring environment
CN111626298A (en) Real-time image semantic segmentation device and segmentation method
Lu et al. Dynamic offloading on a hybrid edge–cloud architecture for multiobject tracking
CN1436425A (en) Adaptive early exit techniques for image correlation minimum distortion calculation
CN112738225B (en) Edge calculation method based on artificial intelligence
CN114639166A (en) Examination room abnormal behavior recognition method based on motion recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination