CN111277835A

CN111277835A - Monitoring video compression and decompression method combining yolo3 and flownet2 network

Info

Publication number: CN111277835A
Application number: CN202010098304.1A
Authority: CN
Inventors: 汝佩哲; 李锐; 金长新
Original assignee: Jinan Inspur Hi Tech Investment and Development Co Ltd
Current assignee: Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date: 2020-02-18
Filing date: 2020-02-18
Publication date: 2020-06-12

Abstract

The invention provides a surveillance video compression and decompression method combining yolo3 and a flownet2 network. Compression comprises the following steps: preparing a surveillance video file; splitting the video file and extracting all frame image data from it; performing target detection on the frame image data with a yolo3 network, the detection targets being set according to actual requirements; compressing the first-frame image data of each video segment in which a target is detected and of each video segment in which no target is detected; extracting optical flow information between adjacent frames of the segments in which a target is detected, using a flownet2 network; reconstructing each frame with a network and subtracting the reconstructed frame from the original frame to obtain residual data; and applying rounding quantization and arithmetic-coding entropy coding to the optical flow information and the residual data. The method removes a large amount of temporal and spatial redundancy from surveillance video, achieves a higher compression ratio on surveillance video than conventional video compression methods (such as H.263 and H.264), saves storage space, and improves storage efficiency.

Description

Monitoring video compression and decompression method combining yolo3 and flownet2 network

Technical Field

The invention relates to the field of deep learning, in particular to a monitoring video compression and decompression method combining yolo3 and a flownet2 network.

Background

Video compression has long been a research hotspot in academia and industry. With the rapid development of information technology, multimedia has gradually become the most important carrier through which people acquire external information. Video compression is the process of changing the format of video content through video coding, with the goal of reducing the storage space the video occupies. A large share of the video stored today comes from surveillance. In particular, as daily life becomes increasingly digital, large volumes of high-definition surveillance video are generated every day.

Compared with ordinary video, surveillance video has the following characteristics. First, the content is stable: by its nature, the scene in a surveillance video is usually stable with few drastic changes, and the frame rate is usually low; at night in particular, the content hardly changes. Second, the background is fixed: because the surveillance camera is usually fixed, the background remains essentially unchanged for long periods. Third, the targets are well defined: surveillance is usually highly targeted, and most of it focuses on pedestrians and vehicles. Therefore, surveillance video has a higher compression potential than ordinary video.

Conventional video compression methods (H.263, H.264, etc.) compress video using intra-frame prediction, inter-frame prediction, quantization, and encoding. They have achieved excellent results for general video and are widely used in practice. However, they need to store data for every frame and therefore do not compress surveillance video well: a large amount of temporal and content redundancy remains in the compressed surveillance video.

In recent years, with the development of deep learning, and especially the success of convolutional neural networks in image processing and computer vision, it has become possible to compress video efficiently using deep learning. It is therefore important to develop a video compression method based on deep learning.

Disclosure of Invention

The technical task of the invention is to address the shortcomings of the prior art by providing a surveillance video compression and decompression method combining yolo3 and a flownet2 network. The method performs target detection on the input surveillance video frames with a yolo3 network, extracts optical flow data from the frames in which a target is detected with flownet2, and compresses and stores the optical flow data, thereby removing a large amount of temporal and spatial redundancy from the surveillance video. Compared with conventional video compression methods such as H.263 and H.264, the method achieves a higher compression ratio on surveillance video, saves storage space, and improves storage efficiency.

The technical scheme adopted by the invention for solving the technical problems is as follows:

1. The invention provides a surveillance video compression and decompression method combining yolo3 and a flownet2 network, implemented by the following steps:

S1, preparing a surveillance video file;

S2, splitting the video file into individual frame images and extracting all frame image data from the video file;

S3, performing target detection on the frame image data with a yolo3 network, the detection targets being set according to actual requirements;

S4, compressing the first-frame image data of each video segment in which a target is detected and of each video segment in which no target is detected;

S5, extracting optical flow information between adjacent frames of each video segment in which a target is detected, using a flownet2 network;

S6, reconstructing each frame with a network and subtracting the reconstructed frame from the original frame to obtain residual data;

S7, applying rounding quantization and arithmetic-coding entropy coding to the optical flow information and the residual data.
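As an illustration of step S7, the following minimal sketch applies rounding quantization to a few residual values and estimates how many bits an ideal entropy coder would need for them. It does not implement an arithmetic coder: it only computes the Shannon bound that an arithmetic coder approaches. The sample values and the function names `round_quantize` and `ideal_code_length_bits` are assumptions made for this example and do not come from the invention.

```python
import math
from collections import Counter


def round_quantize(values):
    """Rounding quantization: map each real value to the nearest integer.

    Python's round() rounds exact halves to the nearest even integer.
    """
    return [int(round(v)) for v in values]


def ideal_code_length_bits(symbols):
    """Total Shannon code length (bits) under the empirical distribution.

    An arithmetic coder driven by the same distribution approaches this bound.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())


# Hypothetical residual values for a mostly static scene: most are near zero.
residual = [0.2, -0.1, 0.4, 0.0, 3.7, -0.3, 0.1, 0.0]
quantized = round_quantize(residual)
bits = ideal_code_length_bits(quantized)
print(quantized, round(bits, 2))
```

Because nearly all quantized symbols are zero, the estimated code length is far below the cost of storing each value at a fixed width, which is the effect the quantization and entropy coding step relies on.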

Preferably, in step S2, all frame image data in the video file are extracted by the ffmpeg tool.

Preferably, in step S4, the first-frame image data of the video segments with and without a detected target is compressed using JPEG or JPEG 2000.

Preferably, in step S4, for consecutive frames in which no target is detected, the first frame is compressed and stored using a conventional image compression method, and the remaining frames are not saved;

for consecutive frames in which a target is detected, the first frame is compressed and stored using a conventional image compression method.

Preferably, in step S5, for consecutive frames in which a target is detected, the flownet2 network is used to extract, for the remaining frames, the optical flow information between adjacent frames of the segment.

Preferably, in step S6, the frame is reconstructed from the optical flow information and the image data of the preceding frame using the Motion Compensation network.
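The segment handling in steps S3 to S5 can be sketched as follows. The sketch assumes the detector output has already been reduced to one boolean per frame, groups consecutive frames with the same flag into segments, and lists what each segment would store. The function name `plan_storage`, the tuple layout, and the 0-based frame indices are assumptions made for this example.

```python
from itertools import groupby


def plan_storage(detected):
    """Group per-frame detection flags into runs and list what each run stores.

    detected[i] is True when the detector reported a target in frame i.
    Returns (start, end, has_target, stored) tuples with inclusive indices.
    """
    plan = []
    index = 0
    for has_target, run in groupby(detected):
        length = len(list(run))
        start, end = index, index + length - 1
        # Every segment keeps its first frame as a compressed still image.
        stored = ["first frame (JPEG or JPEG 2000)"]
        # Only segments with a target keep motion data for the later frames.
        if has_target and length > 1:
            stored.append(f"flow and residual for frames {start + 1}..{end}")
        plan.append((start, end, has_target, stored))
        index += length
    return plan


# Ten frames without a target followed by eight frames with one.
flags = [False] * 10 + [True] * 8
plan = plan_storage(flags)
for entry in plan:
    print(entry)
```

For a segment without a target, only one still image is stored regardless of the segment length, which is where most of the saving on static scenes would come from.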

2. The invention also provides a method for decompressing the monitoring video, which comprises the following steps:

A. decompressing the first frame image data;

B. copying first frame data to the video segment of which the target is not detected;

C. reconstructing the frame data of the video segment with the detected target by utilizing the optical flow information and the residual data through a Motion Compensation network;

D. and synthesizing all the reconstructed frame images into a complete video.

Preferably, in step A, the first frame image data is decompressed using JPEG or JPEG 2000.

Preferably, in step C, the optical flow information and the residual data are used for reconstructing the frame data of the video segment with the detected object through the Motion Compensation network.

Preferably, in step D, all the reconstructed frame images are combined into a complete video through the ffmpeg tool.
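The ffmpeg tool is named both for frame extraction (step S2) and for video synthesis (step D). The sketch below only builds the two command lines and does not run them. The file names, the PNG frame pattern, and the frame rate of 25 are assumptions for illustration, and the output directory would have to exist before the first command is executed.

```python
import shlex


def extract_frames_cmd(video_path, frame_pattern):
    """ffmpeg command that writes every frame of video_path as an image."""
    return ["ffmpeg", "-i", video_path, frame_pattern]


def assemble_video_cmd(frame_pattern, fps, video_path):
    """ffmpeg command that encodes numbered images back into a video."""
    return ["ffmpeg", "-framerate", str(fps), "-i", frame_pattern, video_path]


# Hypothetical paths; pass either list to subprocess.run(cmd, check=True).
extract_cmd = extract_frames_cmd("camera.mp4", "frames/%06d.png")
assemble_cmd = assemble_video_cmd("frames/%06d.png", 25, "restored.mp4")
print(shlex.join(extract_cmd))
print(shlex.join(assemble_cmd))
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems when paths contain spaces.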

Compared with the prior art, the monitoring video compression and decompression method combining the yolo3 and the flownet2 network has the following beneficial effects:

The invention performs target detection on the input surveillance video frame data through the yolo3 network, extracts optical flow data from the frames in which a target is detected through flownet2, and compresses and stores that data, thereby removing a large amount of temporal and spatial redundancy from the surveillance video. Compared with conventional video compression methods such as H.263 and H.264, the method achieves a higher compression ratio on surveillance video.

Drawings

To describe the working principle of the surveillance video compression method of the present invention more clearly, a simplified diagram is attached for further explanation.

FIG. 1 is a flow chart of a surveillance video compression method combining yolo3 and flownet2 networks according to the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one

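Referring to FIG. 1, the surveillance video compression method combining yolo3 and flownet2 networks of the present invention performs target detection on the input surveillance video frame data through the yolo3 network and, for frames in which a target is detected, extracts optical flow data through flownet2 and compresses and stores it.

The method comprises the following implementation steps:

S1, preparing a surveillance video file;

S2, splitting the video file into individual frame images and extracting all frame image data from the video file;

S3, performing target detection on the frame image data with a yolo3 network, the detection targets being set according to actual requirements;

S4, compressing the first-frame image data of each video segment in which a target is detected and of each video segment in which no target is detected;

S5, extracting optical flow information between adjacent frames of each video segment in which a target is detected, using a flownet2 network;

S6, reconstructing each frame with a network and subtracting the reconstructed frame from the original frame to obtain residual data;

S7, applying rounding quantization and arithmetic-coding entropy coding to the optical flow information and the residual data.

In step S2 above, all frame image data in the video file is extracted with the ffmpeg tool.

In step S4 above, the first-frame image data of the video segments with and without a detected target is compressed using a conventional image compression method (JPEG or JPEG 2000).

In step S4, for consecutive frames in which no target is detected, the first frame is compressed and stored using a conventional image compression method (JPEG or JPEG 2000), and the remaining frames are not saved.

In step S5 above, for consecutive frames in which a target is detected, the flownet2 network is used to extract, for the remaining frames, the optical flow information between adjacent frames of the segment.

In step S6, the frame is reconstructed from the optical flow information and the image data of the previous frame by the Motion Compensation network.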

The specific implementation steps are as follows:

Step 1: Extract the image data of each frame of the input surveillance video, denoted $I_1, I_2, I_3, \ldots$

Step 2: Use the yolo3 network to perform target detection on the frame image data. The detection targets are set according to actual requirements, for example pedestrians and vehicles.

Step 3: For consecutive frames in which no target is detected, e.g. $I_1 \sim I_{10}$, compress and store the first frame ($I_1$) using a conventional image compression method (such as JPEG or JPEG 2000). The remaining frames are not saved.

Step 4: For consecutive frames in which a target (pedestrian or vehicle) is detected, e.g. $I_{11} \sim I_{18}$, compress and store the first frame ($I_{11}$) using a conventional image compression method (such as JPEG or JPEG 2000).

Step 5: For the remaining frames ($I_{12} \sim I_{18}$), use the flownet2 network to obtain the optical flow information $f_{12} \sim f_{18}$ between each frame and its previous frame, namely

$$f_t = \mathrm{FlowNet2}(I_{t-1}, I_t), \quad t = 12, \ldots, 18.$$

Step 6: Pass the optical flow information $f_{12} \sim f_{18}$ and the previous-frame image data $I_{11} \sim I_{17}$ through the Motion Compensation network (written $\mathrm{MC}$ here) to reconstruct each frame, namely

$$I'_t = \mathrm{MC}(I_{t-1}, f_t), \quad t = 12, \ldots, 18.$$

Step 7: Subtract the reconstructed frame $I'_{12}$ from the original frame $I_{12}$ to obtain the residual data $r$, namely

$$r_{12} = I_{12} - I'_{12},$$

and likewise for the remaining frames.

Step 8: Quantize and entropy-code the optical flow information $f$ and the residual data $r$, and store the result.
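The sixth and seventh steps above can be illustrated with a toy example. The sketch below does not reproduce the Motion Compensation network: as a loudly stated assumption, it substitutes a plain nearest-neighbour backward warp, in which each pixel of the predicted frame is copied from the previous frame at the position given by an integer flow vector (dx, dy). The tiny frames, the flow field, and the function names `warp` and `residual` are all hypothetical.

```python
def warp(prev, flow):
    """Nearest-neighbour backward warp: out[y][x] = prev[y + dy][x + dx].

    flow[y][x] is an integer (dx, dy) pair; source coordinates are clamped
    to the image border. This stands in for the Motion Compensation network.
    """
    h, w = len(prev), len(prev[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(x + dx, 0), w - 1)
            sy = min(max(y + dy, 0), h - 1)
            row.append(prev[sy][sx])
        out.append(row)
    return out


def residual(current, predicted):
    """Per-pixel difference: original frame minus reconstructed frame."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, predicted)]


# Previous frame: one bright pixel on a flat background.
prev = [[10, 10, 10, 10],
        [10, 90, 10, 10],
        [10, 10, 10, 10]]
# Current frame: the bright pixel moved one column right and brightened by 2.
cur = [[10, 10, 10, 10],
       [10, 10, 92, 10],
       [10, 10, 10, 10]]
# Flow on the current frame's grid, pointing back into the previous frame.
flow = [[(0, 0)] * 4,
        [(0, 0), (-1, 0), (-1, 0), (0, 0)],
        [(0, 0)] * 4]

predicted = warp(prev, flow)
res = residual(cur, predicted)
print(predicted[1], res[1])
```

After motion compensation the residual is zero everywhere except at the one pixel whose brightness changed, so it is much cheaper to entropy-code than the frame itself.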

Example two

The invention provides a method for decompressing a monitoring video, which comprises the following steps:

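A. decompressing the first frame image data;

B. copying the first frame data to the video segment in which no target is detected;

C. reconstructing the frame data of the video segment in which a target is detected from the optical flow information and the residual data through a Motion Compensation network;

D. synthesizing all the reconstructed frame images into a complete video.

In step A, the first frame image data is decompressed using a conventional image decompression method (JPEG or JPEG 2000).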

In the step C, the optical flow information and the residual data are used to reconstruct the frame data of the video segment in which the target is detected through the motion compensation network.

In the step D, all the reconstructed frame images are combined into a complete video through the ffmpeg tool.

The specific implementation steps are as follows:

Step 1: Decompress the stored first-frame data with a conventional image decompression method (such as JPEG or JPEG 2000) to reconstruct the picture.

Step 2: Copy the first-frame data to every frame of the consecutive segment in which no target was detected.

Step 3: Reconstruct the frames of each segment in which a target was detected from the optical flow information $f$ and the residual data $r$, namely

$$I_t = \mathrm{MC}(I_{t-1}, f_t) + r_t,$$

where $\mathrm{MC}$ denotes the Motion Compensation network and $I_{t-1}$ is the previously reconstructed frame.
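The decompression steps can be sketched in the same spirit. As an assumption made for illustration, the Motion Compensation network is again replaced by a nearest-neighbour warp with integer flow vectors, so the code shows only the data flow of steps B and C. The names `warp`, `expand_static_segment` and `reconstruct_segment` and the one-row toy frame are hypothetical.

```python
def warp(prev, flow):
    """Nearest-neighbour backward warp with border clamping.

    Illustrative stand-in for the Motion Compensation network;
    flow[y][x] is an integer (dx, dy) pair.
    """
    h, w = len(prev), len(prev[0])
    return [[prev[min(max(y + flow[y][x][1], 0), h - 1)]
                 [min(max(x + flow[y][x][0], 0), w - 1)]
             for x in range(w)] for y in range(h)]


def expand_static_segment(first_frame, length):
    """Step B: repeat the stored first frame for a segment without targets."""
    return [first_frame for _ in range(length)]


def reconstruct_segment(first_frame, flows, residuals):
    """Step C: rebuild each frame as warp(previous frame, flow) + residual."""
    frames = [first_frame]
    for flow, res in zip(flows, residuals):
        predicted = warp(frames[-1], flow)
        frames.append([[p + r for p, r in zip(p_row, r_row)]
                       for p_row, r_row in zip(predicted, res)])
    return frames


first = [[10, 90, 10, 10]]                      # one-row toy frame
flows = [[[(0, 0), (-1, 0), (-1, 0), (0, 0)]]]  # one flow field per later frame
residuals = [[[0, 0, 2, 0]]]                    # one residual per later frame
decoded = reconstruct_segment(first, flows, residuals)
static = expand_static_segment(first, 3)
print(decoded[1], len(static))
```

Each frame is predicted from the frame reconstructed just before it, so the decoder needs only the first frame, the flow fields, and the residuals of the segment.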

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Apart from the technical features described in the specification, all other techniques involved are known to those skilled in the art.

Claims

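1. A surveillance video compression method combining yolo3 and a flownet2 network, characterized in that the method performs target detection on input surveillance video frame data through a yolo3 network and, for frames in which a target is detected, extracts optical flow data through flownet2 and compresses and stores the optical flow data,

the method comprises the following implementation steps:

S1, preparing a surveillance video file;

S2, splitting the video file into individual frame images and extracting all frame image data from the video file;

S3, performing target detection on the frame image data with a yolo3 network, the detection targets being set according to actual requirements;

S4, compressing the first-frame image data of each video segment in which a target is detected and of each video segment in which no target is detected;

S5, extracting optical flow information between adjacent frames of each video segment in which a target is detected, using a flownet2 network;

S6, reconstructing each frame with a network and subtracting the reconstructed frame from the original frame to obtain residual data;

S7, applying rounding quantization and arithmetic-coding entropy coding to the optical flow information and the residual data.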

2. The surveillance video compression method combining yolo3 and flownet2 network as claimed in claim 1, wherein in step S2, all frame image data in the video file are extracted by ffmpeg tool.

3. The surveillance video compression method combining yolo3 and flownet2 network as claimed in claim 1 or 2, wherein in step S4, the image data of the first frame of the video segment with and without the detected object is compressed by JPEG or JPEG 2000.

4. The surveillance video compression method combining yolo3 and flownet2 network as claimed in claim 1 or 2, wherein in step S4, for consecutive frame image data of undetected objects, the first frame is compressed and retained by using a conventional image compression method, and the rest frame images are not retained;

and compressing and maintaining the continuous frame image data of the detected target by using a traditional image compression mode for the first frame.

5. The surveillance video compression method combining yolo3 and flownet2 network as claimed in claim 4, wherein in step S5, for the consecutive frame image data in which a target is detected, the flownet2 network is used to extract, for the remaining frame images, the optical flow information between adjacent frames of the video segment in which the target is detected.

6. The surveillance video compression method combining yolo3 and flownet2 network as claimed in claim 1, 2 or 5, wherein in step S6, the optical flow information and the image data of the previous frame are reconstructed by using Motion Compensation network.

7. A method for decompressing a surveillance video is characterized in that the method is realized by the following steps:

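A. decompressing the first frame image data;

B. copying the first frame data to the video segment in which no target is detected;

C. reconstructing the frame data of the video segment in which a target is detected from the optical flow information and the residual data through a Motion Compensation network;

D. synthesizing all the reconstructed frame images into a complete video.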

8. The method for decompressing a surveillance video according to claim 7, wherein in step A, the first frame image data is decompressed using JPEG or JPEG 2000.

9. The method according to claim 7 or 8, wherein in step C, the optical flow information and residual data are used to reconstruct the frame data of the target-detected video segment through a Motion Compensation network.

10. The method for decompressing a surveillance video according to claim 7 or 8, wherein in step D, all the reconstructed frame images are combined into a complete video through the ffmpeg tool.