CN110602487A - Video image jitter detection method based on TSN (temporal segment network) - Google Patents
- Publication number
- CN110602487A CN110602487A CN201910843031.6A CN201910843031A CN110602487A CN 110602487 A CN110602487 A CN 110602487A CN 201910843031 A CN201910843031 A CN 201910843031A CN 110602487 A CN110602487 A CN 110602487A
- Authority
- CN
- China
- Prior art keywords
- video
- optical flow
- tsn
- jitter
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
Abstract
The invention belongs to the technical field of video quality detection, and in particular relates to a video image jitter detection method based on a TSN (temporal segment network), comprising the steps of: extracting a normal optical flow field and a distorted optical flow field with the TV-L1 optical flow algorithm, based on the TSN network structure; inputting the normal and distorted optical flow fields into the TSN network; and deciding, via the TSN network, whether the video jitters and outputting the indices of the jittering frames. TSN-based video jitter detection overcomes the inability of traditional algorithms to adapt to environmental change and long-duration video, and maintains very high detection performance while reducing computation; the distorted optical flow fed to the TSN network suppresses interference from the motion of people and other objects in the video, making jitter detection still more accurate.
Description
Technical Field
The invention belongs to the technical field of video quality detection, and in particular relates to a video image jitter detection method based on a TSN (temporal segment network).
Background
With the rapid development of safe-city security, monitoring systems are widely deployed across many fields, and the quality of the video they transmit is an important factor in whether they can do their job; maintaining monitoring systems efficiently is therefore an urgent problem in the field of video surveillance. Video image jitter is a frequently occurring fault in monitoring systems: typically an up-down and/or left-right shaking of the video image caused by a loosely fixed camera, external forces, or human action, which degrades the visual result. Conventional video image jitter detection relies on traditional methods, most commonly the gray-projection method and the optical flow method. The gray-projection method is a simplified extraction of the image's distribution characteristics: taking the pixel rows and columns of the two-dimensional image as units, it converts image features into curves along the row and column coordinates, making the distribution easier to compute; but it suits only fixed scenes and simple conditions, and its accuracy is comparatively poor. The optical flow method first extracts feature points from the video and then tracks them with an optical flow algorithm, so it depends heavily on the quality of feature-point detection: if the current environment offers few corner points, the estimated displacement is wrong; obtaining better results requires more computation; and optical flow easily misestimates moving objects in real environments. Its robustness is poor, and it is unsuited to video jitter detection over long time ranges and in complex environments.
Existing technologies are essentially based on traditional methods, but traditional algorithms generalize poorly: one set of thresholds or rules usually applies only to specific scenes, and accuracy drops or fails outright when the scene changes. Application scenes and environments, however, vary widely, which greatly increases the difficulty of implementing conventional algorithms. For example, Jiangei et al. proposed a video jitter detection algorithm based on the motion entropy of forward-backward optical-flow point matching: it first extracts feature points with the ORB (Oriented FAST and Rotated BRIEF) algorithm and then tracks them with a forward-backward optical flow algorithm. The algorithm has strong real-time processing capability, but two problems: first, the optical flow algorithm is sensitive to illumination change and unsuited to long-term tracking, so the algorithm cannot handle video jitter detection in complex environments; second, it only judges whether the video jitters, without outputting the frame indices where jitter occurs.
Disclosure of Invention
In order to overcome the technical defects of the prior art, the invention provides a video image jitter detection method based on a TSN (temporal segment network).
The invention is realized by the following technical scheme:
A video image jitter detection method based on a TSN network comprises the steps of:
1) extracting a normal optical flow field and a distorted optical flow field with the TV-L1 optical flow algorithm, based on the TSN network structure;
2) inputting the normal optical flow field and the distorted optical flow field into the TSN network;
3) determining, via the TSN network, whether the video jitters, and outputting the indices of the jittering frames.
Further, the TSN network consists of a spatial-stream convolutional network and a temporal-stream convolutional network.
Further, the normal and distorted optical flow fields serve as inputs for capturing motion information; when a video shot in the real world contains many moving objects, the distorted optical flow field suppresses the object motion so that the network concentrates on the background motion in the video.
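The patent does not disclose here how the distorted (warped) optical flow is computed. As an illustration only, the idea of separating background (camera) motion from independently moving objects can be sketched with a robust statistic; `background_motion` below is a hypothetical helper under that assumption, not the patented implementation:

```python
import numpy as np

def background_motion(flow):
    """Estimate the global (background/camera) motion of a frame pair as
    the per-channel median of the dense flow field: flow vectors from
    independently moving objects are outliers that the median ignores,
    so object motion is suppressed in the jitter signal."""
    return np.median(flow.reshape(-1, 2), axis=0)

# Toy flow field: uniform camera shake of (3, -2) plus one small moving object.
flow = np.full((8, 8, 2), (3.0, -2.0))
flow[2:4, 2:4] += (5.0, 5.0)        # independently moving object
shake = background_motion(flow)      # recovers the (3, -2) shake
```

A full system would track this background-motion signal over consecutive frames and flag alternating reversals as jitter.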
Further, in step 3), the TSN network's decision comprises: given a video V, divide V into K equal-length segments {S_1, S_2, ..., S_K}; the TSN network then models the sequence of snippets as follows:
TSN(T_1, T_2, ..., T_K) = H(G(F(T_1; W), F(T_2; W), ..., F(T_K; W)))
wherein: (T_1, T_2, ..., T_K) denotes a sequence of snippets, each snippet T_k being randomly sampled from its corresponding segment S_k; F(T_k; W) denotes a convolutional network with parameters W applied to snippet T_k, returning the score of T_k with respect to the jitter class; G(·) is the segment consensus function; and H(·) is the probability prediction function.
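The composition H(G(F(...))) can be sketched numerically; the class count and per-snippet scores below are illustrative, with G as uniform averaging and H as a softmax:

```python
import numpy as np

def tsn_predict(snippet_scores, aggregate=np.mean):
    """Apply the segment consensus function G (uniform averaging by
    default) to the per-snippet class scores F(T_k; W), then the
    prediction function H (a softmax here) to turn the consensus
    into class probabilities."""
    consensus = aggregate(np.asarray(snippet_scores, float), axis=0)  # G
    exp = np.exp(consensus - consensus.max())                         # H: stable softmax
    return exp / exp.sum()

# K = 3 snippets, 2 hypothetical classes (jitter, no-jitter).
scores = [[2.0, 0.5], [1.5, 0.2], [2.5, 0.1]]
probs = tsn_predict(scores)          # probabilities sum to 1; "jitter" dominates
```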
Further, the segment consensus function G(·) combines the class-score outputs of the several snippets to obtain a consensus class hypothesis, from which the probability prediction function H(·) predicts the probability that the whole video belongs to the jitter class; combined with the standard classification cross-entropy loss, the final loss function over the partial consensus takes the form:
L(y, G) = −Σ_{i=1}^{C} y_i (G_i − log Σ_{j=1}^{C} exp G_j)
wherein C is the total number of behavior classes; y_i is the ground truth for class i; and G_i is the jitter-class score inferred, via the aggregation function g, from the scores of the same class across all snippets.
Preferably, C = 1, the single class being jitter.
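The loss above — cross-entropy against the segment consensus, L(y, G) = −Σ_i y_i (G_i − log Σ_j exp G_j) in the TSN formulation — can be checked numerically; the scores are illustrative:

```python
import numpy as np

def tsn_loss(y, G):
    """Segmental cross-entropy over the consensus scores:
    L(y, G) = -sum_i y_i * (G_i - log sum_j exp(G_j))."""
    y, G = np.asarray(y, float), np.asarray(G, float)
    log_z = np.log(np.exp(G).sum())      # log-partition over the C classes
    return float(-(y * (G - log_z)).sum())

# C = 2 classes, true class 0: the loss equals -log softmax(G)[0].
loss = tsn_loss([1.0, 0.0], [2.0, 0.0])
```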
A computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the steps of the TSN-network-based video image jitter detection method.
A computer-readable storage medium stores a computer program which, when executed by a processor, carries out the steps of the TSN-network-based video image jitter detection method.
Compared with the prior art, the invention has at least the following beneficial effects or advantages: TSN-based video jitter detection overcomes the inability of traditional algorithms to adapt to environmental change and long-duration video, while maintaining very high detection performance at reduced computational cost; the distorted optical flow fed to the TSN network suppresses interference from the motion of people and other objects in the video, making jitter detection still more accurate. The scheme uses the TSN for video jitter detection and fully exploits its advantages: it adapts to environmental change in arbitrary scenes, detects long videos in real time, and suppresses interference from the motion of other objects in the video, giving strong anti-interference capability and good detection results across changing scenes.
Drawings
The present invention is described in further detail below with reference to the accompanying drawings.
FIG. 1 is a block diagram of a TSN network architecture of the present invention;
FIG. 2 is a flow chart of video jitter detection according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a video image jitter detection method based on a TSN. In one embodiment, the method detects video jitter using the principle of behavior recognition: behavior recognition automatically analyses the ongoing behavior in an unknown video or image sequence. Simple behavior recognition is action classification — given a video, correctly assign it to one of several known action classes; the harder variant is recognition in videos containing not one but several action classes, where the system must automatically identify both the action class and the moment each action starts. From an intuitive standpoint, video jitter is itself an action, so it can be detected with behavior-recognition techniques. The TSN is one of the more accurate algorithms in current behavior recognition. It consists of a spatial-stream convolutional network and a temporal-stream convolutional network: the input video is divided into k segments, one snippet is randomly sampled from each segment, the class scores of the different snippets are fused with a consensus function into a segmental consensus, and the predictions of all modalities are then fused into the final prediction. Model parameters are thus learned from the whole video rather than from a short clip, and the sparse temporal sampling strategy — each sampled snippet contains only a small fraction of the frames — greatly reduces computational overhead. The TSN network structure is shown in FIG. 1.
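The sparse temporal sampling just described can be sketched in a few lines (frame counts are illustrative):

```python
import random

def sparse_sample(num_frames, k):
    """TSN-style sparse temporal sampling: split the frame range into k
    equal-length segments and draw one random snippet index from each,
    covering the whole video at a small computational cost (any tail
    frames left over by integer division are ignored)."""
    seg_len = num_frames // k
    return [i * seg_len + random.randrange(seg_len) for i in range(k)]

indices = sparse_sample(300, k=3)   # one index from each third of a 300-frame clip
```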
The scheme trains a TSN network to learn the characteristics of video jitter so as to detect whether the picture jitters. The concrete implementation is as follows. Following the TSN network structure, the TV-L1 optical flow algorithm is first used to extract a normal optical flow field and a distorted optical flow field. (In the TV-L1 model, given two consecutive frames I_0 and I_1 and a pixel position x = (x, y) in I_0, the energy function of the optical flow model is
E(U) = ∫_Ω ( λ |I_1(x + U(x)) − I_0(x)| + |∇U| ) dx
where U is the two-dimensional optical flow field, ∇U is its two-dimensional gradient, and λ is a weighting constant on the data term. The first term is the data constraint, the grey-value difference of the same pixel between the two frames; the second term is the motion-regularization constraint, i.e., the motion is assumed to be continuous. The TV-L1 flow is computed by minimizing this total-variation flow energy with a duality-based numerical scheme derived from image denoising.) The optical flow fields serve as input focused on capturing motion information; but when a real-world video contains many moving objects, their motion easily causes misjudgment, so object motion is suppressed through the distorted optical flow field, concentrating on the background motion in the video. The optical flow fields are input into the TSN network, which finally decides whether the video jitters and outputs the jittering frame indices. Concretely, given a video V, the TSN divides it into K equal-length segments {S_1, S_2, ..., S_K} and models the sequence of snippets as follows:
TSN(T_1, T_2, ..., T_K) = H(G(F(T_1; W), F(T_2; W), ..., F(T_K; W)))
wherein: (T_1, T_2, ..., T_K) denotes a sequence of snippets, each snippet T_k being randomly sampled from its corresponding segment S_k; F(T_k; W) denotes a convolutional network with parameters W applied to the short snippet T_k, returning the score of T_k with respect to the jitter class; the segment consensus function G combines the class-score outputs of the snippets to obtain a consensus class hypothesis, from which the prediction function H predicts the probability that the whole video belongs to the jitter class; combined with the standard classification cross-entropy loss, the final loss function over the partial consensus takes the form:
L(y, G) = −Σ_{i=1}^{C} y_i (G_i − log Σ_{j=1}^{C} exp G_j)
where C is the total number of behavior classes — here 1, the single class being jitter — y_i is the ground truth (labelled calibration data) for class i, and G_i is the jitter-class score inferred, via the aggregation function g, from the scores of the same class across all snippets; the aggregation function g is implemented as uniform averaging. The overall flow is shown in FIG. 2.
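The TV-L1 energy described above — a λ-weighted L1 data term plus a total-variation smoothness term — can be evaluated discretely. The sketch below uses a tiny synthetic grid and nearest-neighbour warping (not the patent's solver) to check that the correct flow scores a lower energy than no flow at all:

```python
import numpy as np

def tvl1_energy(I0, I1, U, lam=0.15):
    """Discrete TV-L1 flow energy: lam * |I1(x+U) - I0(x)| summed over
    pixels (data / brightness-constancy term) plus the total variation
    |grad U| of the flow (smoothness term). Nearest-neighbour warping
    and clipped borders keep the sketch short."""
    H, W = I0.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xw = np.clip(np.round(xs + U[..., 0]).astype(int), 0, W - 1)
    yw = np.clip(np.round(ys + U[..., 1]).astype(int), 0, H - 1)
    data = lam * np.abs(I1[yw, xw] - I0).sum()
    tv = np.abs(np.diff(U, axis=0)).sum() + np.abs(np.diff(U, axis=1)).sum()
    return data + tv

# Frames related by a one-pixel horizontal shift: the true uniform flow
# should score a much lower energy than the zero flow.
I0 = np.tile(np.arange(8.0), (8, 1))
I1 = np.roll(I0, -1, axis=1)           # I1[:, x] == I0[:, x+1] (wraps at the border)
U_true = np.zeros((8, 8, 2)); U_true[..., 0] = -1.0
e_true = tvl1_energy(I0, I1, U_true)
e_zero = tvl1_energy(I0, I1, np.zeros((8, 8, 2)))
```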
In another embodiment, a computer readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of a video picture jitter detection method based on a TSN network.
In another embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the video picture jitter detection method based on the TSN network when executing the program.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention are also within the protection scope of the invention.
Claims (8)
1. A video image jitter detection method based on a TSN network, characterized by comprising the steps of:
1) extracting a normal optical flow field and a distorted optical flow field with the TV-L1 optical flow algorithm, based on the TSN network structure;
2) inputting the normal optical flow field and the distorted optical flow field into the TSN network;
3) determining, via the TSN network, whether the video jitters, and outputting the indices of the jittering frames.
2. The TSN-network-based video image jitter detection method of claim 1, wherein the TSN network is composed of a spatial-stream convolutional network and a temporal-stream convolutional network.
3. The method of claim 1, wherein the normal optical flow field and the distorted optical flow field serve as inputs for capturing motion information, and when too many moving objects appear in the video captured in real time, the distorted optical flow field suppresses the object motion so that the network focuses on the background motion in the video.
4. The TSN-network-based video image jitter detection method of claim 1, wherein in said step 3), the TSN network's decision comprises: given a video V, dividing V into K equal-length segments {S_1, S_2, ..., S_K}; the TSN network then models the sequence of snippets as follows:
TSN(T_1, T_2, ..., T_K) = H(G(F(T_1; W), F(T_2; W), ..., F(T_K; W)))
wherein: (T_1, T_2, ..., T_K) denotes a sequence of snippets, each snippet T_k being randomly sampled from its corresponding segment S_k; F(T_k; W) denotes a convolutional network with parameters W applied to snippet T_k, returning the score of T_k with respect to the jitter class; G(·) is the segment consensus function; and H(·) is the probability prediction function.
5. The method of claim 4, wherein the segment consensus function G(·) combines the class-score outputs of the several snippets to obtain a consensus class hypothesis, from which the probability prediction function H(·) predicts the probability that the whole video belongs to the jitter class; combined with the standard classification cross-entropy loss, the final loss function over the partial consensus takes the form:
L(y, G) = −Σ_{i=1}^{C} y_i (G_i − log Σ_{j=1}^{C} exp G_j)
wherein C is the total number of behavior classes; y_i is the ground truth for class i; and G_i is the jitter-class score inferred, via the aggregation function g, from the scores of the same class across all snippets.
6. The video image jitter detection method according to claim 5, wherein C = 1, the single class being jitter.
7. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the TSN-network-based video image jitter detection method according to any of claims 1-6.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, carries out the steps of the TSN-network-based video image jitter detection method according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910843031.6A CN110602487B (en) | 2019-09-06 | 2019-09-06 | Video image jitter detection method based on TSN (temporal segment network)
Publications (2)
Publication Number | Publication Date |
---|---|
CN110602487A true CN110602487A (en) | 2019-12-20 |
CN110602487B CN110602487B (en) | 2021-04-20 |
Family
- ID=68858068
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116193231A (en) * | 2022-10-24 | 2023-05-30 | 成都与睿创新科技有限公司 | Method and system for handling minimally invasive surgical field anomalies |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001006181A (en) * | 1999-05-07 | 2001-01-12 | Sony Precision Eng Center Singapore Pte Ltd | Apparatus for measuring jitter of optical disc |
US20100157070A1 (en) * | 2008-12-22 | 2010-06-24 | Honeywell International Inc. | Video stabilization in real-time using computationally efficient corner detection and correspondence |
CN104135597A (en) * | 2014-07-04 | 2014-11-05 | 上海交通大学 | Automatic detection method of jitter of video |
CN108492287A (en) * | 2018-03-14 | 2018-09-04 | 罗普特(厦门)科技集团有限公司 | A kind of video jitter detection method, terminal device and storage medium |
CN110191320A (en) * | 2019-05-29 | 2019-08-30 | 合肥学院 | Video jitter based on pixel timing motion analysis and freeze detection method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||