CN110602487A - Video image jitter detection method based on TSN (temporal segment network) - Google Patents
- Publication number
- CN110602487A CN110602487A CN201910843031.6A CN201910843031A CN110602487A CN 110602487 A CN110602487 A CN 110602487A CN 201910843031 A CN201910843031 A CN 201910843031A CN 110602487 A CN110602487 A CN 110602487A
- Authority
- CN
- China
- Prior art keywords
- video
- optical flow
- tsn
- jitter
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
Abstract
The invention belongs to the technical field of video quality detection, and in particular relates to a video image jitter detection method based on a TSN (temporal segment network), comprising the steps of: extracting a normal optical flow field and a distorted optical flow field with the TV-L1 optical flow algorithm, based on the TSN network structure; inputting the normal and distorted optical flow fields into the TSN network; and deciding, via the TSN network, whether the video jitters and outputting the indices of the jittering frames. TSN-based video jitter detection overcomes the inability of traditional algorithms to adapt to environmental change and long-duration video, and maintains very high detection performance while reducing computation; the distorted optical flow fed to the TSN network suppresses interference from the motion of people and other objects in the video, making jitter detection still more accurate.
Description
Technical Field
The invention belongs to the technical field of video quality detection, and in particular relates to a video image jitter detection method based on a TSN (temporal segment network).
Background
With the rapid development of safe-city security, monitoring systems are widely deployed across many fields, and the quality of the video they transmit is an important factor in whether they can do their job; maintaining monitoring systems efficiently is therefore an urgent problem in the field of video surveillance. Video image jitter is a frequently occurring fault in monitoring systems: typically an up-down and/or left-right shaking of the video image caused by a loosely fixed camera, external forces, or human action, which degrades the visual result. Conventional video image jitter detection relies on traditional methods, most commonly the gray-projection method and the optical flow method. The gray-projection method is a simplified extraction of the image's distribution characteristics: taking the pixel rows and columns of the two-dimensional image as units, it converts image features into curves along the row and column coordinates, making the distribution easier to compute; but it suits only fixed scenes and simple conditions, and its accuracy is comparatively poor. The optical flow method first extracts feature points from the video and then tracks them with an optical flow algorithm, so it depends heavily on the quality of feature-point detection: if the current environment offers few corner points, the estimated displacement is wrong; obtaining better results requires more computation; and optical flow easily misestimates moving objects in real environments. Its robustness is poor, and it is unsuited to video jitter detection over long time ranges and in complex environments.
Existing technologies are essentially based on traditional methods, but traditional algorithms generalize poorly: one set of thresholds or rules usually applies only to specific scenes, and accuracy drops or fails outright when the scene changes. Application scenes and environments, however, vary widely, which greatly increases the difficulty of implementing conventional algorithms. For example, Jiangei et al. proposed a video jitter detection algorithm based on the motion entropy of forward-backward optical-flow point matching: it first extracts feature points with the ORB (Oriented FAST and Rotated BRIEF) algorithm and then tracks them with a forward-backward optical flow algorithm. The algorithm has strong real-time processing capability, but two problems: first, the optical flow algorithm is sensitive to illumination change and unsuited to long-term tracking, so the algorithm cannot handle video jitter detection in complex environments; second, it only judges whether the video jitters, without outputting the frame indices where jitter occurs.
Disclosure of Invention
In order to overcome the technical defects of the prior art, the invention provides a video image jitter detection method based on a TSN (temporal segment network).
The invention is realized by the following technical scheme:
A video image jitter detection method based on a TSN network comprises the steps of:
1) extracting a normal optical flow field and a distorted optical flow field with the TV-L1 optical flow algorithm, based on the TSN network structure;
2) inputting the normal optical flow field and the distorted optical flow field into the TSN network;
3) determining, via the TSN network, whether the video jitters, and outputting the indices of the jittering frames.
Further, the TSN network consists of a spatial-stream convolutional network and a temporal-stream convolutional network.
Further, the normal and distorted optical flow fields serve as inputs for capturing motion information; when a video shot in the real world contains many moving objects, the distorted optical flow field suppresses the object motion so that the network concentrates on the background motion in the video.
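The patent does not disclose here how the distorted (warped) optical flow is computed. As an illustration only, the idea of separating background (camera) motion from independently moving objects can be sketched with a robust statistic; `background_motion` below is a hypothetical helper under that assumption, not the patented implementation:

```python
import numpy as np

def background_motion(flow):
    """Estimate the global (background/camera) motion of a frame pair as
    the per-channel median of the dense flow field: flow vectors from
    independently moving objects are outliers that the median ignores,
    so object motion is suppressed in the jitter signal."""
    return np.median(flow.reshape(-1, 2), axis=0)

# Toy flow field: uniform camera shake of (3, -2) plus one small moving object.
flow = np.full((8, 8, 2), (3.0, -2.0))
flow[2:4, 2:4] += (5.0, 5.0)        # independently moving object
shake = background_motion(flow)      # recovers the (3, -2) shake
```

A full system would track this background-motion signal over consecutive frames and flag alternating reversals as jitter.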
Further, in step 3), the TSN network's decision comprises: given a video V, divide V into K equal-length segments {S_1, S_2, ..., S_K}; the TSN network then models the sequence of snippets as follows:
TSN(T_1, T_2, ..., T_K) = H(G(F(T_1; W), F(T_2; W), ..., F(T_K; W)))
wherein: (T_1, T_2, ..., T_K) denotes a sequence of snippets, each snippet T_k being randomly sampled from its corresponding segment S_k; F(T_k; W) denotes a convolutional network with parameters W applied to snippet T_k, returning the score of T_k with respect to the jitter class; G(·) is the segment consensus function; and H(·) is the probability prediction function.
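The composition H(G(F(...))) can be sketched numerically; the class count and per-snippet scores below are illustrative, with G as uniform averaging and H as a softmax:

```python
import numpy as np

def tsn_predict(snippet_scores, aggregate=np.mean):
    """Apply the segment consensus function G (uniform averaging by
    default) to the per-snippet class scores F(T_k; W), then the
    prediction function H (a softmax here) to turn the consensus
    into class probabilities."""
    consensus = aggregate(np.asarray(snippet_scores, float), axis=0)  # G
    exp = np.exp(consensus - consensus.max())                         # H: stable softmax
    return exp / exp.sum()

# K = 3 snippets, 2 hypothetical classes (jitter, no-jitter).
scores = [[2.0, 0.5], [1.5, 0.2], [2.5, 0.1]]
probs = tsn_predict(scores)          # probabilities sum to 1; "jitter" dominates
```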
Further, the segment consensus function G(·) combines the class-score outputs of the several snippets to obtain a consensus class hypothesis, from which the probability prediction function H(·) predicts the probability that the whole video belongs to the jitter class; combined with the standard classification cross-entropy loss, the final loss function over the partial consensus takes the form:
L(y, G) = −Σ_{i=1}^{C} y_i (G_i − log Σ_{j=1}^{C} exp G_j)
wherein C is the total number of behavior classes; y_i is the ground truth for class i; and G_i is the jitter-class score inferred, via the aggregation function g, from the scores of the same class across all snippets.
Preferably, C = 1, the single class being jitter.
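The loss above — cross-entropy against the segment consensus, L(y, G) = −Σ_i y_i (G_i − log Σ_j exp G_j) in the TSN formulation — can be checked numerically; the scores are illustrative:

```python
import numpy as np

def tsn_loss(y, G):
    """Segmental cross-entropy over the consensus scores:
    L(y, G) = -sum_i y_i * (G_i - log sum_j exp(G_j))."""
    y, G = np.asarray(y, float), np.asarray(G, float)
    log_z = np.log(np.exp(G).sum())      # log-partition over the C classes
    return float(-(y * (G - log_z)).sum())

# C = 2 classes, true class 0: the loss equals -log softmax(G)[0].
loss = tsn_loss([1.0, 0.0], [2.0, 0.0])
```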
A computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the steps of the TSN-network-based video image jitter detection method.
A computer-readable storage medium stores a computer program which, when executed by a processor, carries out the steps of the TSN-network-based video image jitter detection method.
Compared with the prior art, the invention has at least the following beneficial effects or advantages: TSN-based video jitter detection overcomes the inability of traditional algorithms to adapt to environmental change and long-duration video, while maintaining very high detection performance at reduced computational cost; the distorted optical flow fed to the TSN network suppresses interference from the motion of people and other objects in the video, making jitter detection still more accurate. The scheme uses the TSN for video jitter detection and fully exploits its advantages: it adapts to environmental change in arbitrary scenes, detects long videos in real time, and suppresses interference from the motion of other objects in the video, giving strong anti-interference capability and good detection results across changing scenes.
Drawings
The present invention is described in further detail below with reference to the accompanying drawings.
FIG. 1 is a block diagram of a TSN network architecture of the present invention;
FIG. 2 is a flow chart of video jitter detection according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a video image jitter detection method based on a TSN. In one embodiment, the method detects video jitter using the principle of behavior recognition: behavior recognition automatically analyses the ongoing behavior in an unknown video or image sequence. Simple behavior recognition is action classification — given a video, correctly assign it to one of several known action classes; the harder variant is recognition in videos containing not one but several action classes, where the system must automatically identify both the action class and the moment each action starts. From an intuitive standpoint, video jitter is itself an action, so it can be detected with behavior-recognition techniques. The TSN is one of the more accurate algorithms in current behavior recognition. It consists of a spatial-stream convolutional network and a temporal-stream convolutional network: the input video is divided into k segments, one snippet is randomly sampled from each segment, the class scores of the different snippets are fused with a consensus function into a segmental consensus, and the predictions of all modalities are then fused into the final prediction. Model parameters are thus learned from the whole video rather than from a short clip, and the sparse temporal sampling strategy — each sampled snippet contains only a small fraction of the frames — greatly reduces computational overhead. The TSN network structure is shown in FIG. 1.
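The sparse temporal sampling just described can be sketched in a few lines (frame counts are illustrative):

```python
import random

def sparse_sample(num_frames, k):
    """TSN-style sparse temporal sampling: split the frame range into k
    equal-length segments and draw one random snippet index from each,
    covering the whole video at a small computational cost (any tail
    frames left over by integer division are ignored)."""
    seg_len = num_frames // k
    return [i * seg_len + random.randrange(seg_len) for i in range(k)]

indices = sparse_sample(300, k=3)   # one index from each third of a 300-frame clip
```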
The scheme trains a TSN network to learn the characteristics of video jitter so as to detect whether the picture jitters. The concrete implementation is as follows. Following the TSN network structure, the TV-L1 optical flow algorithm is first used to extract a normal optical flow field and a distorted optical flow field. (In the TV-L1 model, given two consecutive frames I_0 and I_1 and a pixel position x = (x, y) in I_0, the energy function of the optical flow model is
E(U) = ∫_Ω ( λ |I_1(x + U(x)) − I_0(x)| + |∇U| ) dx
where U is the two-dimensional optical flow field, ∇U is its two-dimensional gradient, and λ is a weighting constant on the data term. The first term is the data constraint, the grey-value difference of the same pixel between the two frames; the second term is the motion-regularization constraint, i.e., the motion is assumed to be continuous. The TV-L1 flow is computed by minimizing this total-variation flow energy with a duality-based numerical scheme derived from image denoising.) The optical flow fields serve as input focused on capturing motion information; but when a real-world video contains many moving objects, their motion easily causes misjudgment, so object motion is suppressed through the distorted optical flow field, concentrating on the background motion in the video. The optical flow fields are input into the TSN network, which finally decides whether the video jitters and outputs the jittering frame indices. Concretely, given a video V, the TSN divides it into K equal-length segments {S_1, S_2, ..., S_K} and models the sequence of snippets as follows:
TSN(T_1, T_2, ..., T_K) = H(G(F(T_1; W), F(T_2; W), ..., F(T_K; W)))
wherein: (T_1, T_2, ..., T_K) denotes a sequence of snippets, each snippet T_k being randomly sampled from its corresponding segment S_k; F(T_k; W) denotes a convolutional network with parameters W applied to the short snippet T_k, returning the score of T_k with respect to the jitter class; the segment consensus function G combines the class-score outputs of the snippets to obtain a consensus class hypothesis, from which the prediction function H predicts the probability that the whole video belongs to the jitter class; combined with the standard classification cross-entropy loss, the final loss function over the partial consensus takes the form:
L(y, G) = −Σ_{i=1}^{C} y_i (G_i − log Σ_{j=1}^{C} exp G_j)
where C is the total number of behavior classes — here 1, the single class being jitter — y_i is the ground truth (labelled calibration data) for class i, and G_i is the jitter-class score inferred, via the aggregation function g, from the scores of the same class across all snippets; the aggregation function g is implemented as uniform averaging. The overall flow is shown in FIG. 2.
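The TV-L1 energy described above — a λ-weighted L1 data term plus a total-variation smoothness term — can be evaluated discretely. The sketch below uses a tiny synthetic grid and nearest-neighbour warping (not the patent's solver) to check that the correct flow scores a lower energy than no flow at all:

```python
import numpy as np

def tvl1_energy(I0, I1, U, lam=0.15):
    """Discrete TV-L1 flow energy: lam * |I1(x+U) - I0(x)| summed over
    pixels (data / brightness-constancy term) plus the total variation
    |grad U| of the flow (smoothness term). Nearest-neighbour warping
    and clipped borders keep the sketch short."""
    H, W = I0.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xw = np.clip(np.round(xs + U[..., 0]).astype(int), 0, W - 1)
    yw = np.clip(np.round(ys + U[..., 1]).astype(int), 0, H - 1)
    data = lam * np.abs(I1[yw, xw] - I0).sum()
    tv = np.abs(np.diff(U, axis=0)).sum() + np.abs(np.diff(U, axis=1)).sum()
    return data + tv

# Frames related by a one-pixel horizontal shift: the true uniform flow
# should score a much lower energy than the zero flow.
I0 = np.tile(np.arange(8.0), (8, 1))
I1 = np.roll(I0, -1, axis=1)           # I1[:, x] == I0[:, x+1] (wraps at the border)
U_true = np.zeros((8, 8, 2)); U_true[..., 0] = -1.0
e_true = tvl1_energy(I0, I1, U_true)
e_zero = tvl1_energy(I0, I1, np.zeros((8, 8, 2)))
```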
In another embodiment, a computer readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of a video picture jitter detection method based on a TSN network.
In another embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the video picture jitter detection method based on the TSN network when executing the program.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention are also within the protection scope of the invention.
Claims (8)
1. A video image jitter detection method based on a TSN network, characterized by comprising the steps of:
1) extracting a normal optical flow field and a distorted optical flow field with the TV-L1 optical flow algorithm, based on the TSN network structure;
2) inputting the normal optical flow field and the distorted optical flow field into the TSN network;
3) determining, via the TSN network, whether the video jitters, and outputting the indices of the jittering frames.
2. The TSN-network-based video image jitter detection method of claim 1, wherein the TSN network is composed of a spatial-stream convolutional network and a temporal-stream convolutional network.
3. The method of claim 1, wherein the normal optical flow field and the distorted optical flow field serve as inputs for capturing motion information, and when too many moving objects appear in the video captured in real time, the distorted optical flow field suppresses the object motion so that the network focuses on the background motion in the video.
4. The TSN-network-based video image jitter detection method of claim 1, wherein in said step 3), the TSN network's decision comprises: given a video V, dividing V into K equal-length segments {S_1, S_2, ..., S_K}; the TSN network then models the sequence of snippets as follows:
TSN(T_1, T_2, ..., T_K) = H(G(F(T_1; W), F(T_2; W), ..., F(T_K; W)))
wherein: (T_1, T_2, ..., T_K) denotes a sequence of snippets, each snippet T_k being randomly sampled from its corresponding segment S_k; F(T_k; W) denotes a convolutional network with parameters W applied to snippet T_k, returning the score of T_k with respect to the jitter class; G(·) is the segment consensus function; and H(·) is the probability prediction function.
5. The method of claim 4, wherein the segment consensus function G(·) combines the class-score outputs of the several snippets to obtain a consensus class hypothesis, from which the probability prediction function H(·) predicts the probability that the whole video belongs to the jitter class; combined with the standard classification cross-entropy loss, the final loss function over the partial consensus takes the form:
L(y, G) = −Σ_{i=1}^{C} y_i (G_i − log Σ_{j=1}^{C} exp G_j)
wherein C is the total number of behavior classes; y_i is the ground truth for class i; and G_i is the jitter-class score inferred, via the aggregation function g, from the scores of the same class across all snippets.
6. The video image jitter detection method according to claim 5, wherein C = 1, the single class being jitter.
7. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the TSN-network-based video image jitter detection method according to any of claims 1-6.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, carries out the steps of the TSN-network-based video image jitter detection method according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910843031.6A CN110602487B (en) | 2019-09-06 | 2019-09-06 | Video image jitter detection method based on TSN (temporal segment network)
Publications (2)
Publication Number | Publication Date |
---|---|
CN110602487A true CN110602487A (en) | 2019-12-20 |
CN110602487B CN110602487B (en) | 2021-04-20 |
Family
- ID=68858068
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116193231A (en) * | 2022-10-24 | 2023-05-30 | 成都与睿创新科技有限公司 | Method and system for handling minimally invasive surgical field anomalies |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001006181A (en) * | 1999-05-07 | 2001-01-12 | Sony Precision Eng Center Singapore Pte Ltd | Apparatus for measuring jitter of optical disc |
US20100157070A1 (en) * | 2008-12-22 | 2010-06-24 | Honeywell International Inc. | Video stabilization in real-time using computationally efficient corner detection and correspondence |
CN104135597A (en) * | 2014-07-04 | 2014-11-05 | 上海交通大学 | Automatic detection method of jitter of video |
CN108492287A (en) * | 2018-03-14 | 2018-09-04 | 罗普特(厦门)科技集团有限公司 | A kind of video jitter detection method, terminal device and storage medium |
CN110191320A (en) * | 2019-05-29 | 2019-08-30 | 合肥学院 | Video jitter based on pixel timing motion analysis and freeze detection method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||