CN116665101B - Method for extracting key frames of monitoring video based on contourlet transformation - Google Patents
- Publication number: CN116665101B (application CN202310625992.6A)
- Authority
- CN
- China
- Prior art keywords
- key frame
- monitoring video
- image
- information
- frames
- Prior art date
- Legal status: Active (assumed by Google Patents; not a legal conclusion)
Classifications
- G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06T 5/90: Dynamic range modification of images or parts thereof
- G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
- G06V 10/478: Contour-based spectral representations or scale-space representations, e.g. by Fourier analysis, wavelet analysis or curvature scale-space [CSS]
- G06V 10/761: Proximity, similarity or dissimilarity measures
- G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06T 2207/10016: Video; image sequence
- G06T 2207/20192: Edge enhancement; edge preservation
- G06T 2207/30232: Surveillance
Abstract
The invention discloses a method for extracting surveillance video key frames based on the contourlet transform. The method comprises the following steps: performing multi-scale, multi-directional contourlet decomposition of the video sequence to obtain images rich in directional and contour information; proposing a non-subsampled directional filter combination, filtering and decomposing the features of different directions, and fusing the decomposed features across scales by inverse contourlet transform to obtain an illumination-insensitive contour feature map; and constructing a texture enhancement model from a nonlinear enhancement function to strengthen image edge information and increase the contrast of target contours. The method addresses the susceptibility of current approaches to changes in external illumination and their incomplete extraction of directional target detail, and effectively improves the accuracy of key frame extraction for the surveillance video domain.
Description
Technical Field
The invention relates to a method for extracting surveillance video key frames based on the contourlet transform, and belongs to the technical field of computer vision.
Background
In recent years, networks have become ubiquitous in daily life, and the continuous development of internet media technology has driven down the cost of transmitting information over networks. Compared with traditional modes of communication such as short messages and text, people increasingly prefer more intuitive media such as video, and the continued development of video technology has enriched the ways people communicate.
While emerging technologies continually bring convenience, video data has inevitably grown explosively. In the surveillance video field, the falling manufacturing cost of hardware such as cameras allows them to be deployed at every corner of a city; at the same time, their uninterrupted operation inevitably produces a large volume of video data that cannot be processed in time. Intelligent surveillance video systems have found their way into many areas of people's daily lives.
In intelligent office settings, surveillance video systems are highly practical for production regulation and control, safety assurance of large public facilities, distance education, and more. From the viewpoint of urban construction, they provide strong technical support for government initiatives to build informatized public security engineering. In traffic management, video surveillance plays an irreplaceable role: cameras installed at traffic junctions such as intersections and highway toll gates can capture motor vehicle violations, preventing traffic accidents to a certain extent. In the smart home field, surveillance realizes its full value: outdoor equipment can detect whether strangers are at the gate, safeguarding home security, while indoor equipment lets occupants know and monitor conditions at home in real time, providing a good living environment.
Although intelligent surveillance plays a very important role across these fields, existing systems suffer from problems such as the impossibility of storing massive data in full, a complex key-information retrieval process, and excessive labor cost. Key frames are the set of images that represent the most significant content in a video. Key frame extraction studies how to reflect the primary content of the original video as fully as possible using the smallest sequence of still images. As a technique that reduces the redundancy of the original video data and recombines its core content into a visual abstract, key frame extraction has attracted wide attention from scholars at home and abroad. Extracting a static abstract from video data of complex content and huge volume, condensed into one frame or several video frame sequences that fully express the video content, has therefore become a current research hotspot. Realizing this technique helps relieve the storage pressure of massive video, reduces the workload of video abstract retrieval, and in turn saves labor and financial cost.
Although video retrieval technology is continually updated, people still cannot efficiently and accurately obtain valuable video information from massive video data. Surveillance video is typically captured as long, uncut image sequences, which produces substantial information redundancy: much of the original video is blank content in which no target appears. Meanwhile, existing key frame extraction techniques for surveillance video remain limited: when illumination conditions change, sudden variations in video brightness reduce key frame extraction efficiency; in addition, existing methods extract directional target detail incompletely. To address these problems, a surveillance video key frame extraction method based on the contourlet transform is proposed. First, exploiting the time-frequency localization capability of the contourlet transform, the video sequence is decomposed at multiple scales and in multiple directions, yielding images rich in directional and contour information. Second, a non-subsampled directional filter combination is proposed, and an illumination-insensitive contour feature map is obtained by filtering and fusing features of different directions. Third, a texture enhancement model is constructed from a nonlinear enhancement function, improving the accuracy of key frame extraction by strengthening the image edge information. Finally, a key frame screening model based on structural similarity is constructed to select and extract the key frames.
Disclosure of Invention
A key frame extraction method based on the contourlet transform for the surveillance video field, characterized by comprising at least the following steps:
s1: collecting surveillance video from an intelligent surveillance video system and obtaining a video data set;
s2: cutting the surveillance video and applying grayscale conversion and other preprocessing to obtain a surveillance video sequence, and resizing it to a preset resolution;
s3: performing multi-scale, multi-directional decomposition on the processed surveillance video sequence to obtain images rich in directional and contour information;
s4: selecting the high-frequency image components, which contain the image edge information, for combined non-subsampled directional filter decomposition to obtain band-pass directional sub-bands, and fusing the band-pass directional sub-bands across scales by contourlet reconstruction to obtain an illumination-insensitive contour feature map;
s5: using the texture enhancement model to enhance the contour texture of the illumination-insensitive contour feature map, thereby improving image quality and increasing the contrast of gray-level information in the image;
s6: calculating the structural similarity of adjacent frames and forming a structural similarity curve as a substitute for the true shot dynamic curve, with the minimum points of the similarity curve substituting for the inflection points within the shot;
s7: putting all minima of the structural similarity curve into a set, named the key frame set.
Further, preprocessing the surveillance video and applying the multi-scale, multi-directional contourlet transform comprises:
The contourlet transform is a discrete transform specific to two-dimensional images that can efficiently represent geometric structures containing contour information. Its biggest feature, distinguishing it from other transforms, is flexible multi-resolution combined with directionality: it allows different numbers of directions at different scales. The non-subsampled contourlet transform is a shift-invariant variant, typically built from a non-subsampled pyramid filter and a non-subsampled directional filter.
The proposed algorithm first performs non-subsampled pyramid decomposition on the surveillance video frame sequence, obtaining the high-frequency and low-frequency components of the image. Because the high-frequency components of an image typically carry information about strong image changes, the invention selects the high-frequency components, which contain the image edge information, for combined non-subsampled directional filter decomposition to obtain the band-pass directional sub-bands.
Next, non-subsampled pyramid decomposition is applied again to the low-frequency component, which contains the image gray-level information, and the operation above is repeated on its high-frequency component to obtain further band-pass directional sub-bands.
The band-pass directional sub-bands at the different scales are then fused by contourlet reconstruction, yielding the illumination-insensitive contour feature map.
In place of the single non-subsampled directional filter of the conventional approach, a non-subsampled directional filter combination is used, combining a quadrature filter, a smooth quadrature filter, and a two-dimensional filter obtained via the McClellan transform.
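A minimal sketch of the non-subsampled pyramid stage described above: Gaussian smoothing with a support that widens per level stands in for the actual pyramid filters (an assumption for illustration). Because nothing is downsampled, the low band plus the per-scale high bands reconstruct the input exactly, which is what makes the transform shift-invariant.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nsp_decompose(img, levels=2):
    """Non-subsampled pyramid split into one low-frequency band and one
    high-frequency (edge/detail) band per level. Illustrative a-trous-style
    approximation, not the patent's exact filter bank."""
    low, highs = img.astype(float), []
    for k in range(levels):
        # No downsampling: the filter support widens instead (sigma doubles).
        smoothed = gaussian_filter(low, sigma=2.0 ** k)
        highs.append(low - smoothed)  # detail lost at this scale
        low = smoothed
    return low, highs

img = np.random.rand(64, 64)
low, highs = nsp_decompose(img, levels=2)
print(np.allclose(low + sum(highs), img))  # True: perfect reconstruction
```

The high bands produced here are what the directional filter combination would then split into band-pass directional sub-bands.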
Further, a texture enhancement model is designed for contour texture enhancement of the illumination-insensitive contour feature map, comprising:
Because the non-subsampled directional filter combination filters and gathers information from the high-frequency components of the image, the extracted illumination-insensitive contour feature map suffers from poor image quality and weak gray-level contrast.
Therefore, the invention designs a texture enhancement model to enhance the contour texture of the illumination-insensitive contour feature map, improving image quality and increasing the contrast of gray-level information in the image. The invention adopts nonlinear stretching to raise the gray-level contrast of the illumination-insensitive contour feature map.
The extracted high-frequency coefficients are enhanced according to a nonlinear enhancement function, increasing the contrast of contour texture information in the image; the nonlinear enhancement function is shown below:
where x is the gray value of the illumination-insensitive contour feature map before texture enhancement, γ is the resulting gray value, r is the parameter controlling the slope of the nonlinear enhancement function, and α also influences the change of slope.
According to the formula, too large a slope makes the contrast of the image gray-level information excessive and distorts the image; too small a slope yields low light-dark contrast, so the enhancement effect of the texture enhancement model is not obvious. In view of this, the invention sets the parameters to α = r = 0.5.
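The patent text states only the roles of r (slope) and α and the choice α = r = 0.5; the exact enhancement function appears in a formula not reproduced in this text. The sketch below is therefore an assumed stand-in: a sign-preserving power-law stretch in which r sets the slope of the curve and α scales the output, consistent with the described parameter roles.

```python
import numpy as np

def enhance_coeffs(x, r=0.5, alpha=0.5):
    """Assumed nonlinear enhancement of contourlet high-frequency
    coefficients: boosts weak coefficients relative to strong ones while
    preserving their sign (edge polarity). NOT the patent's exact formula."""
    x = np.asarray(x, dtype=float)
    m = np.abs(x).max()
    if m == 0:
        return x
    # Power-law stretch on normalized magnitude; r < 1 lifts small values.
    return np.sign(x) * alpha * m * (np.abs(x) / m) ** r

coeffs = np.array([-0.04, 0.0, 0.01, 0.25, 1.0])
print(enhance_coeffs(coeffs))
```

With r = 0.5 a weak coefficient such as 0.01 is lifted to 0.05 while the strongest coefficient is only scaled by α, which increases the relative contrast of faint contour texture.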
Further, a key frame screening criterion based on structural similarity is designed, comprising the following:
Since motion is a nonlinear process and image data is high-dimensional, computing the true dynamic curve has high time complexity. A linearized description of the target motion process is therefore required to achieve fast and accurate key frame extraction.
Although various linearized descriptions could reduce this time complexity, the proposed method substitutes the structural similarity curve for the true shot dynamic curve and the minimum points of the similarity curve for the inflection points within the shot.
In general, the overall structure of an image is characterized by its structural information; in addition, the gray-level and brightness information of an image strongly influence how it is perceived, and such information typically draws attention when images are viewed.
Assume a video sequence set S = {s | s = 1, 2, ..., n}, and let x and y denote two adjacent video frames. The luminance, contrast, and structure comparison functions are defined as follows:

l(x,y) = (2μ_x μ_y + C_1) / (μ_x^2 + μ_y^2 + C_1)
c(x,y) = (2σ_x σ_y + C_2) / (σ_x^2 + σ_y^2 + C_2)
s(x,y) = (σ_xy + C_3) / (σ_x σ_y + C_3)

where μ_x, μ_y are the means of images x and y, reflecting their brightness information; σ_x, σ_y are their standard deviations, reflecting their contrast information; σ_xy is the covariance of x and y, reflecting the similarity of their structural information; and C_1, C_2, C_3 are small positive constants that prevent abnormal results when a denominator approaches zero. Combining these three components, the structural similarity of x and y is:

SSIM(x,y) = [l(x,y)]^α · [c(x,y)]^β · [s(x,y)]^γ

where α, β, γ are all greater than 0 and adjust the relative importance of the three components. When α = β = γ = 1 and C_3 = C_2/2, this simplifies to:

SSIM(x,y) = (2μ_x μ_y + C_1)(2σ_xy + C_2) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2))
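The simplified SSIM can be computed globally over a pair of frames as follows; this single-window sketch (means, variances and covariance over the whole frame) is sufficient for a frame-level similarity score, whereas library implementations typically use local sliding windows.

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Global structural similarity of two equal-size grayscale frames in
    [0, 1], using the simplified form (alpha = beta = gamma = 1, C3 = C2/2)."""
    x, y = x.astype(float).ravel(), y.astype(float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.random.rand(64, 64)
print(round(ssim(a, a), 6))  # 1.0 for identical frames
```

The constants c1 and c2 here follow the common choice (0.01·L)^2 and (0.03·L)^2 for dynamic range L = 1; that choice is an assumption, not a value given in the patent.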
however, due to the simplification, the minimum value point screened is not completely equivalent to the inflection point of the motion track of the target, so that iteration and optimization are required. The procedure is as follows.
Step 1: and extracting a complete video frame sequence of the monitoring video. The method comprises the steps of firstly cutting an input monitoring video sequence, so as to obtain a complete monitoring video frame sequence.
Step 2: and calculating the structural similarity of two adjacent frames. Structural similarity of two adjacent frames in the set of video sequences S is calculated according to equation (5-6) and stored as defined below as S SSIM Of the sets, set S SSIM The definition is as follows:
S_SSIM = (2, SSIM(1,2)) ∪ (3, SSIM(2,3)) ∪ ... ∪ (n, SSIM(n-1,n))
Step 3: form the structural similarity curve. The structural similarity curve (SSIMC) is obtained by plotting the set S_SSIM formed in Step 2.
Step 4: form the final key frame set. The method puts all minima of the SSIMC into a set Z, named the candidate key frame set. Since the number K of minimum points on the curve does not necessarily equal the required number of key frames K0, secondary key frame extraction is performed according to the following rules.
(1) If K &lt; K0, all key frames in set Z are first extracted as final key frames and stored in the final key frame set F; then K0 - K frames are added by linear interpolation and stored into F.
(2) If K = K0, the key frames in the candidate key frame set Z are extracted directly as the final key frames and stored in the final key frame set F.
(3) If K &gt; K0, the peak signal-to-noise ratio (PSNR) of adjacent frames in the video sequence set S is first calculated. Next, the average adjacent-frame peak signal-to-noise ratio PSNR_avg is computed and compared against the key frames in the candidate key frame set Z: if a candidate's adjacent-frame PSNR is greater than PSNR_avg, it is deleted from the candidate key frame set, finally completing the extraction of the key frames. The calculation formula of PSNR_avg is shown below:
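Rule (3) can be sketched as follows. The frame data and the final trim to K0 entries are illustrative assumptions; the patent specifies only that candidates whose adjacent-frame PSNR exceeds PSNR_avg are deleted.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio between two frames (dB)."""
    mse = np.mean((x.astype(float) - y.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def prune_candidates(frames, candidates, k0):
    """K > K0 case: drop candidates whose adjacent-frame PSNR exceeds the
    average PSNR_avg (a high PSNR between neighbours means the frames barely
    differ, i.e. the candidate is redundant)."""
    psnrs = [psnr(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    psnr_avg = float(np.mean(psnrs))
    # A candidate is a frame index i >= 1; psnrs[i-1] = PSNR(frame i-1, frame i).
    kept = [i for i in candidates if psnrs[i - 1] <= psnr_avg]
    return kept[:k0]

rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (16, 16)) for _ in range(6)]
print(prune_candidates(frames, candidates=[1, 3, 4], k0=2))
```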
drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a frame flow chart of a method for extracting key frames of a surveillance video based on contourlet transformation;
FIG. 2 is a schematic diagram of a structure of a contourlet transformation model according to the present invention;
FIG. 3 is a schematic diagram of a texture enhancement model according to the present invention;
FIG. 4 is a schematic diagram of a video keyframe screening model based on structural similarity according to the present invention;
FIG. 5 is a diagram of different filter characteristics;
FIG. 6 is a first diagram of the effectiveness test results of the present invention;
FIG. 7 is a second diagram of the effectiveness test results of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention will be further described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, the frame flow chart of a method for extracting surveillance video key frames based on the contourlet transform according to a first embodiment of the present invention includes:
s1: collecting surveillance video from an intelligent surveillance video system and obtaining a video data set;
s2: cutting the surveillance video and applying grayscale conversion and other preprocessing to obtain a surveillance video sequence, and resizing it to a preset resolution;
s3: performing multi-scale, multi-directional decomposition on the processed surveillance video sequence to obtain images rich in directional and contour information;
s4: selecting the high-frequency image components, which contain the image edge information, for combined non-subsampled directional filter decomposition to obtain band-pass directional sub-bands, and fusing the band-pass directional sub-bands across scales by contourlet reconstruction to obtain an illumination-insensitive contour feature map;
the algorithm provided by the invention firstly carries out non-downsampling pyramid decomposition on the monitoring video frame sequence, thereby obtaining the high-frequency component and the low-frequency component of the image.
Because the high-frequency component of the image often contains the information of strong image change, the invention selects the high-frequency component containing the information of the image edge part to carry out the combination decomposition of the non-downsampling direction filter so as to obtain different bandpass direction sub-bands.
Repeating the operation again for the low frequency component containing the image gray information, and adopting the repeated operation for the high frequency component to obtain the band-pass direction sub-band.
And then, carrying out contour wave reconstruction fusion on the sub-bands in the band-pass directions under different scales, thereby obtaining a non-photosensitive contour feature map.
The present invention uses a combination of non-downsampling direction filters in place of the non-downsampling direction filters in the conventional approach, which combines a quadrature filter, a smooth quadrature filter, and a two-dimensional filter based on the McClellan transform. Fig. 4 is a feature map of an original image after filtering based on different filters.
S5: using the texture enhancement model to enhance the contour texture of the illumination-insensitive contour feature map, thereby improving image quality and increasing the contrast of gray-level information in the image;
therefore, the invention designs a texture enhancement model for enhancing the contour texture of the non-photosensitive contour feature map so as to achieve the purposes of improving the image quality and enhancing the contrast of gray information in the image. The invention adopts a nonlinear stretching mode to improve the gray contrast of the non-photosensitive profile characteristic diagram.
And carrying out coefficient enhancement on the extracted high-frequency component according to a nonlinear enhancement function so as to achieve the purpose of increasing the contrast of contour texture information in an image, wherein the nonlinear enhancement function is shown as follows:
wherein x represents the gray value of the photosensitive contour feature map before texture enhancement construction, gamma is the final gray value result, r is the parameter for slope control of the nonlinear enhancement function, and alpha has a certain influence on the change of the slope.
According to the formula, if the slope is too high, the contrast of the gray information of the image is too high, and the image is distorted. Otherwise, the problem that the image light and dark contrast is not high, and the enhancement effect of the texture enhancement model is not obvious is caused. In view of the above, the parameter is defined as α=r=0.5 in the present invention.
S6: calculating the structural similarity of adjacent frames and forming a structural similarity curve as a substitute for the true shot dynamic curve, with the minimum points of the similarity curve substituting for the inflection points within the shot;
s7: putting all minima of the structural similarity curve into a set, named the key frame set.
In general, the overall structure of an image is characterized by its structural information; in addition, the gray-level and brightness information of an image strongly influence how it is perceived, and such information typically draws attention when images are viewed.
Assuming that a set of video sequences is S { s=1, 2 …, n }, where x and y represent two adjacent video frames, the brightness, contrast and structural similarity are defined as follows:
wherein mu x ,μ y Respectively the average value of the images x and y, reflecting the brightness information thereof; sigma (sigma) x ,σ y The variances of the images x and y reflect the contrast information; sigma (sigma) xy The correlation coefficients of x and y reflect the similarity of the structural information; c (C) 1 ,C 2 ,C 3 Is a normal number close to zero, prevents the denominator from being zeroWhich leads to abnormal results. The structural similarity of x and y obtained by combining these three information is shown below:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ
where α, β, γ are all greater than 0 and adjust the relative importance of the three components. When α = β = γ = 1 (with C_3 = C_2 / 2), the expression simplifies to:

SSIM(x, y) = (2μ_x μ_y + C_1)(2σ_xy + C_2) / [(μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)]
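The simplified SSIM (α = β = γ = 1) can be sketched as follows; this is a minimal sketch assuming single-window statistics computed over whole frames, with illustrative constant values, rather than the windowed variant used in many implementations:

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Simplified SSIM with alpha = beta = gamma = 1, computed from
    global image statistics (single window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()             # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()   # sigma_xy
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

For identical frames the value is exactly 1; it drops toward (or below) 0 as brightness, contrast, or structure diverge.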
However, because of this simplification, the selected minimum points are not exactly equivalent to the inflection points of the target's motion trajectory, so iteration and optimization are required. The procedure is as follows.
Step 1: and extracting a complete video frame sequence of the monitoring video. The method comprises the steps of firstly cutting an input monitoring video sequence, so as to obtain a complete monitoring video frame sequence.
Step 2: and calculating the structural similarity of two adjacent frames. Structural similarity of two adjacent frames in the set of video sequences S is calculated according to equation (5-6) and stored as defined below as S SSIM Of the sets, set S SSIM The definition is as follows:
(2,SSIM(1,2))∪(3,SSIM(2,3))∪...∪(n,SSIM(n-1,n))
Step 3: form the structural similarity curve. The structural similarity curve (Structural Similarity Curve, SSIMC) is drawn from the set S_SSIM of adjacent-frame similarities built in Step 2.
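Steps 2–3 amount to computing SSIM for each adjacent pair and scanning the resulting curve for local minima. A minimal sketch, with the SSIM values supplied by any implementation of equation (5-6); the helper name `ssim_curve_minima` is an assumption:

```python
import numpy as np

def ssim_curve_minima(ssim_values):
    """ssim_values[i] holds SSIM(frame i+1, frame i+2), as in the set S_SSIM.
    Returns positions of strict local minima, i.e. the candidate key frames."""
    s = np.asarray(ssim_values, dtype=np.float64)
    return [i for i in range(1, len(s) - 1)
            if s[i] < s[i - 1] and s[i] < s[i + 1]]
```

Each minimum marks a frame pair whose similarity dips below its neighbours, i.e. a candidate inflection point of the shot dynamics.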
Step 4: form the final key frame set. All minima of the SSIMC are placed into a set Z, named the candidate key frame set. Because the number K of minimum points on the curve does not necessarily equal the required number of key frames K_0, a secondary key frame extraction is performed according to the following rules.
(1) If K < K_0, all key frames in set Z are first extracted as final key frames and stored in the final key frame set F; the remaining K_0 − K frames are then generated by linear interpolation and stored in F.
(2) If K = K_0, the key frames in the candidate key frame set Z are extracted directly as the final key frames and stored in the final key frame set F.
(3) If K > K_0, the peak signal-to-noise ratio (Peak Signal-to-Noise Ratio, PSNR) of adjacent frames in the video sequence set S is computed first. The average adjacent-frame value PSNR_avg is then calculated and compared against the key frames in the candidate key frame set Z: any candidate whose frame-difference PSNR is greater than PSNR_avg is deleted from the candidate key frame set, finally completing the extraction of the required key frames. PSNR_avg is given by equation (5-8):

PSNR_avg = (1 / (n − 1)) · Σ_{i=1}^{n−1} PSNR(s_i, s_{i+1})
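The three rules above can be sketched as follows. `refine_keyframes` and the uniform-interpolation padding are illustrative names and choices, not the patent's exact procedure; PSNR_avg follows equation (5-8) as the mean adjacent-frame PSNR:

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio between two frames."""
    diff = np.asarray(x, dtype=np.float64) - np.asarray(y, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def refine_keyframes(frames, candidates, k0):
    """Secondary key frame extraction over candidate frame indices Z
    (curve minima, hence every index is >= 1)."""
    z = sorted(candidates)
    if len(z) == k0:                  # rule (2): keep as-is
        return z
    if len(z) < k0:                   # rule (1): pad by linear interpolation
        # note: rounded positions may collide with existing candidates
        extra = np.linspace(0, len(frames) - 1, k0 - len(z) + 2)[1:-1]
        return sorted(set(z) | {int(round(e)) for e in extra})
    # rule (3): drop candidates whose frame-difference PSNR exceeds the average
    pair_psnr = [psnr(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    avg = float(np.mean(pair_psnr))   # PSNR_avg of equation (5-8)
    return [i for i in z if pair_psnr[i - 1] <= avg][:k0]
```

A high PSNR between a candidate and its predecessor means little visual change, so such candidates carry the least information and are the ones pruned.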
an embodiment of the invention provides a monitoring video key frame extraction method terminal device based on contourlet transformation, which comprises one or more input devices, one or more output devices, one or more processors and a memory, wherein the memory is used for storing a computer program, and the processor is used for executing the computer program to realize the target detection method facing an unmanned aerial vehicle platform.
An embodiment of the present invention provides a computer readable storage medium storing a computer program, where the computer program when executed by a processor performs the above-mentioned method for extracting keyframes of surveillance video based on contourlet transformation.
To verify the validity of the above embodiment, the present invention is compared with state-of-the-art key frame extraction methods by computing precision, recall, and F1 score. Specifically, two datasets are used: Visor, which contains rich local motion information, and the real-world anomaly dataset UCF-Crime. The Visor dataset covers various human activities, such as walking, jumping, and placing objects, and is used to judge the proposed method's ability to extract target detail information. The UCF-Crime dataset covers seven kinds of real-world anomalies. These abnormal behaviors are chosen because they affect normal life in the real world and therefore demand accurate key frame extraction.
A tone-scale map is used to represent the precision, recall, and F1 score of each video in the two datasets. Table 1 shows that the recall of the proposed method is significantly better than that of the comparison methods, a clear advantage, while the F1 score is also improved relative to the comparison methods. Table 2 shows that the advantage of the proposed method again appears as improved recall. Video17 and Video18 record acts of deliberate vandalism, and on them the proposed method is significantly better than the comparison methods in precision, recall, and F1 score. Fires are often accompanied by abrupt brightness changes within the surveillance shot; the comparison key frame extraction methods are easily affected by factors such as changing illumination and cannot accurately capture key frames at brightness jumps, which lowers their extraction accuracy. The proposed key frame extraction technique, by contrast, generates a non-photosensitive contour feature map through the fusion of multiple filters, and therefore remains accurate on videos with abrupt shot brightness changes such as fires. In summary, the precision and recall of the key frames extracted by the proposed method are better than those of the comparison methods, further proving its effectiveness; its generalization ability is verified at the same time.
Table 1 Comparison of results of the different methods on dataset 1
Table 2 Comparison of results of the different methods on dataset 2
Claims (2)
1. A key frame extraction method for the monitoring video field, characterized by comprising at least the following steps:
s1: collecting a monitoring video from an intelligent monitoring video system, and obtaining a video data set;
s2: cutting the monitoring video and performing graying and other preprocessing to obtain a monitoring video sequence, and adjusting the monitoring video sequence to a preset resolution;
s3: carrying out multi-scale, multi-direction decomposition of the preprocessed monitoring video sequence using the contourlet transformation, thereby obtaining an image rich in direction and contour information; specifically: selecting the image high-frequency components containing edge information and decomposing them with a non-downsampled directional filter combination in place of a single traditional non-downsampled directional filter, thereby obtaining different band-pass directional sub-bands, and performing contourlet reconstruction fusion of the band-pass directional sub-bands across scales to obtain a non-photosensitive contour feature map; the non-downsampled directional filter combination comprises a quadrature filter, a smooth quadrature filter, and a two-dimensional filter based on the McClellan transformation;
s4: using the texture enhancement model to enhance the contour texture of the non-photosensitive contour feature map, so as to improve image quality and strengthen the contrast of gray information in the image;
s5: calculating the structural similarity of each pair of adjacent frames and forming a structural similarity curve as a substitute for the true shot-dynamics curve, the minimum points of the structural similarity curve standing in for the inflection points within the shot; these points are put into a set Z and named the key frame set; the aim is a linear description of the target motion trajectory that reduces the time complexity of a highly dynamic curve;
s6: performing secondary key frame extraction on the key frame set to reconcile the number K of curve minimum points with the required number of key frames K_0, specifically: if K < K_0, all key frames in set Z are first extracted as final key frames and stored in the final key frame set F, and K_0 − K frames are then added by linear interpolation and stored in set F; if K = K_0, the key frames in the candidate key frame set Z are extracted directly as final key frames and stored in the final key frame set F; if K > K_0, the peak signal-to-noise ratio PSNR of adjacent frames in the video sequence set S is calculated first, then the average adjacent-frame peak signal-to-noise ratio PSNR_avg is calculated and compared against the key frames in the candidate key frame set Z; any candidate whose frame-difference peak signal-to-noise ratio is greater than PSNR_avg is deleted from the candidate key frame set, finally completing the key frame extraction.
2. The method of claim 1, wherein the texture enhancement model uses nonlinear stretching to improve the gray-level contrast of the non-photosensitive contour feature map; the extracted high-frequency components are coefficient-enhanced according to a nonlinear enhancement function so as to increase the contrast of the contour texture information in the image, the nonlinear enhancement function being as follows:
where x denotes the gray value of the non-photosensitive contour feature map before texture enhancement, γ is the resulting gray value, r is the parameter controlling the slope of the nonlinear enhancement function, and α also influences how the slope changes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310625992.6A CN116665101B (en) | 2023-05-30 | 2023-05-30 | Method for extracting key frames of monitoring video based on contourlet transformation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116665101A CN116665101A (en) | 2023-08-29 |
CN116665101B true CN116665101B (en) | 2024-01-26 |
Family
ID=87716588
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310625992.6A Active CN116665101B (en) | 2023-05-30 | 2023-05-30 | Method for extracting key frames of monitoring video based on contourlet transformation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116665101B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117934574A (en) * | 2024-03-22 | 2024-04-26 | 深圳市智兴盛电子有限公司 | Method, device, equipment and storage medium for optimizing image of automobile data recorder |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104077749A (en) * | 2014-06-17 | 2014-10-01 | 长江大学 | Seismic data denoising method based on contourlet transformation |
CN106210444A (en) * | 2016-07-04 | 2016-12-07 | 石家庄铁道大学 | Kinestate self adaptation key frame extracting method |
CN109978802A (en) * | 2019-02-13 | 2019-07-05 | 中山大学 | High dynamic range images fusion method in compressed sensing domain based on NSCT and PCNN |
CN113221674A (en) * | 2021-04-25 | 2021-08-06 | 广东电网有限责任公司东莞供电局 | Video stream key frame extraction system and method based on rough set reduction and SIFT |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120148149A1 (en) * | 2010-12-10 | 2012-06-14 | Mrityunjay Kumar | Video key frame extraction using sparse representation |
Non-Patent Citations (1)
Title |
---|
Partitioned enhancement method for side-scan sonar images in the non-subsampled contourlet transform domain; Wu Helong et al.; Acta Armamentarii; pp. 1463-1470 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen et al. | Anomaly detection in surveillance video based on bidirectional prediction | |
Chen et al. | Saliency detection via the improved hierarchical principal component analysis method | |
CN112396027A (en) | Vehicle weight recognition method based on graph convolution neural network | |
CN116665101B (en) | Method for extracting key frames of monitoring video based on contourlet transformation | |
CN113792606B (en) | Low-cost self-supervision pedestrian re-identification model construction method based on multi-target tracking | |
CN112818849B (en) | Crowd density detection algorithm based on context attention convolutional neural network for countermeasure learning | |
CN113449660A (en) | Abnormal event detection method of space-time variation self-coding network based on self-attention enhancement | |
CN116343103B (en) | Natural resource supervision method based on three-dimensional GIS scene and video fusion | |
CN113128360A (en) | Driver driving behavior detection and identification method based on deep learning | |
Hu et al. | Parallel spatial-temporal convolutional neural networks for anomaly detection and location in crowded scenes | |
Shen et al. | An image enhancement algorithm of video surveillance scene based on deep learning | |
Wang et al. | Paccdu: pyramid attention cross-convolutional dual unet for infrared and visible image fusion | |
Li et al. | An end-to-end system for unmanned aerial vehicle high-resolution remote sensing image haze removal algorithm using convolution neural network | |
CN111626944B (en) | Video deblurring method based on space-time pyramid network and against natural priori | |
CN117218545A (en) | LBP feature and improved Yolov 5-based radar image detection method | |
CN116543333A (en) | Target recognition method, training method, device, equipment and medium of power system | |
CN112818818B (en) | Novel ultra-high-definition remote sensing image change detection method based on AFFPN | |
CN115564709A (en) | Evaluation method and system for robustness of power algorithm model in confrontation scene | |
CN116453033A (en) | Crowd density estimation method with high precision and low calculation amount in video monitoring scene | |
Zheng et al. | Scene recognition model in underground mines based on CNN-LSTM and spatial-temporal attention mechanism | |
CN112926552A (en) | Remote sensing image vehicle target recognition model and method based on deep neural network | |
CN112633142A (en) | Power transmission line violation building identification method and related device | |
CN117409206B (en) | Small sample image segmentation method based on self-adaptive prototype aggregation network | |
Wu et al. | Research and Application of Big Data Technology in Video Intelligent Analysis | |
Lv et al. | Video enhancement and super-resolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||