CN116665101B - Method for extracting key frames of monitoring video based on contourlet transformation - Google Patents

Method for extracting key frames of monitoring video based on contourlet transformation Download PDF

Info

Publication number
CN116665101B
CN116665101B CN202310625992.6A
Authority
CN
China
Prior art keywords
key frame
monitoring video
image
information
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310625992.6A
Other languages
Chinese (zh)
Other versions
CN116665101A (en)
Inventor
张云佐
张嘉煜
武存宇
张天
刘亚猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Tiedao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University filed Critical Shijiazhuang Tiedao University
Priority to CN202310625992.6A priority Critical patent/CN116665101B/en
Publication of CN116665101A publication Critical patent/CN116665101A/en
Application granted granted Critical
Publication of CN116665101B publication Critical patent/CN116665101B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/478Contour-based spectral representations or scale-space representations, e.g. by Fourier analysis, wavelet analysis or curvature scale-space [CSS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20192Edge enhancement; Edge preservation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for extracting key frames of surveillance video based on the contourlet transform. The method comprises the following steps: performing a multi-scale, multi-directional decomposition of the video sequence using the contourlet transform to obtain images containing rich directional and contour information; proposing a non-downsampled directional filter combination that filters and decomposes features in different directions, and fusing the decomposed features at different scales by the inverse contourlet transform to obtain a non-photosensitive contour feature map; and constructing a texture enhancement model from a nonlinear enhancement function to enhance image edge information and increase the contrast of the target contour. The method solves the problems that current methods are easily affected by changes in external illumination conditions and fail to extract directional detail information of the target, and effectively improves the accuracy of key frame extraction for the surveillance video field.

Description

Method for extracting key frames of monitoring video based on contourlet transformation
Technical Field
The invention relates to a method for extracting a monitoring video key frame based on contourlet transformation, and belongs to the technical field of computer vision.
Background
In recent years, networks have become pervasive in daily life, and the continuous development of internet media technology has reduced the cost of transmitting information over the network. Compared with traditional communication modes such as text messages, people increasingly prefer more intuitive modes such as video, and the continuous development of video technology has enriched the ways in which people communicate.
While emerging technologies continually bring convenience, video data inevitably shows explosive growth. In the surveillance video field, the falling manufacturing cost of hardware such as cameras allows them to be deployed at every corner of a city, while their uninterrupted operation inevitably generates large amounts of video data that cannot be processed in time. Intelligent surveillance video systems have found their way into many areas of people's daily lives.
In the intelligent office field, surveillance video systems are highly practical for production regulation and control, safety assurance of large public facilities, distance education, and so on. From the viewpoint of urban construction, they provide strong technical support for government efforts to build informatized public security engineering. In the field of traffic management, video surveillance plays an irreplaceable role: cameras installed at traffic junctions such as intersections and highway toll gates capture traffic violations by motor vehicles and thus help prevent traffic accidents to a certain extent. In the smart home field, surveillance likewise shows its value: outdoor monitoring equipment can detect whether strangers are at the door, safeguarding home security, while indoor monitoring equipment allows the state of the home to be known and monitored in real time, providing a good living environment.
Intelligent surveillance systems thus play a very important role in many fields; however, existing systems face problems such as the inability to store massive data completely, complicated retrieval of key information, and excessive labor cost. Key frames are a set of image or video frames that represent the most significant content of a video. Key frame extraction studies how to reflect the primary content of the original video to the greatest extent using the smallest possible sequence of still images. As a technique that reduces the redundancy of original video data and recombines its core content into a visual abstract, key frame extraction has attracted extensive attention from researchers at home and abroad. The technology of extracting static abstracts from video data with complex content and huge volume, and condensing them into one frame or several video frame sequences that fully express the video content, has therefore become a current research hot spot. Realizing this technology helps relieve the storage pressure of massive video, reduces the workload of video abstract retrieval, and thus saves labor and financial costs.
Although video retrieval technology is continuously updated and iterated, people still cannot efficiently and accurately obtain valuable information from massive video data. Surveillance video is typically captured as long, unclipped image sequences, which results in a large amount of information redundancy, often consisting of blank content in which no target appears. Meanwhile, existing key frame extraction techniques for surveillance video still have limitations: when illumination conditions change, sudden changes in brightness reduce key frame extraction efficiency. In addition, existing methods fail to adequately extract directional detail information of the target in the surveillance video. To solve these problems, a surveillance video key frame extraction method based on the contourlet transform is proposed. First, exploiting the time-frequency localization capability of the contourlet transform, the video sequence is decomposed into multiple scales and directions, yielding images containing rich directional and contour information. Then, a non-downsampled directional filter combination is proposed, and a non-photosensitive contour feature map is obtained by filtering and fusing features in different directions. Next, a texture enhancement model is constructed from a nonlinear enhancement function to improve the accuracy of key frame extraction by strengthening image edge information. Finally, a key frame screening model based on structural similarity is constructed to select and extract key frames.
Disclosure of Invention
A key frame extraction method based on the contourlet transform for the surveillance video field, characterized by comprising at least the following steps:
S1: collecting surveillance video from an intelligent surveillance video system to obtain a video data set;
S2: clipping the surveillance video and performing preprocessing such as graying to obtain a surveillance video sequence, and resizing it to a preset resolution (a preprocessing sketch is given after this list);
S3: performing a multi-scale, multi-directional decomposition of the processed surveillance video sequence to obtain images containing rich directional and contour information;
S4: selecting the image high-frequency components containing image edge information for decomposition by the non-downsampled directional filter combination to obtain different band-pass directional sub-bands, and performing contourlet reconstruction fusion of the band-pass directional sub-bands at different scales to obtain a non-photosensitive contour feature map;
S5: using the texture enhancement model to enhance the contour texture of the non-photosensitive contour feature map, so as to improve image quality and enhance the contrast of the gray-level information in the image;
S6: calculating the structural similarity of adjacent frames and forming a structural similarity curve to replace the true shot dynamic curve, with the minimum points of the structural similarity curve replacing the inflection points within the shot;
S7: putting all minima of the structural similarity curve into a set, named the key frame set.
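As a minimal illustration of steps S1 and S2, the Python sketch below reads a surveillance video, converts each frame to grayscale, and resizes it to a preset resolution. The use of OpenCV, the file name, and the 256x256 resolution are illustrative assumptions rather than part of the claimed method.

```python
# Illustrative preprocessing for S1-S2: frame extraction, graying, resizing.
# The input path and target resolution below are assumptions.
import cv2

def load_video_frames(path: str, size=(256, 256)):
    """Return the surveillance video as a list of grayscale frames at a preset resolution."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # graying preprocessing
        frames.append(cv2.resize(gray, size))           # adjust to preset resolution
    cap.release()
    return frames

frames = load_video_frames("surveillance.mp4")  # hypothetical input file
```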
Further, preprocessing the surveillance video and applying the multi-scale, multi-directional contourlet transform includes:
the contourlet transform is a discrete transform method specific to two-dimensional images that can efficiently process graphic geometries that contain contour information. One of the biggest features of the contourlet, which is different from other transformation methods, is the flexible multi-resolution and the different directionality, which means that it allows different directions within different scales. A non-downsampled contourlet transform is a contourlet transform that has the property of being translationally invariant, typically consisting of a non-downsampled pyramid filter and a non-downsampled direction filter.
The proposed algorithm first performs a non-downsampled pyramid decomposition of the surveillance video frame sequence to obtain the high-frequency and low-frequency components of each image. Because the high-frequency components contain the information about strong image changes, the invention selects the high-frequency components containing image edge information for decomposition by the non-downsampled directional filter combination, yielding different band-pass directional sub-bands.
Next, the non-downsampled pyramid decomposition is applied again to the low-frequency component containing the image gray-level information, and its high-frequency component is processed by the same operation to obtain further band-pass directional sub-bands.
Then, contourlet reconstruction fusion is performed on the band-pass directional sub-bands at the different scales, yielding the non-photosensitive contour feature map.
In place of the single non-downsampled directional filter of the conventional method, a non-downsampled directional filter combination is used, combining a quadrature filter, a smooth quadrature filter, and a two-dimensional filter based on the McClellan transform.
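Since no standard Python library provides the non-downsampled contourlet transform or the specific quadrature, smooth quadrature, and McClellan-based filters described above, the sketch below only illustrates the overall flow: split each scale into low- and high-frequency bands without downsampling, filter the high-frequency band in several directions, and fuse the responses across directions and scales. The Gaussian low-pass stage and the oriented kernels are stand-in assumptions.

```python
# Sketch of the decompose -> directional-filter -> fuse flow behind the
# non-photosensitive contour feature map. The Gaussian low-pass and the
# oriented kernels below are stand-in assumptions, not the patent's
# quadrature / smooth-quadrature / McClellan-based filter combination.
import cv2
import numpy as np

DIRECTIONAL_KERNELS = [
    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32),   # vertical edges
    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float32),   # horizontal edges
    np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=np.float32),   # ~45 degrees
    np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=np.float32),   # ~135 degrees
]

def contour_feature_map(gray: np.ndarray, levels: int = 2) -> np.ndarray:
    """Illustrative non-photosensitive contour feature map (normalised to [0, 1])."""
    low = gray.astype(np.float32)
    fused = np.zeros_like(low)
    for _ in range(levels):
        blurred = cv2.GaussianBlur(low, (9, 9), 2.0)   # stand-in non-downsampled pyramid stage
        high = low - blurred                           # high-frequency (edge) band, same size
        for kernel in DIRECTIONAL_KERNELS:             # directional filtering of the high band
            fused += np.abs(cv2.filter2D(high, -1, kernel))
        low = blurred                                  # recurse on the low-frequency band
    peak = fused.max()
    return fused / peak if peak > 0 else fused

example = np.random.randint(0, 256, (256, 256)).astype(np.uint8)
feature_map = contour_feature_map(example)
```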
Further, a texture enhancement model is designed for contour texture enhancement of a non-photosensitive contour feature map, comprising:
because the non-downsampling direction filter combination is used for filtering and acquiring information aiming at high-frequency components of an image, the extracted non-photosensitive contour feature map has the problems of poor image quality and unobvious gray contrast.
Therefore, the invention designs a texture enhancement model to enhance the contour texture of the non-photosensitive contour feature map, improving image quality and enhancing the contrast of the gray-level information in the image. The invention improves the gray-level contrast of the non-photosensitive contour feature map by nonlinear stretching.
The extracted high-frequency coefficients are enhanced according to a nonlinear enhancement function in order to increase the contrast of the contour texture information in the image. In the nonlinear enhancement function, x denotes the gray value of the non-photosensitive contour feature map before texture enhancement, γ is the resulting gray value, r is the parameter controlling the slope of the function, and α also influences the slope.
According to the formula, if the slope is too large, the gray-level contrast of the image becomes excessive and the image is distorted; if it is too small, the light-dark contrast of the image remains low and the enhancement effect of the texture enhancement model is not obvious. In view of this, the parameters are set to α = r = 0.5 in the present invention.
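The nonlinear enhancement function itself appears as a formula in the original document and is not reproduced here; the sketch below therefore uses a sign-preserving power-law stretch as an assumed stand-in, with r governing the slope and α scaling the output, and with the stated setting α = r = 0.5.

```python
# Assumed stand-in for the nonlinear enhancement function (the exact formula is
# not reproduced in this text): a sign-preserving power-law stretch whose slope
# is governed by r and whose output is scaled by alpha.
import numpy as np

def nonlinear_enhance(x: np.ndarray, alpha: float = 0.5, r: float = 0.5) -> np.ndarray:
    x = x.astype(np.float32)
    peak = float(np.abs(x).max()) or 1.0            # normalise coefficients to [-1, 1]
    xn = x / peak
    gamma = alpha * np.sign(xn) * np.abs(xn) ** r   # small coefficients are boosted relative to large ones
    return gamma * peak                             # map back to the original coefficient range

enhanced = nonlinear_enhance(np.linspace(-1.0, 1.0, 5))
```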
Further, a key frame screening criterion based on structural similarity is designed, which comprises the following steps:
since motion is a nonlinear process and the dimensions of the image data are high, the time complexity of computing the high dynamic curve is high. Therefore, a linearization description of the target motion process is required to achieve fast and accurate key frame extraction.
Various methods exist for such a linearized description that reduces the time complexity of the highly dynamic curve; the proposed method uses the structural similarity curve to replace the true shot dynamic curve and uses the minimum points of that curve to replace the inflection points within the shot.
In general, the overall structure of an image is characterized by its structural information; in addition, the gray-level and brightness information of the image strongly affects how the image is perceived. Such information is what observers typically attend to when viewing images.
Assuming a video sequence set S = {s_i | i = 1, 2, ..., n}, where x and y denote two adjacent video frames, the luminance, contrast, and structure comparison functions are defined as follows:
l(x, y) = (2 μ_x μ_y + C_1) / (μ_x^2 + μ_y^2 + C_1)
c(x, y) = (2 σ_x σ_y + C_2) / (σ_x^2 + σ_y^2 + C_2)
s(x, y) = (σ_xy + C_3) / (σ_x σ_y + C_3)
where μ_x and μ_y are the means of images x and y, reflecting their brightness information; σ_x and σ_y are the standard deviations of x and y, reflecting their contrast information; σ_xy is the covariance of x and y, reflecting the similarity of their structural information; and C_1, C_2, C_3 are small positive constants that prevent abnormal results when a denominator approaches zero. Combining these three components, the structural similarity of x and y is given by:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ
where α, β, and γ are all greater than 0 and adjust the relative importance of the three components; when α = β = γ = 1 and C_3 = C_2 / 2, this simplifies to:
SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2))
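A minimal implementation of the simplified SSIM above is sketched below using global image statistics; the absence of local windows and the constants C_1 = (0.01 L)^2 and C_2 = (0.03 L)^2 follow common convention and are assumptions, not values fixed by the description.

```python
# Global-statistics implementation of the simplified SSIM above, plus the
# structural similarity curve over adjacent frames.
import numpy as np

def ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def ssim_curve(frames):
    """Structural similarity of every pair of adjacent frames (the SSIMC)."""
    return [ssim(frames[i - 1], frames[i]) for i in range(1, len(frames))]
```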
however, due to the simplification, the minimum value point screened is not completely equivalent to the inflection point of the motion track of the target, so that iteration and optimization are required. The procedure is as follows.
Step 1: and extracting a complete video frame sequence of the monitoring video. The method comprises the steps of firstly cutting an input monitoring video sequence, so as to obtain a complete monitoring video frame sequence.
Step 2: and calculating the structural similarity of two adjacent frames. Structural similarity of two adjacent frames in the set of video sequences S is calculated according to equation (5-6) and stored as defined below as S SSIM Of the sets, set S SSIM The definition is as follows:
(2,SSIM(1,2))∪(3,SSIM(2,3))∪...∪(n,SSIM(n-1,n))
step 3: forming a structural similarity curve. The structural similarity curve (Structural Similarity Curve, SSIMC) is based on the set S formed by the structural similarity of two adjacent frames in step 2 SSIM Drawing toAnd (3) forming the finished product.
Step 4: a final keyframe set is formed. This method places all minima in the SSIMC into set Z and names them as candidate keyframe sets. Since the number of points K of the minimum value of the curve does not necessarily meet the required key frame number K 0 The key frame secondary extraction is performed according to the following rule.
(1) If K < K_0, all key frames in set Z are first extracted as final key frames and stored in the final key frame set F; then K_0 - K additional frames are obtained by linear interpolation and stored in F.
(2) If K = K_0, the key frames in the candidate key frame set Z are extracted directly as the final key frames and stored in the final key frame set F.
(3) If K > K_0, the peak signal-to-noise ratio (Peak Signal Noise Ratio, PSNR) of adjacent frames in the video sequence set S is first calculated. Next, the average peak signal-to-noise ratio PSNR_avg over the adjacent frame pairs is calculated and compared against the key frames in the candidate key frame set Z; if the frame-difference PSNR corresponding to a candidate key frame is greater than PSNR_avg, that frame is deleted from the candidate key frame set, finally completing the extraction of K_0 key frames. PSNR_avg is calculated as the mean of the PSNR values of all adjacent frame pairs in S.
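The sketch below illustrates Step 4 and rules (1) through (3). How the K_0 - K interpolated frames are chosen and how the average PSNR is formed are not fully specified above, so evenly spaced extra frame indices, PSNR_avg as the mean adjacent-frame PSNR, and an 8-bit dynamic range are assumptions.

```python
# Candidate-minima selection on the SSIM curve and secondary key frame
# extraction. curve[i] is assumed to be SSIM(frames[i], frames[i+1]).
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

def select_key_frames(frames, curve, k0: int):
    """Return roughly k0 key frame indices from the candidate minima of the curve."""
    z = [i + 1 for i in range(1, len(curve) - 1)
         if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]   # candidate set Z
    if len(z) == k0:                                               # rule (2)
        return sorted(z)
    if len(z) < k0:                                                # rule (1): add K_0 - K frames
        extra = np.linspace(0, len(frames) - 1, k0 - len(z) + 2)[1:-1]
        return sorted(set(z) | {int(round(e)) for e in extra})
    pair_psnr = [psnr(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    psnr_avg = float(np.mean(pair_psnr))                           # rule (3): prune redundant candidates
    kept = [i for i in z if pair_psnr[i - 1] <= psnr_avg]
    return sorted(kept)[:k0]
```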
drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a frame flow chart of a method for extracting key frames of a surveillance video based on contourlet transformation;
FIG. 2 is a schematic diagram of a structure of a contourlet transformation model according to the present invention;
FIG. 3 is a schematic diagram of a texture enhancement model according to the present invention;
FIG. 4 is a schematic diagram of a video keyframe screening model based on structural similarity according to the present invention;
FIG. 5 is a diagram of different filter characteristics;
FIG. 6 shows the first set of effectiveness test results of the present invention;
FIG. 7 shows the second set of effectiveness test results of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention will be further described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, the framework of the method for extracting key frames of surveillance video based on the contourlet transform according to a first embodiment of the present invention includes:
S1: collecting surveillance video from an intelligent surveillance video system to obtain a video data set;
S2: clipping the surveillance video and performing preprocessing such as graying to obtain a surveillance video sequence, and resizing it to a preset resolution;
S3: performing a multi-scale, multi-directional decomposition of the processed surveillance video sequence to obtain images containing rich directional and contour information;
S4: selecting the image high-frequency components containing image edge information for decomposition by the non-downsampled directional filter combination to obtain different band-pass directional sub-bands, and performing contourlet reconstruction fusion of the band-pass directional sub-bands at different scales to obtain a non-photosensitive contour feature map;
the algorithm provided by the invention firstly carries out non-downsampling pyramid decomposition on the monitoring video frame sequence, thereby obtaining the high-frequency component and the low-frequency component of the image.
Because the high-frequency components contain the information about strong image changes, the invention selects the high-frequency components containing image edge information for decomposition by the non-downsampled directional filter combination, yielding different band-pass directional sub-bands.
The non-downsampled pyramid decomposition is then repeated on the low-frequency component containing the image gray-level information, and the same operation is applied to its high-frequency component to obtain further band-pass directional sub-bands.
Then, contourlet reconstruction fusion is performed on the band-pass directional sub-bands at the different scales, yielding the non-photosensitive contour feature map.
In place of the single non-downsampled directional filter of the conventional method, the present invention uses a non-downsampled directional filter combination, combining a quadrature filter, a smooth quadrature filter, and a two-dimensional filter based on the McClellan transform. FIG. 5 shows the feature maps of an original image after filtering with the different filters.
S5: using the texture enhancement model to enhance the contour texture of the non-photosensitive contour feature map, so as to improve image quality and enhance the contrast of the gray-level information in the image;
To this end, the invention designs a texture enhancement model to enhance the contour texture of the non-photosensitive contour feature map, improving image quality and enhancing the contrast of the gray-level information in the image. The invention improves the gray-level contrast of the non-photosensitive contour feature map by nonlinear stretching.
The extracted high-frequency coefficients are enhanced according to a nonlinear enhancement function in order to increase the contrast of the contour texture information in the image. In the nonlinear enhancement function, x denotes the gray value of the non-photosensitive contour feature map before texture enhancement, γ is the resulting gray value, r is the parameter controlling the slope of the function, and α also influences the slope.
According to the formula, if the slope is too large, the gray-level contrast of the image becomes excessive and the image is distorted; if it is too small, the light-dark contrast of the image remains low and the enhancement effect of the texture enhancement model is not obvious. In view of this, the parameters are set to α = r = 0.5 in the present invention.
S6: calculating the structural similarity of adjacent frames and forming a structural similarity curve to replace the true shot dynamic curve, with the minimum points of the structural similarity curve replacing the inflection points within the shot;
S7: putting all minima of the structural similarity curve into a set, named the key frame set.
In general, the overall structure of an image is characterized by its structural information; in addition, the gray-level and brightness information of the image strongly affects how the image is perceived. Such information is what observers typically attend to when viewing images.
Assuming a video sequence set S = {s_i | i = 1, 2, ..., n}, where x and y denote two adjacent video frames, the luminance, contrast, and structure comparison functions are defined as follows:
l(x, y) = (2 μ_x μ_y + C_1) / (μ_x^2 + μ_y^2 + C_1)
c(x, y) = (2 σ_x σ_y + C_2) / (σ_x^2 + σ_y^2 + C_2)
s(x, y) = (σ_xy + C_3) / (σ_x σ_y + C_3)
where μ_x and μ_y are the means of images x and y, reflecting their brightness information; σ_x and σ_y are the standard deviations of x and y, reflecting their contrast information; σ_xy is the covariance of x and y, reflecting the similarity of their structural information; and C_1, C_2, C_3 are small positive constants that prevent abnormal results when a denominator approaches zero. Combining these three components, the structural similarity of x and y is given by:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ
where α, β, and γ are all greater than 0 and adjust the relative importance of the three components; when α = β = γ = 1 and C_3 = C_2 / 2, this simplifies to:
SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2))
however, due to the simplification, the minimum value point screened is not completely equivalent to the inflection point of the motion track of the target, so that iteration and optimization are required. The procedure is as follows.
Step 1: and extracting a complete video frame sequence of the monitoring video. The method comprises the steps of firstly cutting an input monitoring video sequence, so as to obtain a complete monitoring video frame sequence.
Step 2: and calculating the structural similarity of two adjacent frames. Structural similarity of two adjacent frames in the set of video sequences S is calculated according to equation (5-6) and stored as defined below as S SSIM Of the sets, set S SSIM The definition is as follows:
(2,SSIM(1,2))∪(3,SSIM(2,3))∪...∪(n,SSIM(n-1,n))
step 3: forming a structural similarity curve. The structural similarity curve (Structural Similarity Curve, SSIMC) is based on the set S formed by the structural similarity of two adjacent frames in step 2 SSIM Drawing to obtain the final product.
Step 4: a final keyframe set is formed. This method places all minima in the SSIMC into set Z and names them as candidate keyframe sets. Since the number of points K of the minimum value of the curve does not necessarily meet the required key frame number K 0 The key frame secondary extraction is performed according to the following rule.
(1) If K < K_0, all key frames in set Z are first extracted as final key frames and stored in the final key frame set F; then K_0 - K additional frames are obtained by linear interpolation and stored in F.
(2) If K = K_0, the key frames in the candidate key frame set Z are extracted directly as the final key frames and stored in the final key frame set F.
(3) If K > K_0, the peak signal-to-noise ratio (Peak Signal Noise Ratio, PSNR) of adjacent frames in the video sequence set S is first calculated. Next, the average peak signal-to-noise ratio PSNR_avg over the adjacent frame pairs is calculated and compared against the key frames in the candidate key frame set Z; if the frame-difference PSNR corresponding to a candidate key frame is greater than PSNR_avg, that frame is deleted from the candidate key frame set, finally completing the extraction of K_0 key frames. PSNR_avg is calculated as the mean of the PSNR values of all adjacent frame pairs in S.
an embodiment of the invention provides a monitoring video key frame extraction method terminal device based on contourlet transformation, which comprises one or more input devices, one or more output devices, one or more processors and a memory, wherein the memory is used for storing a computer program, and the processor is used for executing the computer program to realize the target detection method facing an unmanned aerial vehicle platform.
An embodiment of the present invention provides a computer readable storage medium storing a computer program, where the computer program when executed by a processor performs the above-mentioned method for extracting keyframes of surveillance video based on contourlet transformation.
To verify the validity of the above embodiment, the present invention is compared with state-of-the-art key frame extraction methods by computing precision, recall, and F1 score. Specifically, the Visor dataset, which contains rich local motion information, and the real-world abnormal-situation dataset UCF-Crime are used. The Visor dataset contains various human activities such as walking, jumping, and placing objects, and is used to judge the ability of the proposed method to extract target detail information. The UCF-Crime dataset covers a total of seven real-world anomaly categories. These abnormal behaviors are chosen because they affect normal life in the real world and therefore require accurate key frame extraction.
The precision, recall, and F1 score of each video in both datasets are presented as color-scale maps. Table 1 shows that the recall of the proposed method is clearly better than that of the comparison methods, a significant advantage, while the F1 score is also improved relative to the comparison methods. Table 2 shows that the advantage of the proposed method again lies mainly in improved recall. Video17 and Video18 depict acts of vandalism, and the proposed method is significantly better than the comparison methods in precision, recall, and F1 score on these videos. Fire is often accompanied by abrupt brightness changes in the surveillance shot; the comparison key frame extraction methods are easily affected by such illumination changes and cannot accurately capture key frames with abrupt brightness changes, so their extraction accuracy drops. In contrast, the proposed key frame extraction technique generates a non-photosensitive contour feature map through multi-filter fusion, so it maintains high accuracy on videos, such as fires, in which the brightness of the shot changes abruptly. In summary, the precision and recall of the key frames extracted by the proposed method are better than those of the comparison methods, which further demonstrates its effectiveness and also verifies its generalization ability.
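For reference, precision, recall, and F1 score can be computed as sketched below; the rule for matching an extracted key frame to a ground-truth key frame (here, a tolerance window of a few frames) is an assumption, since the matching criterion is not specified above.

```python
# Sketch of the precision / recall / F1 evaluation for extracted key frames.
def evaluate_key_frames(predicted, ground_truth, tolerance: int = 5):
    remaining = list(ground_truth)
    true_pos = 0
    for p in predicted:
        match = next((g for g in remaining if abs(p - g) <= tolerance), None)
        if match is not None:
            true_pos += 1
            remaining.remove(match)            # each ground-truth frame is matched at most once
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

print(evaluate_key_frames(predicted=[10, 52, 100], ground_truth=[12, 50, 98, 150]))
```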
Table 1. Comparison of results of different methods on dataset 1
Table 2. Comparison of results of different methods on dataset 2

Claims (2)

1. A key frame extraction method for the surveillance video field, characterized by comprising at least the following steps:
S1: collecting surveillance video from an intelligent surveillance video system to obtain a video data set;
S2: clipping the surveillance video and performing preprocessing such as graying to obtain a surveillance video sequence, and resizing it to a preset resolution;
S3: performing a multi-scale, multi-directional decomposition of the processed surveillance video sequence using the contourlet transform to obtain images containing rich directional and contour information; the specific operation is as follows: selecting the image high-frequency components containing image edge information and decomposing them with a non-downsampled directional filter combination, used in place of the conventional single non-downsampled directional filter, to obtain different band-pass directional sub-bands, and performing contourlet reconstruction fusion of the band-pass directional sub-bands at different scales to obtain a non-photosensitive contour feature map; the non-downsampled directional filter combination comprises a quadrature filter, a smooth quadrature filter, and a two-dimensional filter based on the McClellan transform;
S4: using the texture enhancement model to enhance the contour texture of the non-photosensitive contour feature map, so as to improve image quality and enhance the contrast of the gray-level information in the image;
S5: calculating the structural similarity of adjacent frames and forming a structural similarity curve to replace the true shot dynamic curve, replacing the inflection points within the shot with the minimum points of the structural similarity curve, putting these points into a set Z, and naming it the key frame set; the aim is to describe the target motion trajectory linearly so as to reduce the time complexity of the highly dynamic curve;
S6: performing a secondary key frame extraction on the key frame set to reconcile the number K of curve minimum points with the required number of key frames K_0, specifically: if K < K_0, all key frames in set Z are first extracted as final key frames and stored in the final key frame set F, and then K_0 - K additional frames are obtained by linear interpolation and stored in F; if K = K_0, the key frames in the candidate key frame set Z are extracted directly as the final key frames and stored in the final key frame set F; if K > K_0, the peak signal-to-noise ratio PSNR of adjacent frames in the video sequence set S is first calculated, then the average peak signal-to-noise ratio PSNR_avg over the adjacent frame pairs is calculated and compared against the key frames in the candidate key frame set Z, and if the frame-difference PSNR corresponding to a candidate key frame is greater than PSNR_avg, that frame is deleted from the candidate key frame set, finally completing the extraction of K_0 key frames.
2. The method according to claim 1, characterized in that the texture enhancement model improves the gray-level contrast of the non-photosensitive contour feature map by nonlinear stretching; the extracted high-frequency coefficients are enhanced according to a nonlinear enhancement function so as to increase the contrast of the contour texture information in the image;
in the nonlinear enhancement function, x denotes the gray value of the non-photosensitive contour feature map before texture enhancement, γ is the resulting gray value, r is the parameter controlling the slope of the function, and α also influences the slope.
CN202310625992.6A 2023-05-30 2023-05-30 Method for extracting key frames of monitoring video based on contourlet transformation Active CN116665101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310625992.6A CN116665101B (en) 2023-05-30 2023-05-30 Method for extracting key frames of monitoring video based on contourlet transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310625992.6A CN116665101B (en) 2023-05-30 2023-05-30 Method for extracting key frames of monitoring video based on contourlet transformation

Publications (2)

Publication Number Publication Date
CN116665101A CN116665101A (en) 2023-08-29
CN116665101B true CN116665101B (en) 2024-01-26

Family

ID=87716588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310625992.6A Active CN116665101B (en) 2023-05-30 2023-05-30 Method for extracting key frames of monitoring video based on contourlet transformation

Country Status (1)

Country Link
CN (1) CN116665101B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117934574A (en) * 2024-03-22 2024-04-26 深圳市智兴盛电子有限公司 Method, device, equipment and storage medium for optimizing image of automobile data recorder

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077749A (en) * 2014-06-17 2014-10-01 长江大学 Seismic data denoising method based on contourlet transformation
CN106210444A (en) * 2016-07-04 2016-12-07 石家庄铁道大学 Kinestate self adaptation key frame extracting method
CN109978802A (en) * 2019-02-13 2019-07-05 中山大学 High dynamic range images fusion method in compressed sensing domain based on NSCT and PCNN
CN113221674A (en) * 2021-04-25 2021-08-06 广东电网有限责任公司东莞供电局 Video stream key frame extraction system and method based on rough set reduction and SIFT

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120148149A1 (en) * 2010-12-10 2012-06-14 Mrityunjay Kumar Video key frame extraction using sparse representation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077749A (en) * 2014-06-17 2014-10-01 长江大学 Seismic data denoising method based on contourlet transformation
CN106210444A (en) * 2016-07-04 2016-12-07 石家庄铁道大学 Kinestate self adaptation key frame extracting method
CN109978802A (en) * 2019-02-13 2019-07-05 中山大学 High dynamic range images fusion method in compressed sensing domain based on NSCT and PCNN
CN113221674A (en) * 2021-04-25 2021-08-06 广东电网有限责任公司东莞供电局 Video stream key frame extraction system and method based on rough set reduction and SIFT

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Partition enhancement method for side-scan sonar images in the non-subsampled contourlet transform domain; Wu Helong et al.; Acta Armamentarii; pp. 1463-1470 *

Also Published As

Publication number Publication date
CN116665101A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
Chen et al. Anomaly detection in surveillance video based on bidirectional prediction
Chen et al. Saliency detection via the improved hierarchical principal component analysis method
CN112396027A (en) Vehicle weight recognition method based on graph convolution neural network
CN116665101B (en) Method for extracting key frames of monitoring video based on contourlet transformation
CN113792606B (en) Low-cost self-supervision pedestrian re-identification model construction method based on multi-target tracking
CN112818849B (en) Crowd density detection algorithm based on context attention convolutional neural network for countermeasure learning
CN113449660A (en) Abnormal event detection method of space-time variation self-coding network based on self-attention enhancement
CN116343103B (en) Natural resource supervision method based on three-dimensional GIS scene and video fusion
CN113128360A (en) Driver driving behavior detection and identification method based on deep learning
Hu et al. Parallel spatial-temporal convolutional neural networks for anomaly detection and location in crowded scenes
Shen et al. An image enhancement algorithm of video surveillance scene based on deep learning
Wang et al. Paccdu: pyramid attention cross-convolutional dual unet for infrared and visible image fusion
Li et al. An end-to-end system for unmanned aerial vehicle high-resolution remote sensing image haze removal algorithm using convolution neural network
CN111626944B (en) Video deblurring method based on space-time pyramid network and against natural priori
CN117218545A (en) LBP feature and improved Yolov 5-based radar image detection method
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system
CN112818818B (en) Novel ultra-high-definition remote sensing image change detection method based on AFFPN
CN115564709A (en) Evaluation method and system for robustness of power algorithm model in confrontation scene
CN116453033A (en) Crowd density estimation method with high precision and low calculation amount in video monitoring scene
Zheng et al. Scene recognition model in underground mines based on CNN-LSTM and spatial-temporal attention mechanism
CN112926552A (en) Remote sensing image vehicle target recognition model and method based on deep neural network
CN112633142A (en) Power transmission line violation building identification method and related device
CN117409206B (en) Small sample image segmentation method based on self-adaptive prototype aggregation network
Wu et al. Research and Application of Big Data Technology in Video Intelligent Analysis
Lv et al. Video enhancement and super-resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant