CN111553848B - Monitoring video tracing processing method, system, storage medium and video monitoring terminal - Google Patents

Monitoring video tracing processing method, system, storage medium and video monitoring terminal

Info

Publication number
CN111553848B
CN111553848B (application CN202010203610.7A)
Authority
CN
China
Prior art keywords
video
ncc
noise
frame
prnu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010203610.7A
Other languages
Chinese (zh)
Other versions
CN111553848A (en)
Inventor
Shen Yulong
Hu Tianzhu
Liu Yujuan
Zhao Zhen
Zhai Kaifang
Zhu Xinghui
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202010203610.7A priority Critical patent/CN111553848B/en
Publication of CN111553848A publication Critical patent/CN111553848A/en
Application granted granted Critical
Publication of CN111553848B publication Critical patent/CN111553848B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • G06F21/73Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by creating or determining hardware identification, e.g. serial numbers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Abstract

The invention belongs to the technical field of video monitoring information processing, and discloses a monitoring video traceability processing method, system, storage medium and video monitoring terminal. The method calculates the variance of the image to be extracted, denoises it with a Wiener filter, and assigns weights to the three RGB color channels to obtain the final PRNU noise; the average value of the PRNU noise is calculated as the device fingerprint. An NCC correlation sequence of the video frames and the device fingerprint is calculated and fed into a classifier as the video feature to train a classification model. An NCC feature value is constructed for the video to be detected by the same method, and the classification model performs prediction classification on the video to be detected to obtain a classification result, thereby realizing video source identification. The system comprises a PRNU noise extraction module and a fingerprint construction module. The proposed method can effectively trace the source of a video and improves the tracing accuracy compared with traditional methods.

Description

Monitoring video tracing processing method, system, storage medium and video monitoring terminal
Technical Field
The invention belongs to the technical field of video monitoring information processing, and particularly relates to a monitoring video traceability processing method, a monitoring video traceability processing system, a storage medium and a video monitoring terminal.
Background
At present, video monitoring is widely deployed thanks to the development and construction of monitoring systems such as the Skynet Project and the Sharp Eyes Project. On the one hand, video monitoring can watch every region and corner of a city around the clock and record relevant illegal and criminal behavior; on the other hand, it reminds people to regulate their words and deeds and exerts a certain deterrent effect. In addition, the rapid development of video monitoring systems facilitates judicial evidence work, and video evidence has become an important means of settling court disputes. However, criminals forge and tamper with surveillance video using various video editing tools to deceive monitoring personnel, carry out criminal activities, cause losses to others, and even affect social stability. Moreover, video forgery can mislead court cases, impair justice, and reduce public confidence. Methods based on sensor pattern noise are common in the field of video forensics. Sensor pattern noise is damaged to different degrees during video compression; the traditional PRNU-based video tracing method extracts all video frames or only key frames and ignores the undamaged effective regions, so the device fingerprint it constructs is inaccurate. Moreover, when the video tracing decision is made, the accuracy of the PRNU of the actual test video frames is not considered, so the tracing accuracy cannot meet the requirements of a video monitoring system.
Through the above analysis, the problems and defects of the prior art are as follows: the traditional PRNU-based video tracing method ignores the undamaged effective regions, so the device fingerprint it constructs is inaccurate; and when the video tracing decision is made, the accuracy of the PRNU of the actual test video frames is not considered, so the tracing accuracy cannot meet the requirements of a video monitoring system.
The difficulty in solving the above problems and defects is that PRNU noise is damaged during video compression, and improper extraction of the test-data features leads to low tracing accuracy.
The significance of solving the above problems and defects is as follows: the security requirement of guaranteeing the originality of video grows ever higher as video monitoring systems are built; if the problem of video forensics is not solved, social order is affected and the decisions of judicial authorities are seriously impacted.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a monitoring video tracing processing method, a monitoring video tracing processing system, a storage medium and a video monitoring terminal.
The monitoring video traceability processing method performs variance calculation on the image to be extracted, denoises it with a Wiener filter, and assigns the weights of the RGB three-color channels to obtain the final PRNU noise; the average value of the PRNU noise is calculated as the device fingerprint. An NCC correlation sequence of the video frames and the device fingerprint is calculated and fed into a classifier as the video feature to train a classification model. An NCC feature value is constructed for the video to be detected by the same method, and the classification model performs prediction classification on the video to be detected to obtain a classification result, realizing video source identification.
Further, available video frames are selected in the monitoring video tracing processing method by R_t(x, y), which indicates whether the region block at (x, y) of the current frame t is available for PRNU extraction. If all DCT-AC coefficients are zero in a given block region, R_t(x, y) = 0 and the block is discarded during video noise extraction; otherwise R_t(x, y) = 1 and the PRNU noise of that block is used during video noise extraction, where t denotes video frame t:

R_t(x, y) = 0 if all DCT-AC coefficients of block (x, y) are zero, and R_t(x, y) = 1 otherwise.

The PRNU is extracted directly from I frames, i.e. R_t(x, y) is always 1; if the frame is a B or P frame, each block region is examined and undamaged block regions are selected for PRNU extraction.
Further, the PRNU extraction of the surveillance video tracing processing method includes:
(1) Decompose the image into color channels (R, G, B) and perform a four-level wavelet transform on each color channel using an 8-tap Daubechies QMF, obtaining at each level the subbands in the horizontal H, vertical V and diagonal D directions;
(2) In each subband, estimate the local variance of the original noiseless image for each wavelet coefficient, using the maximum a posteriori (MAP) estimate computed over square W × W neighborhoods of four sizes, W ∈ {3, 5, 7, 9}:

σ̂²_W(i, j) = max(0, (1/W²) Σ_{(i,j)∈N_W} c²(i, j) − σ₀²),

where c ∈ {H, V, D}, c(i, j) is the high-frequency component, and σ₀ controls the degree of noise suppression, σ₀ = 5;
(3) Compare the four variances of the four sizes and select the minimum as the best variance estimate:

σ²(i, j) = min(σ₃²(i, j), σ₅²(i, j), σ₇²(i, j), σ₉²(i, j)), (i, j) ∈ J;

(4) Obtain the denoised wavelet coefficients with a Wiener filter:

c_clean(i, j) = c(i, j) · σ²(i, j) / (σ²(i, j) + σ₀²);

(5) Repeat the above process for each subband and each color channel of the video frame and apply the inverse wavelet transform to obtain I_clean, the denoised result; the noise value of each color channel of the current video frame follows by subtraction:

I_noise = I − I_clean;

(6) Assign the weights of the three color channels and combine all channels to obtain the enhanced PRNU noise:

I_PRNU = w_R · I^R_noise + w_G · I^G_noise + w_B · I^B_noise.
Further, the surveillance video tracing processing method extracts an original video sequence of the device in advance, obtains PRNU noise by repeating the extraction process on a series of video frames of the same video, and calculates the average value as the device fingerprint K:

K(x, y) = Σ_{t=1}^{n} R_t(x, y) · Ĩ_t(x, y) / Σ_{t=1}^{n} R_t(x, y),

where Ĩ_t is the noise extracted from the t-th frame, n is the number of frames processed, and R_t denotes the selected region of the frame.
Further, the video feature selection of the surveillance video tracing processing method extracts the PRNU noise of the current frame, performs a correlation calculation with the video device fingerprint, and queries the original device to which the video frame belongs. The normalized cross-correlation NCC measures the degree of correlation between two sets of data and takes values in [−1, 1]: a value near 0 indicates that the test data is uncorrelated with the fingerprint data, −1 indicates perfect negative correlation, and 1 indicates that the test data and the fingerprint data are identical. The NCC between the noise of the video frame under test and the device fingerprint is calculated; the NCC of frame t is defined as:

NCC_t = ⟨Ĩ_t − avg(Ĩ_t), K − avg(K)⟩ / (‖Ĩ_t − avg(Ĩ_t)‖ · ‖K − avg(K)‖),

where K denotes the device fingerprint, Ĩ_t denotes the PRNU noise estimated from the frame, avg(K) and avg(Ĩ_t) are the means of K and Ĩ_t respectively, ⟨a, b⟩ is the inner product operation, and ‖·‖ denotes the Euclidean norm.

Adjacent frames are used for the decision: a sliding window over the video collects the NCC value of each frame in the window to form an NCC sequence, and the NCC sequence vector serves as the classification feature for the subsequent tracing operation. The NCC feature sequence of frame t is defined as:

S_t = (NCC_{t−⌊m/2⌋}, …, NCC_t, …, NCC_{t+⌊m/2⌋}),

where m is the length of the sliding window and ⌊·⌋ denotes the floor (round-down) operation.

Once the NCC sequences are obtained, the feature information is fed into a classifier to learn matching and non-matching NCC sequences, and the trained classification model performs video tracing; the averaged voting result serves as the final classification result of the frame.
Further, the SVM classification model of the surveillance video traceability processing method uses the LibSVM default kernel function, the radial basis function (RBF) kernel, as the kernel function of the classification model; the optimal parameters are selected by grid search: all (c, g) values are cross-validated, and the (c, g) pair with the highest accuracy is taken as the optimal parameter.
Further, the SVM-based tracing of the surveillance video traceability processing method comprises:
step one, data processing: processing the training data and test data into a unified format and importing them;
step two, optimal parameter selection: obtaining the optimal parameters c and g through cross-validation with grid search;
step three, classification model training: training on the training data to construct a multi-class model;
step four, classification: classifying the data under test with the constructed classification model;
and step five, calculating the video tracing classification accuracy from the tracing classification result.
It is another object of the present invention to provide a program storage medium for receiving user input, the stored computer program causing an electronic device to perform steps comprising: performing variance calculation on the image to be extracted, denoising with a Wiener filter, assigning the weights of the RGB three-color channels to obtain the final PRNU noise, and calculating the average value of the PRNU noise as the device fingerprint; calculating an NCC correlation sequence of the video frames and the device fingerprint and feeding the sequence as the video feature into a classifier to train a classification model; and constructing an NCC feature value for the video to be detected by the same method and performing prediction classification on the video to be detected with the classification model to obtain a classification result, realizing video source identification.
Another objective of the present invention is to provide a surveillance video traceability processing system for implementing the surveillance video traceability processing method, wherein the surveillance video traceability processing system comprises:
a PRNU noise extraction module 1 for extracting PRNU noise from available video frames;
the fingerprint construction module is used for constructing a unique fingerprint of the video monitoring equipment;
the video tracing detection module is used for classifying by constructing NCC related sequences;
the PRNU noise extraction module includes:
the video available frame extraction module is used for extracting available video frames;
a PRNU noise extraction module to extract PRNU noise.
Another object of the present invention is to provide a video monitoring terminal, wherein the video monitoring terminal carries the surveillance video traceability processing system of claim 9.
By combining all the above technical schemes, the invention has the following advantages and positive effects: the invention provides an efficient PRNU fingerprint construction and video tracing method, addressing the low tracing accuracy of existing video fingerprint extraction methods that do not take video compression into account. The available regions of I frames and the undamaged regions of B and P frames are considered together, the PRNU noise is computed to construct the unique fingerprint of the video monitoring device, and NCC correlation sequences are constructed for classification, so that the video frames to be detected are traced and identified. Experiments show that the proposed method can effectively trace the source of a video and improves the tracing accuracy compared with traditional methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.
Fig. 1 is a flowchart of a surveillance video source tracing processing method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a surveillance video traceability processing system according to an embodiment of the present invention;
in the figure: 1. a PRNU noise extraction module; 1-1, a video available frame extraction module; 1-2, a PRNU noise extraction module; 2. a fingerprint construction module; 3. and a video source tracing detection module.
Fig. 3 is a flowchart of an implementation of a surveillance video source tracing processing method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an h.264 video coding sequence provided by an embodiment of the present invention.
FIG. 5 is a DCT coefficient diagram provided by an embodiment of the invention.
Fig. 6 is a flowchart of selecting an available video area according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of a multi-classifier constructed based on a one-to-one method according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of a portion of a video sample provided by an embodiment of the present invention.
FIG. 9 is a diagram illustrating comparison of required frame numbers provided by the embodiment of the present invention.
Fig. 10 is a comparison schematic diagram of a video source tracing method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a surveillance video tracing method, system, storage medium, and video surveillance terminal, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the surveillance video traceability processing method provided by the present invention includes the following steps:
s101: extracting PRNU noise from available video frames using a modified PRNU extraction and fingerprinting algorithm and generating a device fingerprint;
s102: calculating an NCC related sequence of the video frames and the device fingerprints, and feeding the sequence serving as a video feature into a classifier to train a classification model;
s103: and constructing an NCC characteristic value of the video to be detected by the same method, and performing prediction classification on the video to be detected by using a classification model to obtain a classification result so as to realize video source identification.
As shown in fig. 2, the surveillance video traceability processing system provided by the present invention includes:
a PRNU noise extraction module 1, which extracts PRNU noise from available video frames.
And the fingerprint constructing module 2 is used for constructing a unique fingerprint of the video monitoring equipment.
And the video traceability detection module 3 is used for classifying by constructing NCC related sequences.
The PRNU noise extraction module 1 includes:
and the video available frame extraction module 1-1 is used for extracting available video frames.
A PRNU noise extraction module 1-2 to extract the PRNU noise.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
As shown in fig. 3, the overall framework of the video tracing algorithm is as follows: PRNU noise is extracted from the available video frames with the improved PRNU extraction and fingerprint construction algorithm to generate the device fingerprint; the NCC correlation sequence of the video frames and the device fingerprint is calculated and fed as the video feature into a classifier to train a classification model. An NCC feature value is constructed for the video to be detected by the same method, and the classification model performs prediction classification on the video to be detected to obtain the classification result, realizing video source identification.
1. PRNU noise extraction and fingerprint construction
1.1 Video available frame selection. A video is essentially composed of a sequence of static images; playing the still images in succession produces the impression of motion. In a video monitoring system, transmitting all still-image information would consume a great deal of network resources and storage space, which is not feasible under practical network conditions and operating environments. A large amount of similar data exists between adjacent video frames, and video compression standards exploit this property to facilitate video transmission and storage.
Video monitoring devices mainly adopt the H.264 standard for compression coding. As shown in fig. 4, the video coding standard divides video frames into three categories: I frames, B frames and P frames. Compression algorithms fall into two categories, intra-frame compression and inter-frame compression; I frames are produced by the former, and B and P frames by the latter. An I frame is coded in full and carries the complete information of its picture, serving as a key frame of the video; the picture can be reconstructed by decoding its own data directly. A P frame stores the difference between the current frame and the preceding I or P frame; during decoding, the difference data is added to the referenced forward frame to obtain the complete image. Because P frames themselves serve as references for subsequent video frames, accumulated transmission errors can affect later coding. A B frame codes the prediction difference with the preceding I or P frame and the following P frame as references; decoding likewise requires adding the difference back to obtain the video image. Within a group of adjacent pictures, the complete I frame is coded first, and subsequent frames are coded from their difference content as long as the differences remain small. When the picture content of a frame changes greatly compared with the previous frame, the current image sequence ends and the next one begins. Such an image sequence is called a GOP (group of pictures).
In the video coding process, quantization introduces errors, so after the DCT transform, quantization and inverse transform the residual block D_n is no longer identical to the original block. Combining formula (1) and formula (2) of the coding model, the content of a decoded block depends strongly on the inverse DCT transform of the residual block D_n. The DCT coefficients are read as shown in fig. 5: the sub-blocks are typically Zig-Zag scanned to obtain the coefficients. The DCT coefficients consist of the DC coefficient at position (0, 0) and the remaining AC coefficients. If the DCT-AC coefficients of a residual block are all 0, its high-frequency content is lost and the high-frequency content of the decoded block equals that of the reference block. Since PRNU noise resides in the high-frequency part, the PRNU noise of a block whose residual DCT-AC coefficients are all zero is corrupted by video compression.
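A minimal Python sketch of this check (the function names are ours; it assumes the residual DCT coefficients of an 8 × 8 block are already available as a NumPy array) scans the block in Zig-Zag order and tests whether any AC coefficient survives:

```python
import numpy as np

def zigzag_indices(n=8):
    """(row, col) pairs visiting an n x n block in JPEG Zig-Zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def has_nonzero_ac(dct_block):
    """True if any AC coefficient is non-zero. The DC coefficient is the
    first element of the Zig-Zag scan, at position (0, 0); only the AC
    part carries the high-frequency (PRNU-bearing) content."""
    order = zigzag_indices(dct_block.shape[0])
    return any(dct_block[r, c] != 0 for (r, c) in order[1:])
```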
In the present invention, R_t(x, y) indicates whether the region block at (x, y) of the current frame t is available for PRNU extraction. If the DCT-AC coefficients are all zero in a particular block region, R_t(x, y) = 0 and the block is discarded during video noise extraction; otherwise R_t(x, y) = 1 and the PRNU noise of the block is used during video noise extraction. The specific form is given in equation (1), where t denotes video frame t:

R_t(x, y) = 0 if all DCT-AC coefficients of block (x, y) are zero, and R_t(x, y) = 1 otherwise. (1)

In the H.264 standard, I frames are primary frames whose decoding requires only their own data, so their PRNU noise is usually not corrupted, while B and P frames are corrupted to a relatively large extent. In order to obtain reliable PRNU noise while minimizing the number of required frames, the invention proposes a method for selecting the available frames of a video; the specific flow is shown in fig. 6. I frames are generally not corrupted, so the invention extracts the PRNU directly from I frames, i.e. R_t(x, y) is always 1. If the frame is a B or P frame, each block region is examined and only undamaged block regions are selected for PRNU extraction. This extraction method improves the accuracy of the acquired PRNU noise and device fingerprint, and thereby the accuracy of video tracing.
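The selection rule can be sketched as follows, assuming the decoder exposes per-block residual DCT data keyed by block coordinates — the data layout and names are illustrative, not taken from the patent:

```python
import numpy as np

def availability_mask(frame_type, residual_dct_blocks, grid_shape):
    """Sketch of the R_t(x, y) rule. residual_dct_blocks maps (x, y) block
    coordinates to 8x8 DCT residual arrays. I frames are taken in full;
    in B/P frames a block is kept only if its AC coefficients are not
    all zero."""
    R = np.ones(grid_shape, dtype=np.uint8)
    if frame_type == 'I':
        return R  # R_t(x, y) = 1 everywhere for intra-coded frames
    for (x, y), block in residual_dct_blocks.items():
        ac = block.copy()
        ac[0, 0] = 0.0  # zero out the DC coefficient, keep the AC part
        if not np.any(ac):
            R[x, y] = 0  # all AC coefficients zero: PRNU corrupted, discard
    return R
```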
1.2 PRNU extraction
Previous studies have shown that PRNU noise is an effective means of identifying the source camera. The sensor is the heart of the image acquisition process; the PRNU arises from variations in the sensor manufacturing process and is present in all types of sensors. The feature therefore differs from camera to camera, while for the same camera it can be extracted consistently from video frames, which makes it suitable for video tracing.
The accuracy of the video tracing algorithm is closely related to the extraction of the PRNU noise fingerprint. The PRNU is mostly located in the high frequency part, and in previous studies, the PRNU noise was effectively extracted by filtering the low frequency part and extracting the high frequency part. A large number of scholars apply various noise reduction filtering methods to extract PRNU noise, wherein the wavelet-based noise reduction method is most reliable. The present invention therefore employs wavelet filtering to obtain image PRNU noise.
The PRNU noise is obtained in three main steps: first, compute the variance of the image to be extracted; second, denoise with a Wiener filter; finally, assign the weights of the RGB three-color channels to obtain the final PRNU noise. The specific steps are as follows:
(1) The image is decomposed into color channels (R, G, B), and a four-level wavelet transform is performed on each color channel using an 8-tap Daubechies QMF, obtaining at each level the subbands in the horizontal H, vertical V and diagonal D directions.
(2) In each subband c, the local variance of the original noiseless image is estimated for each wavelet coefficient. This is done with a maximum a posteriori (MAP) estimate computed over square W × W neighborhoods of four sizes, W ∈ {3, 5, 7, 9}:

σ̂²_W(i, j) = max(0, (1/W²) Σ_{(i,j)∈N_W} c²(i, j) − σ₀²), (2)

where c ∈ {H, V, D}, c(i, j) is the high-frequency component, and σ₀ controls the degree of noise suppression; σ₀ = 5 is typically the best choice and yields reliable noise.
(3) The four variances of the four sizes are compared, and the minimum is selected as the best variance estimate:

σ²(i, j) = min(σ₃²(i, j), σ₅²(i, j), σ₇²(i, j), σ₉²(i, j)), (i, j) ∈ J. (3)

(4) A Wiener filter yields the denoised wavelet coefficients:

c_clean(i, j) = c(i, j) · σ²(i, j) / (σ²(i, j) + σ₀²). (4)

(5) The above process is repeated for each subband and each color channel of the video frame, and the inverse wavelet transform produces I_clean, the denoised result; the noise value of each color channel of the current video frame follows by subtraction:

I_noise = I − I_clean. (5)

(6) The weights of the three color channels are assigned, and all channels are combined to obtain the enhanced PRNU noise:

I_PRNU = w_R · I^R_noise + w_G · I^G_noise + w_B · I^B_noise. (6)
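The six steps can be sketched with PyWavelets and SciPy as below. This is a minimal reading of the pipeline rather than the patent's implementation: 'db4' (filter length 8) stands in for the 8-tap Daubechies QMF, and the channel weights are placeholders, since their exact values are not stated here:

```python
import numpy as np
import pywt
from scipy.ndimage import uniform_filter

SIGMA0_SQ = 5.0 ** 2  # sigma_0 = 5 controls the degree of noise suppression

def denoise_subband(c):
    """MAP local-variance estimate over W x W windows, W in {3, 5, 7, 9}
    (eqs. (2)-(3)), followed by the Wiener attenuation of eq. (4)."""
    var = np.full_like(c, np.inf)
    for w in (3, 5, 7, 9):
        est = uniform_filter(c * c, size=w) - SIGMA0_SQ  # local mean of c^2 minus sigma_0^2
        var = np.minimum(var, np.maximum(est, 0.0))      # keep the smallest non-negative estimate
    return c * var / (var + SIGMA0_SQ)                   # Wiener-filtered coefficient

def channel_noise(channel):
    """PRNU noise of one color channel: the high-frequency residual left
    after wavelet-domain Wiener denoising (eq. (5))."""
    coeffs = pywt.wavedec2(channel.astype(np.float64), 'db4', level=4)
    clean = [coeffs[0]] + [tuple(denoise_subband(sb) for sb in lvl)
                           for lvl in coeffs[1:]]        # denoise H, V, D at every level
    i_clean = pywt.waverec2(clean, 'db4')
    i_clean = i_clean[:channel.shape[0], :channel.shape[1]]  # trim reconstruction padding
    return channel - i_clean                             # I_noise = I - I_clean

def frame_prnu(rgb, weights=(0.3, 0.6, 0.1)):
    """Weighted combination over the three color channels (eq. (6));
    the weights here are illustrative placeholders, not the patent's."""
    return sum(w * channel_noise(rgb[..., k]) for k, w in enumerate(weights))
```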
In order to construct a device fingerprint for each video device, the invention performs the extraction operation on an original video sequence of the device in advance. Sensor pattern noise survives averaging, while other noise and minor scene details are typically cancelled out. The invention therefore obtains the PRNU noise by repeating the above extraction process for a series of video frames of the same video, calculates its average value, and takes it as the device fingerprint K:

K(x, y) = Σ_{t=1}^{n} R_t(x, y) · Ĩ_t(x, y) / Σ_{t=1}^{n} R_t(x, y), (7)

where Ĩ_t is the noise extracted from the t-th frame, n is the number of frames processed, and R_t denotes the selected region of the frame.
2. Video tracing detection
2.1 Video feature selection. Extracting PRNU pattern noise from the original video sequence completes the construction of the video device fingerprint; the PRNU noise of the current frame is then extracted, its correlation with the video device fingerprint is calculated, and the original device to which the video frame belongs is queried. For this kind of problem, Normalized Cross-Correlation (NCC) is the method of choice.
Normalized cross-correlation NCC is an algorithm that measures the degree of correlation between two sets of data. The NCC value lies in [−1, 1]: a value near 0 means the test data is uncorrelated with the fingerprint data, −1 means perfect negative correlation, and 1 means the two are identical. In general, the larger the NCC value, the stronger the correlation. To attribute the video under test, the NCC between the noise of its frames and the device fingerprint is computed. The NCC of frame t is defined as follows:
NCC_t = ⟨Ĩ_t − avg(Ĩ_t), K − avg(K)⟩ / (‖Ĩ_t − avg(Ĩ_t)‖ · ‖K − avg(K)‖), (8)

where K denotes the device fingerprint, Ĩ_t denotes the PRNU noise estimated from the frame, avg(K) and avg(Ĩ_t) are the means of K and Ĩ_t respectively, ⟨a, b⟩ is the inner product operation, and ‖·‖ denotes the Euclidean norm.
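Equation (8) reduces to a few lines of NumPy; the sketch assumes the frame noise and the fingerprint are arrays of the same shape:

```python
import numpy as np

def ncc(frame_noise, fingerprint):
    """Normalized cross-correlation of eq. (8): inner product of the
    mean-centred noise and fingerprint, divided by the product of their
    Euclidean norms; the result lies in [-1, 1]."""
    a = frame_noise - frame_noise.mean()
    b = fingerprint - fingerprint.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0
```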
Owing to video coding, the noise residues extracted from individual frames are damaged to different degrees, so the NCC values of some video frames are unreliable; using such values directly for tracing detection makes the tracing result inaccurate. To ensure the accuracy of the selected features, the method uses adjacent frames for the decision: a sliding window over the video collects the NCC value of each frame in the window to form an NCC sequence, and the NCC sequence vector serves as the classification feature for the subsequent tracing operation. The NCC feature sequence of frame t is defined as:

S_t = (NCC_{t−⌊m/2⌋}, …, NCC_t, …, NCC_{t+⌊m/2⌋}), (9)

where m is the length of the sliding window and ⌊·⌋ denotes the floor (round-down) operation.
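A small helper with illustrative names that turns a per-frame NCC series into the length-m window vectors of equation (9); for simplicity it skips frames within ⌊m/2⌋ of either end of the video:

```python
def ncc_sequences(ncc_values, m):
    """Length-m window of NCC values centred on each frame t (eq. (9)).
    ncc_values is a list of per-frame NCC scores; returns {t: window}."""
    half = m // 2
    return {t: ncc_values[t - half: t + half + 1]
            for t in range(half, len(ncc_values) - half)}
```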
After the NCC sequences are obtained, the feature information is fed into a classifier to learn the matching and non-matching NCC sequences, and a classification model is trained for video tracing. When a test video undergoes tracing detection, the same frame is contained in several sliding windows, so the frame receives multiple classification votes. The invention takes the averaged voting result as the final classification result of the frame.
2.2 SVM classification model selection. In previous image tracing research, most scholars used the KNN classification algorithm or an SVM classifier to classify and recognize images. The KNN classification algorithm can directly handle the multi-class case, but it has no training process: every classification must compare the test data against all training data, which is computationally expensive and unsuitable for a real-time video monitoring system. The invention therefore selects the SVM classifier to classify the video images.
The SVM classifier is a statistically based two-class classifier and cannot directly handle multi-class problems. In a video monitoring system each video monitoring device represents one category, so video tracing is essentially a multi-class problem. Therefore, when dealing with the tracing problem of a video monitoring system, an appropriate multi-class classifier must be constructed.
There are two main ways to construct an SVM multi-class classifier: direct construction and indirect construction. Direct construction modifies the original objective function; its computational complexity is high, and it is difficult to apply in practice. Indirect methods are largely divided into one-vs-rest (OVR SVMs) and one-vs-one (OVO SVMs). The one-vs-rest method separates the classes in turn, each time taking one class as positive and all remaining classes as negative; with n training classes, n binary classifiers are constructed, the test data is classified by all n of them, and the class with the most votes is selected as the attributed class. The one-vs-one method constructs an SVM classifier between every pair of classes; when classifying test data, the class with the most votes is likewise selected, and the specific process is shown in fig. 7. The one-vs-rest method requires retraining all classifiers from scratch whenever a new class is added, whereas the one-vs-one method only needs to train models involving the newly added samples, leaving the previously constructed classifiers untouched, and is thus relatively faster. The invention therefore adopts the one-vs-one method to construct the multi-class classifier and uses LibSVM for multi-class classification.
The invention uses the LibSVM default kernel function, the radial basis function (RBF) kernel, as the kernel function of the classification model. The RBF kernel is general-purpose and suitable for many types of samples. Compared with a polynomial kernel, the RBF kernel requires few parameters, has low functional complexity, and is convenient to compute. The RBF kernel involves two important parameters, the kernel parameter g and the penalty factor c, and choosing suitable values is important for the classification model. The invention selects the optimal parameters by grid search: all (c, g) value pairs are cross-validated, and the (c, g) pair with the highest accuracy is taken as the optimal parameter.
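The search can be sketched with scikit-learn's SVC, which wraps LibSVM and builds its multi-class decision one-vs-one internally; the (c, g) exponent grids below are common defaults, not values given in the patent:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fit_tracing_model(X_train, y_train):
    """Grid-search the RBF-SVM parameters (c, g). The exponent ranges are
    illustrative assumptions; X_train holds NCC sequence vectors and
    y_train the device labels."""
    param_grid = {
        'C':     [2.0 ** k for k in range(-5, 16, 2)],
        'gamma': [2.0 ** k for k in range(-15, 4, 2)],
    }
    # SVC wraps LibSVM and trains one-vs-one binary classifiers internally,
    # matching the one-to-one multi-class construction described above.
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
    search.fit(X_train, y_train)  # cross-validate every (c, g) pair
    return search.best_estimator_, search.best_params_
```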
2.3 SVM tracing process
The SVM classifier comprises a training part and a testing part. The training part takes the previously generated NCC features as training samples, selects the optimal parameters, and trains a tracing classification model for subsequent prediction. The testing part classifies the test samples in real time, with the aim of verifying the prediction accuracy and generalization ability of the classifier. The main steps of classification with LibSVM are as follows:
the method comprises the following steps: and (6) data processing. And carrying out unified format processing on the training data and the test data and importing the training data and the test data.
Step two: and selecting the optimal parameters. And (5) obtaining optimal parameters c and g by cross validation in a grid search mode.
Step three: and training a classification model. And training by using the training data to construct a multi-classification model.
Step four: and (6) classifying. Classifying data to be tested by using constructed classification model
Step five: and (5) tracing the source classification result. And calculating the video source tracing classification accuracy according to the classification result.
Partial pseudocode of the detection process of the classification algorithm is given in the original document as an image.
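Because the pseudocode survives only as an image, the following sketch reconstructs the detection stage from the surrounding text: each sliding-window NCC vector is classified, each frame collects the votes of the windows containing it, and the video-level label is the majority over frames. All names are ours and the voting details are an assumption:

```python
import numpy as np
from sklearn.svm import SVC

def trace_video(ncc_features, frame_ids, model: SVC):
    """Hedged stand-in for the patent's image-only pseudocode.
    ncc_features: list of window NCC vectors; frame_ids: for each window,
    the frame indices it covers; model: trained multi-class SVM."""
    votes = {}
    for feat, frames in zip(ncc_features, frame_ids):
        label = model.predict(np.asarray(feat).reshape(1, -1))[0]
        for t in frames:
            votes.setdefault(t, []).append(label)
    # per-frame label: the most frequent vote across overlapping windows
    per_frame = {t: max(set(v), key=v.count) for t, v in votes.items()}
    labels = list(per_frame.values())
    return max(set(labels), key=labels.count)  # video-level decision
```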
the technical effects of the present invention will be described in detail below with reference to the accompanying drawings.
The video data of the invention come from the public security monitoring system of a certain city. Camera data of several brands are used to verify the tracing algorithm proposed by the invention, the traditional video tracing algorithms are verified at the same time, and algorithm performance is compared by analyzing the results of the different algorithms.
1. Experimental data. To evaluate the results of the video algorithm, the invention used a set of 15 video monitoring devices from 5 different manufacturers; the specific device information is shown in Table 1. The video monitoring devices come from 5 brands: Hikvision, Dahua, Hanbang Gaoke, Tiandy and Zhongwei. Each brand contributes several models: 4 from Hikvision, 3 from Dahua, 2 from Hanbang Gaoke, 2 from Tiandy and 1 from Zhongwei. All devices use a CCD sensor as the imaging sensor so that the PRNU noise can be exploited.
The present invention uses the VLC player to capture the real-time video stream, the ffmpeg open-source tools to extract video clips, video frames and the required video attributes (bit rate, DCT coefficients, frame type, etc.), and the LibSVM tool for classification detection.
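For the frame-type attribute, a helper along these lines (an illustrative sketch, not the authors' tooling) can read each frame's pict_type from ffprobe:

```python
import subprocess

def frame_types(path):
    """Return the pict_type (I/P/B) of every video frame via ffprobe,
    which ships with the ffmpeg tool set used in the experiments."""
    out = subprocess.run(
        ['ffprobe', '-v', 'quiet', '-select_streams', 'v:0',
         '-show_entries', 'frame=pict_type', '-of', 'csv=p=0', path],
        capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]
```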
TABLE 1 Experimental Equipment information
[Table 1 is presented as an image in the original document.]
Fig. 8 shows part of the video samples; the samples use the H.264 compression coding standard, and several video segments are tested with the proposed method to verify the fingerprint extraction and video tracing effects.
2. The proposed available-region-based video fingerprint estimation and forgery detection method is compared for video tracing with the traditional I-frame-based video fingerprint estimation method and the all-frames-based video fingerprint estimation method, in terms of both the number of video frames required and the accuracy.
(1) Video frame number comparison
As stated above, the invention computes the average of the PRNU noise of the video frames as the device fingerprint K. To determine the number of frames required for accurate video noise, the invention extracts fingerprints from 5 different videos of each device in turn, and extracts video frames of different lengths from the same video of each device to construct the device fingerprints. Starting from 0 and adding 10 frames at a time, PRNU fingerprints are extracted, video tracing is performed with the constructed fingerprints, and the tracing accuracy is calculated; finally, the average of the repeated experimental results is taken as the final frame-number result.
As can be seen from fig. 9, when the tracing accuracy reaches 90%, the all-frames-based video fingerprint estimation method requires 220 frames on average, the I-frame-based method requires only 60 frames on average, and the available-region-based method proposed by the invention requires 58 frames on average. The number of frames required by the proposed method is thus close to that of the I-frame-based method, while being 73.6% lower than that of the all-frames-based fingerprint construction method.
(2) Comparison of accuracy
Video tracing experiments are carried out between video monitoring devices of different brands, of different models, and of the same model, and videos of different bit rates from the same device are also tested, in order to verify the video tracing effect. The specific experimental groups are shown in Table 2 below.
Table 2 experimental grouping information
[Table 2 is presented as an image in the original document.]
Each type of test is run 10 times with the SVM classifier to verify the performance of the method. The classification accuracy changes only slightly between runs, showing that the method is stable; finally, the average classification accuracy of each type of experiment is recorded. The experimental classification results are shown in Table 3, and a line chart comparing the methods is shown in fig. 10 to present the comparison more intuitively.
TABLE 3 video monitoring device Classification accuracy comparison
[Table 3 is presented as an image in the original document.]
As shown in fig. 10, all three PRNU-based video tracing algorithms can accurately identify the brand and model of the camera, with almost identical results. When video tracing classification is performed between devices of the same model, the classification accuracy of all three methods drops; part of the misclassification arises because devices of the same model share common characteristics. The average classification accuracy of the proposed method in video device source identification reaches 96.85%, an improvement over the traditional PRNU tracing algorithms. The effect is particularly good for videos with low bit rate and high compression, where the accuracy improves by about 10%, so the method can effectively trace video sources and detect spoofing attacks.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. It will be appreciated by those skilled in the art that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, for example such code provided on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware) or a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A monitoring video traceability processing method, characterized in that the monitoring video traceability processing method performs variance calculation on the image to be extracted, denoises it with a Wiener filter, and assigns the weights of the RGB three-color channels to obtain the final PRNU noise, calculating the average value of the PRNU noise as the device fingerprint; calculates an NCC correlation sequence of the video frames and the device fingerprint and feeds the sequence as the video feature into a classifier to train a classification model; and constructs an NCC feature value of the video to be detected by the same method and performs prediction classification on the video to be detected with the classification model to obtain a classification result, thereby realizing video source identification;

the prediction classification of the video to be detected with the classification model uses the LibSVM default kernel function, the radial basis function (RBF) kernel, as the kernel function of the classification model; the optimal parameters are selected by grid search: all (c, g) values are cross-validated, and the (c, g) pair with the highest accuracy is taken as the optimal parameter.
2. The surveillance video tracing method of claim 1, wherein the available frames of the surveillance video tracing method are selected by R_t(x, y), which indicates whether the region block at (x, y) of the current frame t is available for PRNU extraction; if all DCT-AC coefficients are zero in a given block region, R_t(x, y) = 0 and the block is discarded during video noise extraction; otherwise R_t(x, y) = 1 and the PRNU noise of that block is used during video noise extraction, where t denotes video frame t:

R_t(x, y) = 0 if all DCT-AC coefficients of block (x, y) are zero, and R_t(x, y) = 1 otherwise;

the PRNU is extracted directly from I frames, i.e. R_t(x, y) is always 1; if the frame is a B or P frame, each block region is examined and undamaged block regions are selected for PRNU extraction.
3. The surveillance video tracing method of claim 1, wherein the PRNU extraction of the surveillance video tracing method comprises:
(1) decomposing the image into color channels (R, G, B) and performing a four-level wavelet transform on each color channel using an 8-tap Daubechies QMF, obtaining at each level the subbands in the horizontal H, vertical V and diagonal D directions;
(2) in each subband, estimating the local variance of the original noiseless image for each wavelet coefficient, using the maximum a posteriori (MAP) estimate computed over square W × W neighborhoods of four sizes, W ∈ {3, 5, 7, 9}:

σ̂²_W(i, j) = max(0, (1/W²) Σ_{(i,j)∈N_W} c²(i, j) − σ₀²),

where c ∈ {H, V, D}, c(i, j) is the high-frequency component, and σ₀ controls the degree of noise suppression, σ₀ = 5;
(3) comparing the four variances of the four sizes and selecting the minimum as the best variance estimate:

σ²(i, j) = min(σ₃²(i, j), σ₅²(i, j), σ₇²(i, j), σ₉²(i, j)), (i, j) ∈ J;

(4) obtaining the denoised wavelet coefficients with a Wiener filter:

c_clean(i, j) = c(i, j) · σ²(i, j) / (σ²(i, j) + σ₀²);

(5) repeating the above process for each subband and each color channel of the video frame and applying the inverse wavelet transform to obtain I_clean, the denoised result, the noise value of each color channel of the current video frame being obtained by subtraction:

I_noise = I − I_clean;

(6) assigning the weights of the three color channels and combining all channels to obtain the enhanced PRNU noise:

I_PRNU = w_R · I^R_noise + w_G · I^G_noise + w_B · I^B_noise.
4. The surveillance video tracing method according to claim 1, wherein the method extracts an original video sequence of the device in advance, obtains PRNU noise by repeating the extraction process for a series of video frames of the same video, and calculates the average value as the device fingerprint K:

K(x, y) = Σ_{t=1}^{n} R_t(x, y) · Ĩ_t(x, y) / Σ_{t=1}^{n} R_t(x, y),

where Ĩ_t is the noise extracted from the t-th frame, n is the number of frames processed, and R_t denotes the selected region of the frame.
5. The surveillance video tracing method according to claim 1, wherein the video feature selection of the surveillance video tracing method extracts the PRNU noise of the current frame, performs a correlation calculation with the video device fingerprint, and queries the original device to which the video frame belongs; the normalized cross-correlation NCC measures the degree of correlation between two sets of data and takes values in [−1, 1], a value near 0 indicating that the test data is uncorrelated with the fingerprint data, −1 indicating perfect negative correlation, and 1 indicating that the test data and the fingerprint data are identical; the NCC between the noise of the video frame under test and the device fingerprint is calculated, the NCC of frame t being defined as:

NCC_t = ⟨Ĩ_t − avg(Ĩ_t), K − avg(K)⟩ / (‖Ĩ_t − avg(Ĩ_t)‖ · ‖K − avg(K)‖),

where K denotes the device fingerprint, Ĩ_t denotes the PRNU noise estimated from the frame, avg(K) and avg(Ĩ_t) are the means of K and Ĩ_t respectively, ⟨a, b⟩ is the inner product operation, and ‖·‖ denotes the Euclidean norm;

adjacent frames are used for the decision: a sliding window over the video collects the NCC value of each frame in the window to form an NCC sequence, and the NCC sequence vector serves as the classification feature for the subsequent tracing operation, the NCC feature sequence of frame t being defined as:

S_t = (NCC_{t−⌊m/2⌋}, …, NCC_t, …, NCC_{t+⌊m/2⌋}),

where m is the length of the sliding window and ⌊·⌋ denotes the floor (round-down) operation;

once the NCC sequences are obtained, the feature information is fed into a classifier to learn matching and non-matching NCC sequences, a classification model is trained to perform video tracing, and the averaged voting result serves as the final classification result of the frame.
6. The surveillance video traceability processing method of claim 1, wherein the SVM-based tracing of the surveillance video traceability processing method comprises:
step one, data processing: processing the training data and test data into a unified format and importing them;
step two, optimal parameter selection: obtaining the optimal parameters c and g through cross-validation with grid search;
step three, classification model training: training on the training data to construct a multi-class model;
step four, classification: classifying the data under test with the constructed classification model;
and step five, calculating the video tracing classification accuracy from the tracing classification result.
7. A program storage medium for receiving user input, the stored computer program causing an electronic device to execute the surveillance video traceability processing method of any one of claims 1-6, comprising the steps of: performing variance calculation on the image to be extracted, denoising with a Wiener filter, assigning the weights of the RGB three-color channels to obtain the final PRNU noise, and calculating the average value of the PRNU noise as the device fingerprint; calculating an NCC correlation sequence of the video frames and the device fingerprint and feeding the sequence as the video feature into a classifier to train a classification model; and constructing an NCC feature value of the video to be detected by the same method and performing prediction classification on the video to be detected with the classification model to obtain a classification result, realizing video source identification.
8. A surveillance video traceability processing system for implementing the surveillance video traceability processing method of any one of claims 1 to 6, wherein the surveillance video traceability processing system comprises:
a PRNU noise extraction module 1 for extracting PRNU noise from available video frames;
the fingerprint construction module is used for constructing a unique fingerprint of the video monitoring equipment;
and the video tracing detection module is used for classifying by constructing the NCC related sequence.
9. The surveillance video traceability processing system of claim 8, wherein the PRNU noise extraction module comprises:
the video available frame extraction module is used for extracting available video frames;
and the PRNU noise extraction module is used for extracting PRNU noise.
10. A video monitoring terminal, characterized in that, the video monitoring terminal carries the surveillance video traceability processing system of claim 8.
CN202010203610.7A 2020-03-20 2020-03-20 Monitoring video tracing processing method, system, storage medium and video monitoring terminal Active CN111553848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010203610.7A CN111553848B (en) 2020-03-20 2020-03-20 Monitoring video tracing processing method, system, storage medium and video monitoring terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010203610.7A CN111553848B (en) 2020-03-20 2020-03-20 Monitoring video tracing processing method, system, storage medium and video monitoring terminal

Publications (2)

Publication Number Publication Date
CN111553848A CN111553848A (en) 2020-08-18
CN111553848B (en) 2023-04-07

Family

ID=72004129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010203610.7A Active CN111553848B (en) 2020-03-20 2020-03-20 Monitoring video tracing processing method, system, storage medium and video monitoring terminal

Country Status (1)

Country Link
CN (1) CN111553848B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767360A (en) * 2021-01-21 2021-05-07 湖南大学 Traceability system based on photosensitive device noise fingerprint
CN112991345B (en) * 2021-05-11 2021-08-10 腾讯科技(深圳)有限公司 Image authenticity detection method and device, computer equipment and storage medium
CN114567798B (en) * 2022-02-28 2023-12-12 南京烽火星空通信发展有限公司 Tracing method for short video variety of Internet
CN116363686B (en) * 2023-06-02 2023-08-11 深圳大学 Online social network video platform source detection method and related equipment thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014163597A2 (en) * 2013-04-05 2014-10-09 Uludağ Üni̇versi̇tesi̇ Tto Anonymization system and method for digital images
CN108154080A (en) * 2017-11-27 2018-06-12 北京交通大学 A kind of method that video equipment is quickly traced to the source
CN110121109A (en) * 2019-03-22 2019-08-13 西安电子科技大学 Towards the real-time source tracing method of monitoring system digital video, city video monitoring system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014163597A2 (en) * 2013-04-05 2014-10-09 Uludağ Üni̇versi̇tesi̇ Tto Anonymization system and method for digital images
CN108154080A (en) * 2017-11-27 2018-06-12 北京交通大学 A kind of method that video equipment is quickly traced to the source
CN110121109A (en) * 2019-03-22 2019-08-13 西安电子科技大学 Towards the real-time source tracing method of monitoring system digital video, city video monitoring system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Early warning of potentially improper video calls based on camera source tracing; Ma Xiaochen et al.; Optics and Precision Engineering (No. 11); full text *

Also Published As

Publication number Publication date
CN111553848A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111553848B (en) Monitoring video tracing processing method, system, storage medium and video monitoring terminal
Lin et al. Digital image source coder forensics via intrinsic fingerprints
Hsu et al. Video forgery detection using correlation of noise residue
Ye et al. Unsupervised feature learning framework for no-reference image quality assessment
Chen et al. Determining image origin and integrity using sensor noise
Ferrara et al. Image forgery localization via fine-grained analysis of CFA artifacts
Chierchia et al. On the influence of denoising in PRNU based forgery detection
Muhammad et al. Blind copy move image forgery detection using dyadic undecimated wavelet transform
Lin et al. A passive-blind forgery detection scheme based on content-adaptive quantization table estimation
CN110121109A (en) Towards the real-time source tracing method of monitoring system digital video, city video monitoring system
Taspinar et al. Camera fingerprint extraction via spatial domain averaged frames
Li et al. Image quality assessment using deep convolutional networks
Kalka et al. A preliminary study on identifying sensors from iris images
Chu et al. Detectability of the order of operations: An information theoretic approach
CN105120294A (en) JPEG format image source identification method
Lorch et al. Reliable camera model identification using sparse gaussian processes
Cozzolino et al. Multiple classifier systems for image forgery detection
CN111709930A (en) Pattern noise based picture provenance and tampering identification method
Pandey et al. A passive forensic method for video: Exposing dynamic object removal and frame duplication in the digital video using sensor noise features
Chu et al. Forensic identification of compressively sensed images
Bammey Jade owl: Jpeg 2000 forensics by wavelet offset consistency analysis
Cozzolino et al. A comparative analysis of forgery detection algorithms
Li et al. Random subspace method for source camera identification
Conotter et al. Joint detection of full-frame linear filtering and JPEG compression in digital images
Nam et al. DHNet: double MPEG-4 compression detection via multiple DCT histograms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant