CN111553848B - Monitoring video tracing processing method, system, storage medium and video monitoring terminal - Google Patents

Monitoring video tracing processing method, system, storage medium and video monitoring terminal

Info

Publication number
CN111553848B
CN111553848B (application CN202010203610.7A)
Authority
CN
China
Prior art keywords
video
ncc
noise
frame
prnu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010203610.7A
Other languages
Chinese (zh)
Other versions
CN111553848A (en)
Inventor
Shen Yulong
Hu Tianzhu
Liu Yujuan
Zhao Zhen
Zhai Kaifang
Zhu Xinghui
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202010203610.7A priority Critical patent/CN111553848B/en
Publication of CN111553848A publication Critical patent/CN111553848A/en
Application granted granted Critical
Publication of CN111553848B publication Critical patent/CN111553848B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • G06F21/73Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by creating or determining hardware identification, e.g. serial numbers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Abstract

The invention belongs to the technical field of video monitoring information processing, and discloses a monitoring video traceability processing method, system, storage medium and video monitoring terminal. The method calculates the variance of the image to be extracted, denoises it with a Wiener filter, and assigns weights to the three RGB color channels to obtain the final PRNU noise; the average value of the PRNU noise is calculated as the device fingerprint. An NCC correlation sequence of the video frames and the device fingerprint is calculated and fed into a classifier as the video feature to train a classification model. An NCC feature value is constructed for the video to be detected by the same method, and the classification model performs prediction classification on the video to be detected to obtain a classification result, thereby realizing video source identification. The system comprises a PRNU noise extraction module and a fingerprint construction module. The proposed method can effectively trace the source of a video and improves the tracing accuracy compared with traditional methods.

Description

Monitoring video tracing processing method, system, storage medium and video monitoring terminal
Technical Field
The invention belongs to the technical field of video monitoring information processing, and particularly relates to a monitoring video traceability processing method, a monitoring video traceability processing system, a storage medium and a video monitoring terminal.
Background
At present, video monitoring is widely deployed thanks to the development and construction of monitoring systems such as the Skynet Project and the Sharp Eyes Project. On the one hand, video monitoring can watch every region and corner of a city around the clock and record relevant illegal and criminal behavior; on the other hand, it reminds people to regulate their words and deeds and exerts a certain deterrent effect. In addition, the rapid development of video monitoring systems facilitates judicial evidence work, and video evidence has become an important means of settling court disputes. However, criminals forge and tamper with surveillance video using various video editing tools to deceive monitoring personnel, carry out criminal activities, cause losses to others, and even affect social stability. Moreover, video forgery can mislead court cases, impair justice, and reduce public confidence. Methods based on sensor pattern noise are common in the field of video forensics. Sensor pattern noise is damaged to different degrees during video compression; the traditional PRNU-based video tracing method extracts all video frames or only key frames and ignores the undamaged effective regions, so the device fingerprint it constructs is inaccurate. Moreover, when the video tracing decision is made, the accuracy of the PRNU of the actual test video frames is not considered, so the tracing accuracy cannot meet the requirements of a video monitoring system.
Through the above analysis, the problems and defects of the prior art are as follows: the traditional PRNU-based video tracing method ignores the undamaged effective regions, so the device fingerprint it constructs is inaccurate; and when the video tracing decision is made, the accuracy of the PRNU of the actual test video frames is not considered, so the tracing accuracy cannot meet the requirements of a video monitoring system.
The difficulty in solving the above problems and defects is that PRNU noise is damaged during video compression, and improper extraction of the test-data features leads to low tracing accuracy.
The significance of solving the above problems and defects is as follows: the security requirement of guaranteeing the originality of video grows ever higher as video monitoring systems are built; if the problem of video forensics is not solved, social order is affected and the decisions of judicial authorities are seriously impacted.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a monitoring video tracing processing method, a monitoring video tracing processing system, a storage medium and a video monitoring terminal.
The monitoring video traceability processing method performs variance calculation on the image to be extracted, denoises it with a Wiener filter, and assigns the weights of the RGB three-color channels to obtain the final PRNU noise; the average value of the PRNU noise is calculated as the device fingerprint. An NCC correlation sequence of the video frames and the device fingerprint is calculated and fed into a classifier as the video feature to train a classification model. An NCC feature value is constructed for the video to be detected by the same method, and the classification model performs prediction classification on the video to be detected to obtain a classification result, realizing video source identification.
Further, available video frames are selected in the monitoring video tracing processing method by R_t(x, y), which indicates whether the region block at (x, y) of the current frame t is available for PRNU extraction. If all DCT-AC coefficients are zero in a given block region, R_t(x, y) = 0 and the block is discarded during video noise extraction; otherwise R_t(x, y) = 1 and the PRNU noise of that block is used during video noise extraction, where t denotes video frame t:

R_t(x, y) = 0 if all DCT-AC coefficients of block (x, y) are zero, and R_t(x, y) = 1 otherwise.

The PRNU is extracted directly from I frames, i.e. R_t(x, y) is always 1; if the frame is a B or P frame, each block region is examined and undamaged block regions are selected for PRNU extraction.
Further, the PRNU extraction of the surveillance video tracing processing method includes:
(1) Decompose the image into color channels (R, G, B) and perform a four-level wavelet transform on each color channel using an 8-tap Daubechies QMF, obtaining at each level the subbands in the horizontal H, vertical V and diagonal D directions;
(2) In each subband, estimate the local variance of the original noiseless image for each wavelet coefficient, using the maximum a posteriori (MAP) estimate computed over square W × W neighborhoods of four sizes, W ∈ {3, 5, 7, 9}:

σ̂²_W(i, j) = max(0, (1/W²) Σ_{(i,j)∈N_W} c²(i, j) − σ₀²),

where c ∈ {H, V, D}, c(i, j) is the high-frequency component, and σ₀ controls the degree of noise suppression, σ₀ = 5;
(3) Compare the four variances of the four sizes and select the minimum as the best variance estimate:

σ²(i, j) = min(σ₃²(i, j), σ₅²(i, j), σ₇²(i, j), σ₉²(i, j)), (i, j) ∈ J;

(4) Obtain the denoised wavelet coefficients with a Wiener filter:

c_clean(i, j) = c(i, j) · σ²(i, j) / (σ²(i, j) + σ₀²);

(5) Repeat the above process for each subband and each color channel of the video frame and apply the inverse wavelet transform to obtain I_clean, the denoised result; the noise value of each color channel of the current video frame follows by subtraction:

I_noise = I − I_clean;

(6) Assign the weights of the three color channels and combine all channels to obtain the enhanced PRNU noise:

I_PRNU = w_R · I^R_noise + w_G · I^G_noise + w_B · I^B_noise.
Further, the surveillance video tracing processing method extracts an original video sequence of the device in advance, obtains PRNU noise by repeating the extraction process on a series of video frames of the same video, and calculates the average value as the device fingerprint K:

K(x, y) = Σ_{t=1}^{n} R_t(x, y) · Ĩ_t(x, y) / Σ_{t=1}^{n} R_t(x, y),

where Ĩ_t is the noise extracted from the t-th frame, n is the number of frames processed, and R_t denotes the selected region of the frame.
Further, the video feature selection of the surveillance video tracing processing method extracts the PRNU noise of the current frame, performs a correlation calculation with the video device fingerprint, and queries the original device to which the video frame belongs. The normalized cross-correlation NCC measures the degree of correlation between two sets of data and takes values in [−1, 1]: a value near 0 indicates that the test data is uncorrelated with the fingerprint data, −1 indicates perfect negative correlation, and 1 indicates that the test data and the fingerprint data are identical. The NCC between the noise of the video frame under test and the device fingerprint is calculated; the NCC of frame t is defined as:

NCC_t = ⟨Ĩ_t − avg(Ĩ_t), K − avg(K)⟩ / (‖Ĩ_t − avg(Ĩ_t)‖ · ‖K − avg(K)‖),

where K denotes the device fingerprint, Ĩ_t denotes the PRNU noise estimated from the frame, avg(K) and avg(Ĩ_t) are the means of K and Ĩ_t respectively, ⟨a, b⟩ is the inner product operation, and ‖·‖ denotes the Euclidean norm.

Adjacent frames are used for the decision: a sliding window over the video collects the NCC value of each frame in the window to form an NCC sequence, and the NCC sequence vector serves as the classification feature for the subsequent tracing operation. The NCC feature sequence of frame t is defined as:

S_t = (NCC_{t−⌊m/2⌋}, …, NCC_t, …, NCC_{t+⌊m/2⌋}),

where m is the length of the sliding window and ⌊·⌋ denotes the floor (round-down) operation.

Once the NCC sequences are obtained, the feature information is fed into a classifier to learn matching and non-matching NCC sequences, and the trained classification model performs video tracing; the averaged voting result serves as the final classification result of the frame.
Further, the SVM classification model of the surveillance video traceability processing method uses the LibSVM default kernel function, the radial basis function (RBF) kernel, as the kernel function of the classification model; the optimal parameters are selected by grid search: all (c, g) values are cross-validated, and the (c, g) pair with the highest accuracy is taken as the optimal parameter.
Further, the SVM-based tracing of the surveillance video traceability processing method comprises:
step one, data processing: processing the training data and test data into a unified format and importing them;
step two, optimal parameter selection: obtaining the optimal parameters c and g through cross-validation with grid search;
step three, classification model training: training on the training data to construct a multi-class model;
step four, classification: classifying the data under test with the constructed classification model;
and step five, calculating the video tracing classification accuracy from the tracing classification result.
It is another object of the present invention to provide a program storage medium for receiving user input, the stored computer program causing an electronic device to perform steps comprising: performing variance calculation on the image to be extracted, denoising with a Wiener filter, assigning the weights of the RGB three-color channels to obtain the final PRNU noise, and calculating the average value of the PRNU noise as the device fingerprint; calculating an NCC correlation sequence of the video frames and the device fingerprint and feeding the sequence as the video feature into a classifier to train a classification model; and constructing an NCC feature value for the video to be detected by the same method and performing prediction classification on the video to be detected with the classification model to obtain a classification result, realizing video source identification.
Another objective of the present invention is to provide a surveillance video traceability processing system for implementing the surveillance video traceability processing method, wherein the surveillance video traceability processing system comprises:
a PRNU noise extraction module 1 for extracting PRNU noise from available video frames;
the fingerprint construction module is used for constructing a unique fingerprint of the video monitoring equipment;
the video tracing detection module is used for classifying by constructing NCC related sequences;
the PRNU noise extraction module includes:
the video available frame extraction module is used for extracting available video frames;
a PRNU noise extraction module to extract PRNU noise.
Another object of the present invention is to provide a video monitoring terminal, wherein the video monitoring terminal carries the surveillance video traceability processing system of claim 9.
By combining all the above technical schemes, the invention has the following advantages and positive effects: the invention provides an efficient PRNU fingerprint construction and video tracing method, addressing the low tracing accuracy of existing video fingerprint extraction methods that do not take video compression into account. The available regions of I frames and the undamaged regions of B and P frames are considered together, the PRNU noise is computed to construct the unique fingerprint of the video monitoring device, and NCC correlation sequences are constructed for classification, so that the video frames to be detected are traced and identified. Experiments show that the proposed method can effectively trace the source of a video and improves the tracing accuracy compared with traditional methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.
Fig. 1 is a flowchart of a surveillance video source tracing processing method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a surveillance video traceability processing system according to an embodiment of the present invention;
in the figure: 1. a PRNU noise extraction module; 1-1, a video available frame extraction module; 1-2, a PRNU noise extraction module; 2. a fingerprint construction module; 3. and a video source tracing detection module.
Fig. 3 is a flowchart of an implementation of a surveillance video source tracing processing method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an h.264 video coding sequence provided by an embodiment of the present invention.
FIG. 5 is a DCT coefficient diagram provided by an embodiment of the invention.
Fig. 6 is a flowchart of selecting an available video area according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of a multi-classifier constructed based on a one-to-one method according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of a portion of a video sample provided by an embodiment of the present invention.
FIG. 9 is a diagram illustrating comparison of required frame numbers provided by the embodiment of the present invention.
Fig. 10 is a comparison schematic diagram of a video source tracing method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a surveillance video tracing method, system, storage medium, and video surveillance terminal, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the surveillance video traceability processing method provided by the present invention includes the following steps:
s101: extracting PRNU noise from available video frames using a modified PRNU extraction and fingerprinting algorithm and generating a device fingerprint;
s102: calculating an NCC related sequence of the video frames and the device fingerprints, and feeding the sequence serving as a video feature into a classifier to train a classification model;
s103: and constructing an NCC characteristic value of the video to be detected by the same method, and performing prediction classification on the video to be detected by using a classification model to obtain a classification result so as to realize video source identification.
As shown in fig. 2, the surveillance video traceability processing system provided by the present invention includes:
a PRNU noise extraction module 1, which extracts PRNU noise from available video frames.
And the fingerprint constructing module 2 is used for constructing a unique fingerprint of the video monitoring equipment.
And the video traceability detection module 3 is used for classifying by constructing NCC related sequences.
The PRNU noise extraction module 1 includes:
and the video available frame extraction module 1-1 is used for extracting available video frames.
A PRNU noise extraction module 1-2 to extract the PRNU noise.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
As shown in fig. 3, the overall framework of the video tracing algorithm is as follows: PRNU noise is extracted from the available video frames with the improved PRNU extraction and fingerprint construction algorithm to generate the device fingerprint; the NCC correlation sequence of the video frames and the device fingerprint is calculated and fed as the video feature into a classifier to train a classification model. An NCC feature value is constructed for the video to be detected by the same method, and the classification model performs prediction classification on the video to be detected to obtain the classification result, realizing video source identification.
1. PRNU noise extraction and fingerprint construction
1.1 Video available frame selection. A video is essentially composed of a sequence of static images; playing the still images in succession produces the impression of motion. In a video monitoring system, transmitting all still-image information would consume a great deal of network resources and storage space, which is not feasible under practical network conditions and operating environments. A large amount of similar data exists between adjacent video frames, and video compression standards exploit this property to facilitate video transmission and storage.
Video monitoring devices mainly adopt the H.264 standard for compression coding. As shown in fig. 4, the video coding standard divides video frames into three categories: I frames, B frames and P frames. Compression algorithms fall into two categories, intra-frame compression and inter-frame compression; I frames are produced by the former, and B and P frames by the latter. An I frame is coded in full and carries the complete information of its picture, serving as a key frame of the video; the picture can be reconstructed by decoding its own data directly. A P frame stores the difference between the current frame and the preceding I or P frame; during decoding, the difference data is added to the referenced forward frame to obtain the complete image. Because P frames themselves serve as references for subsequent video frames, accumulated transmission errors can affect later coding. A B frame codes the prediction difference with the preceding I or P frame and the following P frame as references; decoding likewise requires adding the difference back to obtain the video image. Within a group of adjacent pictures, the complete I frame is coded first, and subsequent frames are coded from their difference content as long as the differences remain small. When the picture content of a frame changes greatly compared with the previous frame, the current image sequence ends and the next one begins. Such an image sequence is called a GOP (group of pictures).
In the video coding process, quantization introduces errors, so after the DCT transform, quantization and inverse transform the residual block D_n is no longer identical to the original block. Combining formula (1) and formula (2) of the coding model, the content of a decoded block depends strongly on the inverse DCT transform of the residual block D_n. The DCT coefficients are read as shown in fig. 5: the sub-blocks are typically Zig-Zag scanned to obtain the coefficients. The DCT coefficients consist of the DC coefficient at position (0, 0) and the remaining AC coefficients. If the DCT-AC coefficients of a residual block are all 0, its high-frequency content is lost and the high-frequency content of the decoded block equals that of the reference block. Since PRNU noise resides in the high-frequency part, the PRNU noise of a block whose residual DCT-AC coefficients are all zero is corrupted by video compression.
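A minimal Python sketch of this check (the function names are ours; it assumes the residual DCT coefficients of an 8 × 8 block are already available as a NumPy array) scans the block in Zig-Zag order and tests whether any AC coefficient survives:

```python
import numpy as np

def zigzag_indices(n=8):
    """(row, col) pairs visiting an n x n block in JPEG Zig-Zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def has_nonzero_ac(dct_block):
    """True if any AC coefficient is non-zero. The DC coefficient is the
    first element of the Zig-Zag scan, at position (0, 0); only the AC
    part carries the high-frequency (PRNU-bearing) content."""
    order = zigzag_indices(dct_block.shape[0])
    return any(dct_block[r, c] != 0 for (r, c) in order[1:])
```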
In the present invention, R_t(x, y) indicates whether the region block at (x, y) of the current frame t is available for PRNU extraction. If the DCT-AC coefficients are all zero in a particular block region, R_t(x, y) = 0 and the block is discarded during video noise extraction; otherwise R_t(x, y) = 1 and the PRNU noise of the block is used during video noise extraction. The specific form is given in equation (1), where t denotes video frame t:

R_t(x, y) = 0 if all DCT-AC coefficients of block (x, y) are zero, and R_t(x, y) = 1 otherwise. (1)

In the H.264 standard, I frames are primary frames whose decoding requires only their own data, so their PRNU noise is usually not corrupted, while B and P frames are corrupted to a relatively large extent. In order to obtain reliable PRNU noise while minimizing the number of required frames, the invention proposes a method for selecting the available frames of a video; the specific flow is shown in fig. 6. I frames are generally not corrupted, so the invention extracts the PRNU directly from I frames, i.e. R_t(x, y) is always 1. If the frame is a B or P frame, each block region is examined and only undamaged block regions are selected for PRNU extraction. This extraction method improves the accuracy of the acquired PRNU noise and device fingerprint, and thereby the accuracy of video tracing.
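The selection rule can be sketched as follows, assuming the decoder exposes per-block residual DCT data keyed by block coordinates — the data layout and names are illustrative, not taken from the patent:

```python
import numpy as np

def availability_mask(frame_type, residual_dct_blocks, grid_shape):
    """Sketch of the R_t(x, y) rule. residual_dct_blocks maps (x, y) block
    coordinates to 8x8 DCT residual arrays. I frames are taken in full;
    in B/P frames a block is kept only if its AC coefficients are not
    all zero."""
    R = np.ones(grid_shape, dtype=np.uint8)
    if frame_type == 'I':
        return R  # R_t(x, y) = 1 everywhere for intra-coded frames
    for (x, y), block in residual_dct_blocks.items():
        ac = block.copy()
        ac[0, 0] = 0.0  # zero out the DC coefficient, keep the AC part
        if not np.any(ac):
            R[x, y] = 0  # all AC coefficients zero: PRNU corrupted, discard
    return R
```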
1.2 PRNU extraction
Previous studies have shown that PRNU noise is an effective means of identifying the source camera. The sensor is the heart of the image acquisition process; the PRNU arises from variations in the sensor manufacturing process and is present in all types of sensors. The feature therefore differs from camera to camera, while for the same camera it can be extracted consistently from video frames, which makes it suitable for video tracing.
The accuracy of the video tracing algorithm is closely related to the extraction of the PRNU noise fingerprint. The PRNU is mostly located in the high frequency part, and in previous studies, the PRNU noise was effectively extracted by filtering the low frequency part and extracting the high frequency part. A large number of scholars apply various noise reduction filtering methods to extract PRNU noise, wherein the wavelet-based noise reduction method is most reliable. The present invention therefore employs wavelet filtering to obtain image PRNU noise.
The PRNU noise is obtained in three main steps: first, compute the variance of the image to be extracted; second, denoise with a Wiener filter; finally, assign the weights of the RGB three-color channels to obtain the final PRNU noise. The specific steps are as follows:
(1) The image is decomposed into color channels (R, G, B), and a four-level wavelet transform is performed on each color channel using an 8-tap Daubechies QMF, obtaining at each level the subbands in the horizontal H, vertical V and diagonal D directions.
(2) In each subband c, the local variance of the original noiseless image is estimated for each wavelet coefficient. This is done with a maximum a posteriori (MAP) estimate computed over square W × W neighborhoods of four sizes, W ∈ {3, 5, 7, 9}:

σ̂²_W(i, j) = max(0, (1/W²) Σ_{(i,j)∈N_W} c²(i, j) − σ₀²), (2)

where c ∈ {H, V, D}, c(i, j) is the high-frequency component, and σ₀ controls the degree of noise suppression; σ₀ = 5 is typically the best choice and yields reliable noise.
(3) The four variances of the four sizes are compared, and the minimum is selected as the best variance estimate:

σ²(i, j) = min(σ₃²(i, j), σ₅²(i, j), σ₇²(i, j), σ₉²(i, j)), (i, j) ∈ J. (3)

(4) A Wiener filter yields the denoised wavelet coefficients:

c_clean(i, j) = c(i, j) · σ²(i, j) / (σ²(i, j) + σ₀²). (4)

(5) The above process is repeated for each subband and each color channel of the video frame, and the inverse wavelet transform produces I_clean, the denoised result; the noise value of each color channel of the current video frame follows by subtraction:

I_noise = I − I_clean. (5)

(6) The weights of the three color channels are assigned, and all channels are combined to obtain the enhanced PRNU noise:

I_PRNU = w_R · I^R_noise + w_G · I^G_noise + w_B · I^B_noise. (6)
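The six steps can be sketched with PyWavelets and SciPy as below. This is a minimal reading of the pipeline rather than the patent's implementation: 'db4' (filter length 8) stands in for the 8-tap Daubechies QMF, and the channel weights are placeholders, since their exact values are not stated here:

```python
import numpy as np
import pywt
from scipy.ndimage import uniform_filter

SIGMA0_SQ = 5.0 ** 2  # sigma_0 = 5 controls the degree of noise suppression

def denoise_subband(c):
    """MAP local-variance estimate over W x W windows, W in {3, 5, 7, 9}
    (eqs. (2)-(3)), followed by the Wiener attenuation of eq. (4)."""
    var = np.full_like(c, np.inf)
    for w in (3, 5, 7, 9):
        est = uniform_filter(c * c, size=w) - SIGMA0_SQ  # local mean of c^2 minus sigma_0^2
        var = np.minimum(var, np.maximum(est, 0.0))      # keep the smallest non-negative estimate
    return c * var / (var + SIGMA0_SQ)                   # Wiener-filtered coefficient

def channel_noise(channel):
    """PRNU noise of one color channel: the high-frequency residual left
    after wavelet-domain Wiener denoising (eq. (5))."""
    coeffs = pywt.wavedec2(channel.astype(np.float64), 'db4', level=4)
    clean = [coeffs[0]] + [tuple(denoise_subband(sb) for sb in lvl)
                           for lvl in coeffs[1:]]        # denoise H, V, D at every level
    i_clean = pywt.waverec2(clean, 'db4')
    i_clean = i_clean[:channel.shape[0], :channel.shape[1]]  # trim reconstruction padding
    return channel - i_clean                             # I_noise = I - I_clean

def frame_prnu(rgb, weights=(0.3, 0.6, 0.1)):
    """Weighted combination over the three color channels (eq. (6));
    the weights here are illustrative placeholders, not the patent's."""
    return sum(w * channel_noise(rgb[..., k]) for k, w in enumerate(weights))
```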
In order to construct a device fingerprint for each video device, the invention performs the extraction operation on an original video sequence of the device in advance. Sensor pattern noise survives averaging, while other noise and minor scene details are typically cancelled out. The invention therefore obtains the PRNU noise by repeating the above extraction process for a series of video frames of the same video, calculates its average value, and takes it as the device fingerprint K:

K(x, y) = Σ_{t=1}^{n} R_t(x, y) · Ĩ_t(x, y) / Σ_{t=1}^{n} R_t(x, y), (7)

where Ĩ_t is the noise extracted from the t-th frame, n is the number of frames processed, and R_t denotes the selected region of the frame.
2. Video tracing detection
2.1 Video feature selection. Extracting PRNU pattern noise from the original video sequence completes the construction of the video device fingerprint; the PRNU noise of the current frame is then extracted, its correlation with the video device fingerprint is calculated, and the original device to which the video frame belongs is queried. For this kind of problem, Normalized Cross-Correlation (NCC) is the method of choice.
Normalized cross-correlation NCC is an algorithm that measures the degree of correlation between two sets of data. The NCC value lies in [−1, 1]: a value near 0 means the test data is uncorrelated with the fingerprint data, −1 means perfect negative correlation, and 1 means the two are identical. In general, the larger the NCC value, the stronger the correlation. To attribute the video under test, the NCC between the noise of its frames and the device fingerprint is computed. The NCC of frame t is defined as follows:
NCC_t = ⟨Ĩ_t − avg(Ĩ_t), K − avg(K)⟩ / (‖Ĩ_t − avg(Ĩ_t)‖ · ‖K − avg(K)‖), (8)

where K denotes the device fingerprint, Ĩ_t denotes the PRNU noise estimated from the frame, avg(K) and avg(Ĩ_t) are the means of K and Ĩ_t respectively, ⟨a, b⟩ is the inner product operation, and ‖·‖ denotes the Euclidean norm.
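Equation (8) reduces to a few lines of NumPy; the sketch assumes the frame noise and the fingerprint are arrays of the same shape:

```python
import numpy as np

def ncc(frame_noise, fingerprint):
    """Normalized cross-correlation of eq. (8): inner product of the
    mean-centred noise and fingerprint, divided by the product of their
    Euclidean norms; the result lies in [-1, 1]."""
    a = frame_noise - frame_noise.mean()
    b = fingerprint - fingerprint.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0
```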
Owing to video coding, the noise residues extracted from individual frames are damaged to different degrees, so the NCC values of some video frames are unreliable; using such values directly for tracing detection makes the tracing result inaccurate. To ensure the accuracy of the selected features, the method uses adjacent frames for the decision: a sliding window over the video collects the NCC value of each frame in the window to form an NCC sequence, and the NCC sequence vector serves as the classification feature for the subsequent tracing operation. The NCC feature sequence of frame t is defined as:

S_t = (NCC_{t−⌊m/2⌋}, …, NCC_t, …, NCC_{t+⌊m/2⌋}), (9)

where m is the length of the sliding window and ⌊·⌋ denotes the floor (round-down) operation.
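A small helper with illustrative names that turns a per-frame NCC series into the length-m window vectors of equation (9); for simplicity it skips frames within ⌊m/2⌋ of either end of the video:

```python
def ncc_sequences(ncc_values, m):
    """Length-m window of NCC values centred on each frame t (eq. (9)).
    ncc_values is a list of per-frame NCC scores; returns {t: window}."""
    half = m // 2
    return {t: ncc_values[t - half: t + half + 1]
            for t in range(half, len(ncc_values) - half)}
```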
After the NCC sequences are obtained, the feature information is fed into a classifier to learn the matching and non-matching NCC sequences, and a classification model is trained for video tracing. When a test video undergoes tracing detection, the same frame is contained in several sliding windows, so the frame receives multiple classification votes. The invention takes the averaged voting result as the final classification result of the frame.
2.2 SVM classification model selection. In previous image tracing research, most scholars used the KNN classification algorithm or an SVM classifier to classify and recognize images. The KNN classification algorithm can directly handle the multi-class case, but it has no training process: every classification must compare the test data against all training data, which is computationally expensive and unsuitable for a real-time video monitoring system. The invention therefore selects the SVM classifier to classify the video images.
The SVM classifier is a statistically based two-class classifier and cannot directly handle multi-class problems. In a video monitoring system each video monitoring device represents one category, so video tracing is essentially a multi-class problem. Therefore, when dealing with the tracing problem of a video monitoring system, an appropriate multi-class classifier must be constructed.
There are two main ways to construct an SVM multi-class classifier: direct construction and indirect construction. Direct construction modifies the original objective function; its computational complexity is high, and it is difficult to apply in practice. Indirect methods are largely divided into one-vs-rest (OVR SVMs) and one-vs-one (OVO SVMs). The one-vs-rest method separates the classes in turn, each time taking one class as positive and all remaining classes as negative; with n training classes, n binary classifiers are constructed, the test data is classified by all n of them, and the class with the most votes is selected as the attributed class. The one-vs-one method constructs an SVM classifier between every pair of classes; when classifying test data, the class with the most votes is likewise selected, and the specific process is shown in fig. 7. The one-vs-rest method requires retraining all classifiers from scratch whenever a new class is added, whereas the one-vs-one method only needs to train models involving the newly added samples, leaving the previously constructed classifiers untouched, and is thus relatively faster. The invention therefore adopts the one-vs-one method to construct the multi-class classifier and uses LibSVM for multi-class classification.
The invention uses the LibSVM default kernel function, the radial basis function (RBF) kernel, as the kernel function of the classification model. The RBF kernel is general-purpose and suitable for many types of samples. Compared with a polynomial kernel, the RBF kernel requires few parameters, has low functional complexity, and is convenient to compute. The RBF kernel involves two important parameters, the kernel parameter g and the penalty factor c, and choosing suitable values is important for the classification model. The invention selects the optimal parameters by grid search: all (c, g) value pairs are cross-validated, and the (c, g) pair with the highest accuracy is taken as the optimal parameter.
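The search can be sketched with scikit-learn's SVC, which wraps LibSVM and builds its multi-class decision one-vs-one internally; the (c, g) exponent grids below are common defaults, not values given in the patent:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fit_tracing_model(X_train, y_train):
    """Grid-search the RBF-SVM parameters (c, g). The exponent ranges are
    illustrative assumptions; X_train holds NCC sequence vectors and
    y_train the device labels."""
    param_grid = {
        'C':     [2.0 ** k for k in range(-5, 16, 2)],
        'gamma': [2.0 ** k for k in range(-15, 4, 2)],
    }
    # SVC wraps LibSVM and trains one-vs-one binary classifiers internally,
    # matching the one-to-one multi-class construction described above.
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
    search.fit(X_train, y_train)  # cross-validate every (c, g) pair
    return search.best_estimator_, search.best_params_
```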
2.3 SVM tracing process
The SVM classifier comprises a training part and a testing part. The training part takes the previously generated NCC features as training samples, selects the optimal parameters, and trains a tracing classification model for subsequent prediction. The testing part classifies the test samples in real time, with the aim of verifying the prediction accuracy and generalization ability of the classifier. The main steps of classification with LibSVM are as follows:
the method comprises the following steps: and (6) data processing. And carrying out unified format processing on the training data and the test data and importing the training data and the test data.
Step two: and selecting the optimal parameters. And (5) obtaining optimal parameters c and g by cross validation in a grid search mode.
Step three: and training a classification model. And training by using the training data to construct a multi-classification model.
Step four: and (6) classifying. Classifying data to be tested by using constructed classification model
Step five: and (5) tracing the source classification result. And calculating the video source tracing classification accuracy according to the classification result.
Partial pseudocode of the detection process of the classification algorithm is given in the original document as an image.
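Because the pseudocode survives only as an image, the following sketch reconstructs the detection stage from the surrounding text: each sliding-window NCC vector is classified, each frame collects the votes of the windows containing it, and the video-level label is the majority over frames. All names are ours and the voting details are an assumption:

```python
import numpy as np
from sklearn.svm import SVC

def trace_video(ncc_features, frame_ids, model: SVC):
    """Hedged stand-in for the patent's image-only pseudocode.
    ncc_features: list of window NCC vectors; frame_ids: for each window,
    the frame indices it covers; model: trained multi-class SVM."""
    votes = {}
    for feat, frames in zip(ncc_features, frame_ids):
        label = model.predict(np.asarray(feat).reshape(1, -1))[0]
        for t in frames:
            votes.setdefault(t, []).append(label)
    # per-frame label: the most frequent vote across overlapping windows
    per_frame = {t: max(set(v), key=v.count) for t, v in votes.items()}
    labels = list(per_frame.values())
    return max(set(labels), key=labels.count)  # video-level decision
```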
the technical effects of the present invention will be described in detail below with reference to the accompanying drawings.
The video data of the invention come from the public security monitoring system of a certain city. Camera data of several brands are used to verify the tracing algorithm proposed by the invention, the traditional video tracing algorithms are verified at the same time, and algorithm performance is compared by analyzing the results of the different algorithms.
1. Experimental data. To evaluate the results of the video algorithm, the invention used a set of 15 video monitoring devices from 5 different manufacturers; the specific device information is shown in Table 1. The video monitoring devices come from 5 brands: Hikvision, Dahua, Hanbang Gaoke, Tiandy and Zhongwei. Each brand contributes several models: 4 from Hikvision, 3 from Dahua, 2 from Hanbang Gaoke, 2 from Tiandy and 1 from Zhongwei. All devices use a CCD sensor as the imaging sensor so that the PRNU noise can be exploited.
The present invention uses the VLC player to capture the real-time video stream, the ffmpeg open-source tools to extract video clips, video frames and the required video attributes (bit rate, DCT coefficients, frame type, etc.), and the LibSVM tool for classification detection.
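For the frame-type attribute, a helper along these lines (an illustrative sketch, not the authors' tooling) can read each frame's pict_type from ffprobe:

```python
import subprocess

def frame_types(path):
    """Return the pict_type (I/P/B) of every video frame via ffprobe,
    which ships with the ffmpeg tool set used in the experiments."""
    out = subprocess.run(
        ['ffprobe', '-v', 'quiet', '-select_streams', 'v:0',
         '-show_entries', 'frame=pict_type', '-of', 'csv=p=0', path],
        capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]
```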
TABLE 1 Experimental Equipment information
[Table 1 is presented as an image in the original document.]
Fig. 8 shows part of the video samples; the samples use the H.264 compression coding standard, and several video segments are tested with the proposed method to verify the fingerprint extraction and video tracing effects.
2. The proposed available-region-based video fingerprint estimation and forgery detection method is compared for video tracing with the traditional I-frame-based video fingerprint estimation method and the all-frames-based video fingerprint estimation method, in terms of both the number of video frames required and the accuracy.
(1) Video frame number comparison
As stated above, the invention computes the average of the PRNU noise of the video frames as the device fingerprint K. To determine the number of frames required for accurate video noise, the invention extracts fingerprints from 5 different videos of each device in turn, and extracts video frames of different lengths from the same video of each device to construct the device fingerprints. Starting from 0 and adding 10 frames at a time, PRNU fingerprints are extracted, video tracing is performed with the constructed fingerprints, and the tracing accuracy is calculated; finally, the average of the repeated experimental results is taken as the final frame-number result.
As can be seen from fig. 9, when the tracing accuracy reaches 90%, the all-frames-based video fingerprint estimation method requires 220 frames on average, the I-frame-based method requires only 60 frames on average, and the available-region-based method proposed by the invention requires 58 frames on average. The number of frames required by the proposed method is thus close to that of the I-frame-based method, while being 73.6% lower than that of the all-frames-based fingerprint construction method.
(2) Comparison of accuracy
Video tracing experiments are carried out between video monitoring devices of different brands, of different models, and of the same model, and videos of different bit rates from the same device are also tested, in order to verify the video tracing effect. The specific experimental groups are shown in Table 2 below.
Table 2 experimental grouping information
[Table 2 is presented as an image in the original document.]
Each type of test is run 10 times with the SVM classifier to verify the performance of the method. The classification accuracy changes only slightly between runs, showing that the method is stable; finally, the average classification accuracy of each type of experiment is recorded. The experimental classification results are shown in Table 3, and a line chart comparing the methods is shown in fig. 10 to present the comparison more intuitively.
TABLE 3 video monitoring device Classification accuracy comparison
[Table 3 is presented as an image in the original document.]
As shown in fig. 10, all three PRNU-based video tracing algorithms can accurately identify the brand and model of the camera, with almost identical results. When video tracing classification is performed between devices of the same model, the classification accuracy of all three methods drops; part of the misclassification arises because devices of the same model share common characteristics. The average classification accuracy of the proposed method in video device source identification reaches 96.85%, an improvement over the traditional PRNU tracing algorithms. The effect is particularly good for videos with low bit rate and high compression, where the accuracy improves by about 10%, so the method can effectively trace video sources and detect spoofing attacks.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. It will be appreciated by those skilled in the art that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, for example such code provided on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware) or a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A monitoring video traceability processing method, characterized in that the monitoring video traceability processing method performs variance calculation on the image to be extracted, denoises it with a Wiener filter, and assigns the weights of the RGB three-color channels to obtain the final PRNU noise, calculating the average value of the PRNU noise as the device fingerprint; calculates an NCC correlation sequence of the video frames and the device fingerprint and feeds the sequence as the video feature into a classifier to train a classification model; and constructs an NCC feature value of the video to be detected by the same method and performs prediction classification on the video to be detected with the classification model to obtain a classification result, thereby realizing video source identification;

the prediction classification of the video to be detected with the classification model uses the LibSVM default kernel function, the radial basis function (RBF) kernel, as the kernel function of the classification model; the optimal parameters are selected by grid search: all (c, g) values are cross-validated, and the (c, g) pair with the highest accuracy is taken as the optimal parameter.
2. The surveillance video tracing method of claim 1, wherein the available frames of the surveillance video tracing method are selected by R_t(x, y), which indicates whether the region block at (x, y) of the current frame t is available for PRNU extraction; if all DCT-AC coefficients are zero in a given block region, R_t(x, y) = 0 and the block is discarded during video noise extraction; otherwise R_t(x, y) = 1 and the PRNU noise of that block is used during video noise extraction, where t denotes video frame t:

R_t(x, y) = 0 if all DCT-AC coefficients of block (x, y) are zero, and R_t(x, y) = 1 otherwise;

the PRNU is extracted directly from I frames, i.e. R_t(x, y) is always 1; if the frame is a B or P frame, each block region is examined and undamaged block regions are selected for PRNU extraction.
3. The surveillance video tracing method of claim 1, wherein the PRNU extraction of the surveillance video tracing method comprises:
(1) decomposing the image into color channels (R, G, B) and performing a four-level wavelet transform on each color channel using an 8-tap Daubechies QMF, obtaining at each level the subbands in the horizontal H, vertical V and diagonal D directions;
(2) in each subband, estimating the local variance of the original noiseless image for each wavelet coefficient, using the maximum a posteriori (MAP) estimate computed over square W × W neighborhoods of four sizes, W ∈ {3, 5, 7, 9}:

σ̂²_W(i, j) = max(0, (1/W²) Σ_{(i,j)∈N_W} c²(i, j) − σ₀²),

where c ∈ {H, V, D}, c(i, j) is the high-frequency component, and σ₀ controls the degree of noise suppression, σ₀ = 5;
(3) comparing the four variances of the four sizes and selecting the minimum as the best variance estimate:

σ²(i, j) = min(σ₃²(i, j), σ₅²(i, j), σ₇²(i, j), σ₉²(i, j)), (i, j) ∈ J;

(4) obtaining the denoised wavelet coefficients with a Wiener filter:

c_clean(i, j) = c(i, j) · σ²(i, j) / (σ²(i, j) + σ₀²);

(5) repeating the above process for each subband and each color channel of the video frame and applying the inverse wavelet transform to obtain I_clean, the denoised result, the noise value of each color channel of the current video frame being obtained by subtraction:

I_noise = I − I_clean;

(6) assigning the weights of the three color channels and combining all channels to obtain the enhanced PRNU noise:

I_PRNU = w_R · I^R_noise + w_G · I^G_noise + w_B · I^B_noise.
4. The surveillance video tracing method according to claim 1, wherein the method extracts an original video sequence of the device in advance, obtains PRNU noise by repeating the extraction process for a series of video frames of the same video, and calculates the average value as the device fingerprint K:

K(x, y) = Σ_{t=1}^{n} R_t(x, y) · Ĩ_t(x, y) / Σ_{t=1}^{n} R_t(x, y),

where Ĩ_t is the noise extracted from the t-th frame, n is the number of frames processed, and R_t denotes the selected region of the frame.
5. The surveillance video tracing method according to claim 1, wherein the video feature selection of the surveillance video tracing method extracts the PRNU noise of the current frame, performs a correlation calculation with the video device fingerprint, and queries the original device to which the video frame belongs; the normalized cross-correlation NCC measures the degree of correlation between two sets of data and takes values in [−1, 1], a value near 0 indicating that the test data is uncorrelated with the fingerprint data, −1 indicating perfect negative correlation, and 1 indicating that the test data and the fingerprint data are identical; the NCC between the noise of the video frame under test and the device fingerprint is calculated, the NCC of frame t being defined as:

NCC_t = ⟨Ĩ_t − avg(Ĩ_t), K − avg(K)⟩ / (‖Ĩ_t − avg(Ĩ_t)‖ · ‖K − avg(K)‖),

where K denotes the device fingerprint, Ĩ_t denotes the PRNU noise estimated from the frame, avg(K) and avg(Ĩ_t) are the means of K and Ĩ_t respectively, ⟨a, b⟩ is the inner product operation, and ‖·‖ denotes the Euclidean norm;

adjacent frames are used for the decision: a sliding window over the video collects the NCC value of each frame in the window to form an NCC sequence, and the NCC sequence vector serves as the classification feature for the subsequent tracing operation, the NCC feature sequence of frame t being defined as:

S_t = (NCC_{t−⌊m/2⌋}, …, NCC_t, …, NCC_{t+⌊m/2⌋}),

where m is the length of the sliding window and ⌊·⌋ denotes the floor (round-down) operation;

once the NCC sequences are obtained, the feature information is fed into a classifier to learn matching and non-matching NCC sequences, a classification model is trained to perform video tracing, and the averaged voting result serves as the final classification result of the frame.
6. The surveillance video traceability processing method of claim 1, wherein the SVM-based tracing of the surveillance video traceability processing method comprises:
step one, data processing: processing the training data and test data into a unified format and importing them;
step two, optimal parameter selection: obtaining the optimal parameters c and g through cross-validation with grid search;
step three, classification model training: training on the training data to construct a multi-class model;
step four, classification: classifying the data under test with the constructed classification model;
and step five, calculating the video tracing classification accuracy from the tracing classification result.
7. A program storage medium for receiving user input, the stored computer program causing an electronic device to execute the surveillance video traceability processing method of any one of claims 1-6, comprising the steps of: performing variance calculation on the image to be extracted, denoising with a Wiener filter, assigning the weights of the RGB three-color channels to obtain the final PRNU noise, and calculating the average value of the PRNU noise as the device fingerprint; calculating an NCC correlation sequence of the video frames and the device fingerprint and feeding the sequence as the video feature into a classifier to train a classification model; and constructing an NCC feature value of the video to be detected by the same method and performing prediction classification on the video to be detected with the classification model to obtain a classification result, realizing video source identification.
8. A surveillance video traceability processing system for implementing the surveillance video traceability processing method of any one of claims 1 to 6, wherein the surveillance video traceability processing system comprises:
a PRNU noise extraction module 1 for extracting PRNU noise from available video frames;
the fingerprint construction module is used for constructing a unique fingerprint of the video monitoring equipment;
and the video tracing detection module is used for classifying by constructing the NCC related sequence.
9. The surveillance video traceability processing system of claim 8, wherein the PRNU noise extraction module comprises:
the video available frame extraction module is used for extracting available video frames;
and the PRNU noise extraction module is used for extracting PRNU noise.
10. A video monitoring terminal, characterized in that, the video monitoring terminal carries the surveillance video traceability processing system of claim 8.
CN202010203610.7A 2020-03-20 2020-03-20 Monitoring video tracing processing method, system, storage medium and video monitoring terminal Active CN111553848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010203610.7A CN111553848B (en) 2020-03-20 2020-03-20 Monitoring video tracing processing method, system, storage medium and video monitoring terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010203610.7A CN111553848B (en) 2020-03-20 2020-03-20 Monitoring video tracing processing method, system, storage medium and video monitoring terminal

Publications (2)

Publication Number Publication Date
CN111553848A CN111553848A (en) 2020-08-18
CN111553848B (en) 2023-04-07

Family

ID=72004129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010203610.7A Active CN111553848B (en) 2020-03-20 2020-03-20 Monitoring video tracing processing method, system, storage medium and video monitoring terminal

Country Status (1)

Country Link
CN (1) CN111553848B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767360A (en) * 2021-01-21 2021-05-07 湖南大学 Traceability system based on photosensitive device noise fingerprint
CN112991345B (en) * 2021-05-11 2021-08-10 腾讯科技(深圳)有限公司 Image authenticity detection method and device, computer equipment and storage medium
CN114567798B (en) * 2022-02-28 2023-12-12 南京烽火星空通信发展有限公司 Tracing method for short video variety of Internet
CN116363686B (en) * 2023-06-02 2023-08-11 深圳大学 Online social network video platform source detection method and related equipment thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014163597A2 (en) * 2013-04-05 2014-10-09 Uludağ Üni̇versi̇tesi̇ Tto Anonymization system and method for digital images
CN108154080A (en) * 2017-11-27 2018-06-12 北京交通大学 A kind of method that video equipment is quickly traced to the source
CN110121109A (en) * 2019-03-22 2019-08-13 西安电子科技大学 Towards the real-time source tracing method of monitoring system digital video, city video monitoring system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014163597A2 (en) * 2013-04-05 2014-10-09 Uludağ Üni̇versi̇tesi̇ Tto Anonymization system and method for digital images
CN108154080A (en) * 2017-11-27 2018-06-12 北京交通大学 A kind of method that video equipment is quickly traced to the source
CN110121109A (en) * 2019-03-22 2019-08-13 西安电子科技大学 Towards the real-time source tracing method of monitoring system digital video, city video monitoring system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Early warning of potentially improper video calls based on camera source tracing; Ma Xiaochen et al.; Optics and Precision Engineering (No. 11); full text *

Also Published As

Publication number Publication date
CN111553848A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111553848B (en) Monitoring video tracing processing method, system, storage medium and video monitoring terminal
Lin et al. Digital image source coder forensics via intrinsic fingerprints
Hsu et al. Video forgery detection using correlation of noise residue
Ye et al. Unsupervised feature learning framework for no-reference image quality assessment
Chen et al. Determining image origin and integrity using sensor noise
Ferrara et al. Image forgery localization via fine-grained analysis of CFA artifacts
Chierchia et al. On the influence of denoising in PRNU based forgery detection
Muhammad et al. Blind copy move image forgery detection using dyadic undecimated wavelet transform
Lin et al. A passive-blind forgery detection scheme based on content-adaptive quantization table estimation
CN110121109A (en) Towards the real-time source tracing method of monitoring system digital video, city video monitoring system
Taspinar et al. Camera fingerprint extraction via spatial domain averaged frames
Li et al. Image quality assessment using deep convolutional networks
Kalka et al. A preliminary study on identifying sensors from iris images
Chu et al. Detectability of the order of operations: An information theoretic approach
CN105120294A (en) JPEG format image source identification method
Lorch et al. Reliable camera model identification using sparse gaussian processes
Cozzolino et al. Multiple classifier systems for image forgery detection
CN111709930A (en) Pattern noise based picture provenance and tampering identification method
Pandey et al. A passive forensic method for video: Exposing dynamic object removal and frame duplication in the digital video using sensor noise features
Chu et al. Forensic identification of compressively sensed images
Bammey Jade owl: Jpeg 2000 forensics by wavelet offset consistency analysis
Cozzolino et al. A comparative analysis of forgery detection algorithms
Li et al. Random subspace method for source camera identification
Conotter et al. Joint detection of full-frame linear filtering and JPEG compression in digital images
Nam et al. DHNet: double MPEG-4 compression detection via multiple DCT histograms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant