CN114554227B - Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Info

Publication number
CN114554227B
CN114554227B (application CN202210048894.6A)
Authority
CN
China
Prior art keywords
noise
compressed video
video
wiener filtering
frequency
Prior art date
Legal status
Active
Application number
CN202210048894.6A
Other languages
Chinese (zh)
Other versions
CN114554227A (en)
Inventor
田妮莉
苏开清
潘晴
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210048894.6A priority Critical patent/CN114554227B/en
Publication of CN114554227A publication Critical patent/CN114554227A/en
Application granted granted Critical
Publication of CN114554227B publication Critical patent/CN114554227B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop
    • H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/63 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering, which comprises the following steps: intervene in the decoding of the reference compressed video and of the compressed video under test, extract the video frames before the loop filtering module, and obtain the macroblock quantization parameter (QP) values of the corresponding frames to form a QP matrix; perform a multi-scale double-density dual-tree complex wavelet transform on each frame, process the 32 high-frequency subbands of each decomposition level with locally adaptive threshold-window wiener filtering, and obtain the noise residual as the difference between the frame before and after denoising via the inverse transform; using the QP matrix as the weight of the noise residuals, estimate the final sensor pattern noise with a maximum likelihood estimator; correlate the sensor pattern noise of the reference and the test compressed video with the signed peak-to-correlation energy (SPCE) and decide according to a threshold. The method extracts more sufficient sensor pattern noise from compressed video, effectively improves the recognition of compressed video, and is more robust when identifying the source of short-duration compressed videos.

Description

Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering
Technical Field
The invention relates to the technical field of video forensics, and in particular to a compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering.
Background
With the advent of the 5G information age, more and more people communicate through the Internet, and pictures and videos are widely used as important carriers of information in social networks. At the same time, videos involving illegal activities that seriously disturb public order, such as piracy and malicious tampering, inevitably appear on the network. Videos uploaded to the Internet are forwarded many times and necessarily undergo compression to different degrees during transmission, so confirming the source of a compressed video has become an important research topic in multimedia information security forensics in recent years.
When the source of an image or video needs to be confirmed, the most direct method is to inspect its watermark information. However, as image and video editing software has developed, modifying a watermark has become very simple, so this approach is unreliable for source detection. Researchers have therefore turned to the intrinsic traces left in digital images and videos. For example, Swaminathan A, Wu M, Liu K J R. Nonintrusive component forensics of visual sensors using output images [J]. IEEE Transactions on Information Forensics and Security, 2007, 2(1): 91-106, authenticated digital image sources by extracting the noise artifacts introduced by CFA interpolation algorithms. Choi K S, Lam E Y, Wong K K Y. Source camera identification using footprints from lens aberration [C]// Proceedings of SPIE, 2006, 6069: 172-179, determined the source of an image by exploiting the fact that aberration noise caused by lens distortion affects the statistical properties of the picture. Dirik A E, Sencar H T, Memon N. Digital single lens reflex camera identification from traces of sensor dust [J]. IEEE Transactions on Information Forensics and Security, 2008, 3(3): 539-552, observed that, because digital single lens reflex cameras use interchangeable lenses, dust particles deposited in front of the imaging sensor form a persistent pattern in all captured images, which can be extracted for source detection.
A good digital image and video source detection algorithm must be robust and achieve a high detection rate. Although the above methods perform source detection to a certain extent, they suffer from high computational complexity and limited recognition performance. CFA-interpolation-based identification, for instance, fails when cameras of the same model must be distinguished, because cameras of the same model apply the same CFA interpolation. Sensor-dust-based identification is likewise difficult for new cameras, which have accumulated little sensor dust. As research deepened, a markedly more effective multimedia source detection approach was proposed: digital image and video source detection based on sensor pattern noise.
The imaging sensor is one of the most important components of a camera. Owing to imperfections in manufacturing materials and production processes, it introduces a unique noise artifact during imaging; even cameras of the same model produce different artifacts. Researchers call this unique artifact sensor pattern noise. Sensor pattern noise mainly consists of photo-response non-uniformity (PRNU) noise, so PRNU is also referred to as sensor pattern noise; it can serve as a camera fingerprint for source detection of unknown images and videos.
As researchers at home and abroad studied the extraction of sensor pattern noise in depth, it was found that sensor pattern noise is a weak signal relative to the original image content, and filtering algorithms were proposed to extract it. Lukas J, Fridrich J, Goljan M. Digital camera identification from sensor pattern noise [J]. IEEE Transactions on Information Forensics and Security, 2006, 1(2): 205-214, proposed extracting the noise residual of an image using a wavelet transform combined with wiener filtering. Conotter V, Boato G. Analysis of sensor fingerprint for source camera identification [J]. Electronics Letters, 2011, 47(25): 1366-1367, introduced the more complex BM3D (block matching and 3D filtering) filter to extract the noise residual; BM3D has proven effective at extracting noise by identifying similar blocks in the image and processing them jointly. Kang et al. A context-adaptive SPN predictor for trustworthy source camera identification [J]. EURASIP Journal on Image and Video Processing, 2014(1): 1-11, and Zeng H, Kang X. Fast source camera identification using content adaptive guided image filter [J]. Journal of Forensic Sciences, 2016, 61(2): 520-526, successively proposed a context-adaptive prediction algorithm and an adaptive guided image filtering algorithm that extract the noise residual more thoroughly. Subsequently, Lawgaly A, Khelifi F. Sensor pattern noise estimation based on improved locally adaptive DCT filtering and weighted averaging for source camera identification and verification [J]. IEEE Transactions on Information Forensics and Security, 2016, 12(2): 392-404, and Zeng H, Wan Y, Deng K, et al. Source camera identification with dual-tree complex wavelet transform [J]. IEEE Access, 2020, 8: 18874-18883, proposed improved locally adaptive DCT filtering and a dual-tree complex wavelet transform combined with locally adaptive window wiener filtering, achieving better filtering results than the earlier methods. After obtaining noise residuals with such filtering methods, researchers found that the residual contains not only sensor pattern noise but also other non-unique components, such as CFA interpolation noise, speckle noise and image texture detail, which become more numerous and more complex in compressed video. Chen M, Fridrich J, Goljan M, et al. Determining image origin and integrity using sensor noise [J]. IEEE Transactions on Information Forensics and Security, 2008, 3(1): 74-90, proposed estimating a single sensor pattern noise from 50 noise-residual images with a maximum likelihood estimator and further cleaning the estimate with zero-meaning combined with frequency-domain wiener filtering. Kang X, et al. Enhancing source camera identification performance with a camera reference phase sensor pattern noise [J]. IEEE Transactions on Information Forensics and Security, 2012, 7(2): 393-402, proposed matching with only the phase information of the noise residual in the Fourier domain, the refined sensor pattern noise then being used for matching detection. Many other ways of extracting sensor pattern noise have since emerged; Lin X, Li C T. Preprocessing reference sensor pattern noise via spectrum equalization [J]. IEEE Transactions on Information Forensics and Security, 2016, 11(1): 126-140, proposed a spectrum equalization method that suppresses local peaks to obtain a smoother reference. Most of the above algorithms improve the quality of the sensor pattern noise extracted from images. However, because a video is compressed many times during Internet transmission, its noise is more complex and its sensor pattern noise is strongly suppressed, so conventional image-oriented extraction algorithms have difficulty extracting it from compressed video, and their recognition performance on compressed video is extremely limited.
For the above reasons, in order to extract more sensor pattern noise from compressed video and improve its recognition, it is necessary to study a source detection algorithm dedicated to compressed video.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering, which extracts more sufficient sensor pattern noise from compressed video, effectively improves the recognition of compressed video, and is more robust when identifying the source of short-duration compressed videos.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
A compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering comprises the following steps:
S1, obtaining the sensor pattern noise of a plurality of reference compressed videos and storing it in a sensor pattern noise database;
S2, obtaining the sensor pattern noise of the test compressed video;
S3, computing the correlation between the sensor pattern noise of the test compressed video and the sensor pattern noise of each reference compressed video in the sensor pattern noise database using the signed peak-to-correlation energy (SPCE); if the SPCE is greater than or equal to a set threshold, the test compressed video is judged to come from the camera that recorded the reference compressed video, otherwise it does not come from that camera.
Further, in the steps S1 and S2, the specific steps for obtaining the sensor pattern noise are as follows:
A1, re-parsing the video into bitstream data and intervening in the decoding process, outputting all video frames before the loop filter of the codec and simultaneously obtaining the macroblock quantization parameter (QP) values of the corresponding frames to form a QP matrix;
A2, obtaining G video frames and numbering them I_1, I_2, ..., I_G; performing a multi-level double-density dual-tree complex wavelet decomposition on each video frame;
A3, performing filtering estimation on all high-frequency subbands with the locally adaptive threshold-window wiener filtering method to obtain the filtered wavelet subbands; performing the inverse double-density dual-tree complex wavelet transform to obtain the denoised video frame; subtracting the denoised video frame from the input video frame to obtain the noise residual of each video frame;
A4, estimating over the set of noise residuals of I_1, I_2, ..., I_G with the QP-matrix-weighted maximum likelihood estimation algorithm to obtain the multiplicative factor K of the preliminary sensor pattern noise of the video;
A5, removing the CFA interpolation artifacts from the multiplicative factor K by a zero-mean operation to obtain a CFA-artifact-free factor K, filtering this factor with a frequency-domain wiener filtering algorithm to further remove other non-unique noise components, then multiplying the filtered factor by each input video frame and averaging to obtain the sensor pattern noise.
Further, in the step A2, the specific process of the double-density dual-tree complex wavelet decomposition is as follows:
a dual-tree decomposition is performed on each video frame, splitting it into two trees; for each tree, a one-dimensional double-density wavelet decomposition is performed on the data formed by each row, giving high-frequency, sub-high-frequency and low-frequency parts;
the same one-dimensional wavelet decomposition is then performed on each column of the high-frequency, sub-high-frequency and low-frequency components obtained by the first decomposition, finally yielding nine subband images, namely eight high-frequency subbands L0H1, L0H2, H1L0, H1H1, H1H2, H2L0, H2H1, H2H2 and one low-frequency subband L0L0; each high-frequency subband is further divided into a real part and an imaginary part, so each decomposition level yields 32 high-frequency components.
Further, in the step A3, the locally adaptive threshold-window wiener filtering method is as shown in formula (1):
$$W_{out}(u,v) = W_{in}(u,v)\,\frac{\hat{\sigma}^2(u,v)}{\hat{\sigma}^2(u,v)+\hat{\sigma}_0^2} \tag{1}$$
In formula (1), $W_{in}$ denotes the wavelet coefficients before filtering and $W_{out}$ the filtered wavelet coefficients; the noise variance estimate $\hat{\sigma}_0^2$ and the local subband variance $\hat{\sigma}^2(u,v)$ are obtained from formulas (2) and (3):
$$\hat{\sigma}_0 = \frac{\mathrm{median}\left(\left|W_{temp}\right|\right)}{0.6745} \tag{2}$$
In formula (2), median(·) denotes the median estimator and $W_{temp}$ is the first high-frequency subband of the first decomposition level;
$$\hat{\sigma}^2(u,v) = \min_{h}\ \max\!\left(0,\ \frac{1}{h^2}\sum_{(s,t)\in N_h(u,v)} W_{in}^{2}(s,t)-\hat{\sigma}_0^{2}\right) \tag{3}$$
In formula (3), $N_h(u,v)$ is a local window of size h×h centred at (u,v); max(·) takes the maximum of 0 and the variance estimate, and min(·) takes the minimum over the estimates obtained with all window sizes.
Further, in the step A4, the maximum likelihood estimation algorithm weighted by the quantization parameter values is expressed as equation (4):
$$K = \frac{\sum_{z=1}^{G} W_{QP}\, N_z\, I_z}{\sum_{z=1}^{G} W_{QP}\, I_z^{2} + \delta} \tag{4}$$
In formula (4), G is the number of video frames of a single video used to estimate the sensor pattern noise factor K; $N_z$ denotes the noise residual of the z-th video frame and $I_z$ the z-th video frame of the video; $W_{QP}$ is the weight matrix formulated from the QP-SPCE curve obtained by computing the correlation for different quantization parameters QP; δ is a set value that prevents the denominator from being 0.
Further, in the step A5,
the zero-mean operation subtracts the column mean from every pixel of each column and then the row mean from every pixel of each row;
the frequency-domain wiener filtering transforms the CFA-artifact-free sensor pattern noise into the frequency domain and estimates it there with a wiener filtering operation.
Further, in the step S3, the signed peak-to-correlation energy is expressed as equation (5):
$$\mathrm{SPCE}(R,Q)=\mathrm{sign}\left(C_{RQ}(0,0)\right)\cdot\frac{C_{RQ}(0,0)^{2}}{\frac{1}{MN-|\beta|}\sum_{(a,b)\notin\beta}C_{RQ}(a,b)^{2}} \tag{5}$$
In formula (5), sign(·) is the sign function; $C_{RQ}(a,b)$ is the two-dimensional circular cross-correlation between the sensor pattern noise R of the reference compressed video and the sensor pattern noise Q of the test compressed video; β is a small region around (0,0) and |β| is the number of elements of that region; MN is the size of the matched sensor pattern noise.
Compared with the prior art, the scheme has the following principle and advantages:
1) The loop filter module of the codec removes part of the sensor pattern noise in a video frame. This scheme modifies the decoding process and extracts the video frames before they reach the loop filter module, so that more sensor pattern noise is preserved in the frames.
2) Sensor pattern noise is medium/high-frequency noise, and wavelet decomposition separates the high- and low-frequency information of the image signal well. The double-density dual-tree complex wavelet transform combines the advantages of the double-density wavelet transform and of the dual-tree complex wavelet transform: it provides signals in 16 principal directions, with two wavelets (a real wavelet and an imaginary wavelet) representing each principal direction. Compared with the ordinary wavelet transform, the dual-tree complex wavelet transform and the double-density wavelet transform further improve the resolution and reconstruction accuracy of the video frames.
3) The noise variance used by wiener filtering is normally a fixed value, which is difficult to estimate accurately and effectively for video frames with different degrees of compression. Here the noise variance of the wavelet subbands is estimated with a median estimator. In image denoising the highest-frequency subband is usually chosen for this estimate; since this scheme extracts medium- and high-frequency sensor pattern noise, the medium/high-frequency subband (LH1) is used for the noise variance estimation instead, which gives better extraction results.
4) Compared with the original maximum likelihood estimation, the QP-value-weighted maximum likelihood estimation assigns each pixel a weight according to its degree of compression, better suppresses the artifacts produced by varying compression, and effectively improves the quality of the sensor pattern noise.
5) Compared with existing sensor pattern noise extraction algorithms, this scheme retains more sensor pattern noise from the input video frames: it extracts more of it during filtering and effectively suppresses the other noise components contained in the residual in the subsequent steps, so the scheme achieves a higher recognition rate and stronger robustness.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings required for the embodiments or for the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of the compressed video source detection method based on multi-scale transform domain adaptive wiener filtering (step S1 is omitted);
FIG. 2 is a schematic flow chart of obtaining the sensor pattern noise in the compressed video source detection method based on multi-scale transform domain adaptive wiener filtering.
Detailed Description
The invention is further illustrated by the following examples:
as shown in fig. 1, the compressed video source detection method based on multi-scale transform domain adaptive wiener filtering according to the present embodiment includes the following steps:
s1, obtaining sensor mode noises of a plurality of reference compressed videos, and storing the sensor mode noises into a sensor mode noise database;
s2, solving sensor mode noise of the test compressed video;
as shown in fig. 2, in steps S1 and S2, the specific steps for obtaining the sensor pattern noise are as follows:
a1, changing the video into bit stream data again, intervening in the decoding process, and outputting all video frames before a loop filter of a coder-decoder;
a2, obtaining G Zhang Shipin frames, and enabling the video frames to be numbered as I 1 ,I 2 ,...,I G The method comprises the steps of carrying out a first treatment on the surface of the Respectively carrying out 4-layer dual-density dual-tree complex wavelet decomposition transformation on each video frame;
the specific process of the double-density double-tree complex wavelet decomposition transformation is as follows:
performing dual tree decomposition on each video frame, dividing the video frame into two dual trees, and performing two-dimensional wavelet decomposition on one-dimensional data formed by each row of each tree to obtain high-frequency, sub-high-frequency and low-frequency parts;
then carrying out two-time one-dimensional wavelet decomposition on the one-dimensional data formed by each row of the high-frequency information, the sub-high-frequency information and the low-frequency information formed by decomposition to finally obtain nine sub-band images, namely eight high-frequency sub-bands L 0 H 1 、L 0 H 2 、H 1 L 0 、H 1 H 1 、H 1 H 2 、H 2 L 0 、H 2 H 1 、H 2 H 2 And a low frequency subband L 0 L 0 The method comprises the steps of carrying out a first treatment on the surface of the Each high frequency subband is in turn divided into two parts, real and imaginary, each layer yielding 32 high frequency components.
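To make the decompose / filter / reconstruct / residual flow of steps A2-A3 concrete, the sketch below uses the ordinary dual-tree complex wavelet transform from the Python dtcwt package as a simplified stand-in: the patent's double-density dual-tree complex wavelet transform (16 orientations, 32 high-frequency components per level) has no standard Python implementation, so the number of subbands differs, but the workflow is the same.

```python
import numpy as np
import dtcwt  # pip install dtcwt

def noise_residual(frame: np.ndarray, filter_subband, levels: int = 4) -> np.ndarray:
    """Decompose a frame, filter every high-frequency subband with
    `filter_subband`, reconstruct, and return frame - denoised (step A3)."""
    frame = frame.astype(np.float64)
    transform = dtcwt.Transform2d()
    pyramid = transform.forward(frame, nlevels=levels)
    for hp in pyramid.highpasses:            # one complex array per level
        for d in range(hp.shape[-1]):        # each directional subband
            # the patent filters real and imaginary parts separately
            hp[..., d] = (filter_subband(hp[..., d].real)
                          + 1j * filter_subband(hp[..., d].imag))
    denoised = transform.inverse(pyramid)
    denoised = denoised[: frame.shape[0], : frame.shape[1]]
    return frame - denoised
```

Here `filter_subband` is any per-subband denoiser taking and returning a real 2-D array; the locally adaptive wiener filter sketched after formulas (1)-(3) below can be wrapped (with a fixed noise variance) and passed in.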
A3, performing filtering estimation on all high-frequency subbands with the locally adaptive threshold-window wiener filtering method to obtain the filtered wavelet subbands; performing the inverse double-density dual-tree complex wavelet transform to obtain the denoised video frame; subtracting the denoised video frame from the input video frame to obtain the noise residual of each video frame;
the locally adaptive threshold-window wiener filtering method is given by formula (1):
$$W_{out}(u,v) = W_{in}(u,v)\,\frac{\hat{\sigma}^2(u,v)}{\hat{\sigma}^2(u,v)+\hat{\sigma}_0^2} \tag{1}$$
In formula (1), $W_{in}$ denotes the wavelet coefficients before filtering and $W_{out}$ the filtered wavelet coefficients; the noise variance estimate $\hat{\sigma}_0^2$ and the local subband variance $\hat{\sigma}^2(u,v)$ are obtained from formulas (2) and (3):
$$\hat{\sigma}_0 = \frac{\mathrm{median}\left(\left|W_{temp}\right|\right)}{0.6745} \tag{2}$$
In formula (2), median(·) denotes the median estimator and $W_{temp}$ is the first high-frequency subband of the first decomposition level;
$$\hat{\sigma}^2(u,v) = \min_{h}\ \max\!\left(0,\ \frac{1}{h^2}\sum_{(s,t)\in N_h(u,v)} W_{in}^{2}(s,t)-\hat{\sigma}_0^{2}\right) \tag{3}$$
In formula (3), $N_h(u,v)$ is a local window of size h×h centred at (u,v); max(·) takes the maximum of 0 and the variance estimate, and min(·) takes the minimum over the estimates obtained with all window sizes.
A4, estimating over the set of noise residuals of I_1, I_2, ..., I_G with the QP-matrix-weighted maximum likelihood estimation algorithm to obtain the multiplicative factor K of the preliminary sensor pattern noise of the video;
the maximum likelihood estimation algorithm weighted by the quantization parameter values is expressed as equation (4):
$$K = \frac{\sum_{z=1}^{G} W_{QP}\, N_z\, I_z}{\sum_{z=1}^{G} W_{QP}\, I_z^{2} + \delta} \tag{4}$$
In formula (4), G is the number of video frames of a single video used to estimate the sensor pattern noise factor K; $N_z$ denotes the noise residual of the z-th video frame and $I_z$ the z-th video frame of the video; $W_{QP}$ is the weight matrix formulated from the QP-SPCE curve obtained by computing the correlation for different quantization parameters QP; δ is a set value that prevents the denominator from being 0.
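A sketch of the QP-weighted maximum likelihood estimation of formula (4) is given below. The mapping from a frame's QP matrix to the weight matrix W_QP (the QP-SPCE curve) is determined experimentally in the patent, so `qp_weight` is only a placeholder supplied by the caller.

```python
import numpy as np

def estimate_fingerprint(frames, residuals, qp_maps, qp_weight, delta=1e-6):
    """Formula (4): frames, residuals and qp_maps are lists of G equally
    sized 2-D arrays (I_z, N_z and the per-pixel QP matrix of frame z)."""
    num = np.zeros_like(frames[0], dtype=np.float64)
    den = np.zeros_like(frames[0], dtype=np.float64)
    for I_z, N_z, qp in zip(frames, residuals, qp_maps):
        W = qp_weight(qp)          # per-pixel weight derived from the QP matrix
        num += W * N_z * I_z
        den += W * I_z ** 2
    return num / (den + delta)     # preliminary multiplicative factor K
```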
A5, removing the CFA interpolation artifacts from the multiplicative factor K by a zero-mean operation to obtain a CFA-artifact-free factor K, filtering this factor with a frequency-domain wiener filtering algorithm to further remove other non-unique noise components, then multiplying the filtered factor by each input video frame and averaging to obtain the sensor pattern noise;
the zero-mean operation subtracts the column mean from every pixel of each column and then the row mean from every pixel of each row;
the frequency-domain wiener filtering transforms the CFA-artifact-free sensor pattern noise into the frequency domain and estimates it there with a wiener filtering operation.
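A sketch of this post-processing is shown below: zero-meaning removes the periodic CFA interpolation artifacts, and a wiener-style shrinkage in the Fourier domain suppresses remaining structured components. The exact frequency-domain attenuation rule (here, shrinking spectral energy above an assumed flat noise floor) is an assumption; the patent only states that a wiener filtering operation is applied in the frequency domain.

```python
import numpy as np

def zero_mean(k: np.ndarray) -> np.ndarray:
    """Subtract the column means, then the row means (removes CFA artifacts)."""
    k = k - k.mean(axis=0, keepdims=True)
    return k - k.mean(axis=1, keepdims=True)

def frequency_wiener(k: np.ndarray) -> np.ndarray:
    """Keep the flat, PRNU-like part of the spectrum and attenuate peaks."""
    F = np.fft.fft2(k)
    power = np.abs(F) ** 2
    floor = np.median(power)                  # assumed flat noise-power floor
    excess = np.maximum(power - floor, 0.0)   # structured energy above the floor
    gain = floor / (floor + excess)           # wiener-style shrinkage of peaks
    return np.real(np.fft.ifft2(F * gain))
```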
S3, computing the correlation between the sensor pattern noise of the test compressed video and the sensor pattern noise of each reference compressed video in the sensor pattern noise database using the signed peak-to-correlation energy (SPCE); if the SPCE is greater than or equal to a set threshold, the test compressed video is judged to come from the camera that recorded the reference compressed video, otherwise it does not come from that camera.
In this step, the signed peak-to-correlation energy is expressed as equation (5):
$$\mathrm{SPCE}(R,Q)=\mathrm{sign}\left(C_{RQ}(0,0)\right)\cdot\frac{C_{RQ}(0,0)^{2}}{\frac{1}{MN-|\beta|}\sum_{(a,b)\notin\beta}C_{RQ}(a,b)^{2}} \tag{5}$$
In formula (5), sign(·) is the sign function; $C_{RQ}(a,b)$ is the two-dimensional circular cross-correlation between the sensor pattern noise R of the reference compressed video and the sensor pattern noise Q of the test compressed video; β is a small region around (0,0) and |β| is the number of elements of that region; MN is the size of the matched sensor pattern noise.
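A sketch of formula (5) using an FFT-based circular cross-correlation is given below; the 11 x 11 size of the excluded peak neighbourhood β is an assumption, and the decision rule of step S3 appears as a trailing comment.

```python
import numpy as np

def spce(reference: np.ndarray, test: np.ndarray, peak_radius: int = 5) -> float:
    """Signed peak-to-correlation energy between two fingerprints (formula (5))."""
    R = reference - reference.mean()
    Q = test - test.mean()
    # two-dimensional circular cross-correlation via the FFT
    C = np.real(np.fft.ifft2(np.fft.fft2(R) * np.conj(np.fft.fft2(Q))))
    peak = C[0, 0]
    # exclude a small (wrap-around) neighbourhood beta around the (0, 0) peak
    mask = np.ones_like(C, dtype=bool)
    r = peak_radius
    mask[: r + 1, : r + 1] = False
    mask[-r:, : r + 1] = False
    mask[: r + 1, -r:] = False
    mask[-r:, -r:] = False
    energy = np.sum(C[mask] ** 2) / mask.sum()
    return float(np.sign(peak) * peak ** 2 / energy)

# Step S3 decision (threshold tau is set experimentally):
# same_camera = spce(R_fingerprint, Q_fingerprint) >= tau
```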
The above embodiments are only preferred embodiments of the present invention and do not limit its protection scope; any change made according to the shapes and principles of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering, characterized by comprising the following steps:
S1, obtaining the sensor pattern noise of a plurality of reference compressed videos and storing it in a sensor pattern noise database;
S2, obtaining the sensor pattern noise of the test compressed video;
S3, computing the correlation between the sensor pattern noise of the test compressed video and the sensor pattern noise of each reference compressed video in the sensor pattern noise database using the signed peak-to-correlation energy (SPCE); if the SPCE is greater than or equal to a set threshold, judging that the test compressed video comes from the camera that recorded the reference compressed video, otherwise the test compressed video does not come from that camera;
in the steps S1 and S2, the specific steps for obtaining the sensor pattern noise are as follows:
A1, re-parsing the video into bitstream data and intervening in the decoding process, outputting all video frames before the loop filter of the codec and simultaneously obtaining the macroblock quantization parameter (QP) values of the corresponding frames to form a QP matrix;
A2, obtaining G video frames and numbering them I_1, I_2, ..., I_G; performing a multi-level double-density dual-tree complex wavelet decomposition on each video frame;
A3, performing filtering estimation on all high-frequency subbands with the locally adaptive threshold-window wiener filtering method to obtain the filtered wavelet subbands; performing the inverse double-density dual-tree complex wavelet transform to obtain the denoised video frame; subtracting the denoised video frame from the input video frame to obtain the noise residual of each video frame;
A4, estimating over the set of noise residuals of I_1, I_2, ..., I_G with the QP-matrix-weighted maximum likelihood estimation algorithm to obtain the multiplicative factor K of the preliminary sensor pattern noise of the video;
A5, removing the CFA interpolation artifacts from the multiplicative factor K by a zero-mean operation to obtain a CFA-artifact-free factor K, filtering this factor with a frequency-domain wiener filtering algorithm to further remove other non-unique noise components, then multiplying the filtered factor by each input video frame and averaging to obtain the sensor pattern noise.
2. The compressed video source detection method based on multi-scale transform domain adaptive wiener filtering according to claim 1, wherein in the step A2, the specific process of the double-density dual-tree complex wavelet decomposition is as follows:
a dual-tree decomposition is performed on each video frame to obtain two trees; for each tree, a one-dimensional double-density wavelet decomposition is performed on the data formed by each row, giving high-frequency, sub-high-frequency and low-frequency parts;
the same one-dimensional wavelet decomposition is then performed on each column of the high-frequency, sub-high-frequency and low-frequency components obtained by the first decomposition, finally yielding nine subband images, namely eight high-frequency subbands L0H1, L0H2, H1L0, H1H1, H1H2, H2L0, H2H1, H2H2 and one low-frequency subband L0L0; each high-frequency subband is further divided into a real part and an imaginary part, so each decomposition level yields 32 high-frequency components.
3. The compressed video source detection method based on multi-scale transform domain adaptive wiener filtering according to claim 2, wherein in the step A3, the locally adaptive threshold-window wiener filtering method is as shown in formula (1):
$$W_{out}(u,v) = W_{in}(u,v)\,\frac{\hat{\sigma}^2(u,v)}{\hat{\sigma}^2(u,v)+\hat{\sigma}_0^2} \tag{1}$$
in formula (1), $W_{in}$ denotes the wavelet coefficients before filtering and $W_{out}$ the filtered wavelet coefficients; the noise variance estimate $\hat{\sigma}_0^2$ and the subband variance $\hat{\sigma}^2(u,v)$ are obtained from formulas (2) and (3):
$$\hat{\sigma}_0 = \frac{\mathrm{median}\left(\left|W_{temp}\right|\right)}{0.6745} \tag{2}$$
in formula (2), median(·) denotes the median estimator and $W_{temp}$ is the first high-frequency subband of the first decomposition level;
$$\hat{\sigma}^2(u,v) = \min_{h}\ \max\!\left(0,\ \frac{1}{h^2}\sum_{(s,t)\in N_h(u,v)} W_{in}^{2}(s,t)-\hat{\sigma}_0^{2}\right) \tag{3}$$
in formula (3), $N_h(u,v)$ is a local window of size h×h centred at (u,v); max(·) takes the maximum of 0 and the variance estimate, and min(·) takes the minimum over the estimates obtained with all window sizes.
4. The method for detecting a compressed video source based on multi-scale transform domain adaptive wiener filtering according to claim 1, wherein in the step A4, the maximum likelihood estimation algorithm weighted by quantization parameter values is expressed as equation (4):
$$K = \frac{\sum_{z=1}^{G} W_{QP}\, N_z\, I_z}{\sum_{z=1}^{G} W_{QP}\, I_z^{2} + \delta} \tag{4}$$
in formula (4), G is the number of video frames of a single video used to estimate the sensor pattern noise factor K; $N_z$ denotes the noise residual of the z-th video frame and $I_z$ the z-th video frame of the video; $W_{QP}$ is the weight matrix formulated from the QP-SPCE curve obtained by computing the correlation for different quantization parameters QP; δ is a set value that prevents the denominator from being 0.
5. The method for compressed video source detection based on multi-scale transform domain adaptive wiener filtering of claim 1, wherein in step A5,
the zero-mean operation subtracts the column mean from every pixel of each column and then the row mean from every pixel of each row;
the frequency-domain wiener filtering transforms the CFA-artifact-free sensor pattern noise into the frequency domain and then estimates it with a wiener filtering operation.
6. The method for detecting a compressed video source based on multi-scale transform domain adaptive wiener filtering according to claim 1, wherein in the step S3, the signed peak-to-correlation energy is expressed as equation (5):
$$\mathrm{SPCE}(R,Q)=\mathrm{sign}\left(C_{RQ}(0,0)\right)\cdot\frac{C_{RQ}(0,0)^{2}}{\frac{1}{MN-|\beta|}\sum_{(a,b)\notin\beta}C_{RQ}(a,b)^{2}} \tag{5}$$
wherein sign(·) is the sign function; $C_{RQ}(a,b)$ is the two-dimensional circular cross-correlation between the sensor pattern noise R of the reference compressed video and the sensor pattern noise Q of the test compressed video; β is a small region around (0,0) and |β| is the number of elements of that region; MN is the size of the matched sensor pattern noise.
CN202210048894.6A 2022-01-17 2022-01-17 Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering Active CN114554227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210048894.6A CN114554227B (en) 2022-01-17 2022-01-17 Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210048894.6A CN114554227B (en) 2022-01-17 2022-01-17 Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Publications (2)

Publication Number Publication Date
CN114554227A CN114554227A (en) 2022-05-27
CN114554227B true CN114554227B (en) 2023-05-23

Family

ID=81671282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210048894.6A Active CN114554227B (en) 2022-01-17 2022-01-17 Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Country Status (1)

Country Link
CN (1) CN114554227B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114757242B (en) * 2022-06-16 2022-09-23 中国空气动力研究与发展中心低速空气动力研究所 Helicopter noise enhancement method and detection method based on cyclic wiener filtering
CN116385316B (en) * 2023-06-01 2023-08-08 深圳市嘉润原新显科技有限公司 Multi-target image dynamic capturing method and related device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7787030B2 (en) * 2005-12-16 2010-08-31 The Research Foundation Of State University Of New York Method and apparatus for identifying an imaging device
CN102395041A (en) * 2011-08-05 2012-03-28 中山大学 Imaging equipment source detection method based on pattern noise phase
CN105120294B (en) * 2015-06-26 2018-01-02 中国电子科技集团公司第二十八研究所 A kind of jpeg format image sources discrimination method
CN108259791B (en) * 2018-01-05 2020-05-15 南京信息工程大学 Method for extracting PRNU noise from video file
CN112367457B (en) * 2020-04-08 2022-03-22 齐鲁工业大学 Video PRNU noise extraction method and camera source detection method
CN111860337B (en) * 2020-07-22 2022-05-24 齐鲁工业大学 Left-right handedness judgment method and system based on PRNU mode noise

Also Published As

Publication number Publication date
CN114554227A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN114554227B (en) Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering
Lawgaly et al. Sensor pattern noise estimation based on improved locally adaptive DCT filtering and weighted averaging for source camera identification and verification
Cooper Improved photo response non-uniformity (PRNU) based source camera identification
Lin et al. Preprocessing reference sensor pattern noise via spectrum equalization
Wang et al. Why is image quality assessment so difficult?
Kang et al. A context-adaptive SPN predictor for trustworthy source camera identification
Al-Ani et al. On the SPN estimation in image forensics: a systematic empirical evaluation
Wu et al. A context adaptive predictor of sensor pattern noise for camera source identification
Lin et al. A passive-blind forgery detection scheme based on content-adaptive quantization table estimation
Gupta et al. Improving source camera identification performance using DCT based image frequency components dependent sensor pattern noise extraction method
CN105120294B (en) A kind of jpeg format image sources discrimination method
CN108259791A (en) A kind of method that PRNU noises are extracted from video file
Yang et al. Estimation of signal-dependent noise level function in transform domain via a sparse recovery model
Lawgaly et al. Three dimensional denoising filter for effective source smartphone video identification and verification
Dixit et al. Image de-noising by non-local means algorithm
Pandey et al. A passive forensic method for video: Exposing dynamic object removal and frame duplication in the digital video using sensor noise features
Satapathy et al. Bio-medical image denoising using wavelet transform
Sun et al. Color image denoising based on guided filter and adaptive wavelet threshold
Su et al. Multimedia source identification using an improved weight photo response non-uniformity noise extraction model in short compressed videos
Balamurugan et al. Source camera identification using SPN with PRNU estimation and enhancement
Chetouani et al. A new scheme for no reference image quality assessment
CN117544819A (en) Video source identification method based on self-adaptive Bayesian threshold estimation and bivariate shrinkage
Panguluri et al. A DWT Based Novel Multimodal Image Fusion Method.
El-Shahed et al. Video Steganography using 3D Stationary Wavelet Transform
EP1561348A1 (en) Method and system for measuring video image degradations introduced by an encoding system with throughput reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant