CN114554227A - Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering - Google Patents

Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Info

Publication number
CN114554227A
CN114554227A (application CN202210048894.6A)
Authority
CN
China
Prior art keywords
video
compressed video
noise
wiener filtering
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210048894.6A
Other languages
Chinese (zh)
Other versions
CN114554227B (en)
Inventor
田妮莉
苏开清
潘晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210048894.6A priority Critical patent/CN114554227B/en
Publication of CN114554227A publication Critical patent/CN114554227A/en
Application granted granted Critical
Publication of CN114554227B publication Critical patent/CN114554227B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering, which comprises the following steps: intervening in the decoding of a reference video and of a compressed video under test, taking out the video frames before the loop filtering module, and acquiring the macroblock quantization parameter (QP) values of the corresponding frames to form a QP matrix; performing a multi-scale double-density dual-tree complex wavelet transform on each frame, processing the 32 high-frequency subbands of each level with locally adaptive threshold window wiener filtering, and, after the inverse transform, computing the noise residual as the difference between the frame before and after denoising; using the QP matrix as the weight of the noise residuals and obtaining the final sensor pattern noise with a maximum likelihood estimator; and computing the correlation between the sensor pattern noise of the reference video and that of the compressed video under test with the signed peak correlation energy (SPCE), and making the decision against a threshold. The invention can extract more sufficient sensor pattern noise from compressed video, effectively improves the identification of compressed video, and is more robust when identifying shorter compressed videos.

Description

Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering
Technical Field
The invention relates to the technical field of video forensics, in particular to a compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering.
Background
With the arrival of the 5G era and the rapid growth of social network users, more and more people communicate through the Internet, and pictures and videos are widely used as important carriers of information. At the same time, videos involving illegal activities that seriously disturb public security, such as piracy and malicious tampering, inevitably appear on the network. Videos uploaded to the Internet are transmitted many times and are inevitably compressed to different degrees during transmission, so confirming the source of a compressed video has become a very important research topic in multimedia information security forensics in recent years.
When the source of an image or video needs to be confirmed, the most direct method is to check its embedded watermark information. However, with the continuous development of image and video editing software, watermarks can be altered very easily, so this method is unreliable for video source detection. Researchers have therefore turned to intrinsic features of digital images and video. For example, Swaminathan A, Wu M, Liu K J R, "Nonintrusive component forensics of visual sensors using output images", IEEE Transactions on Information Forensics and Security, 2007, 2(1):91-106, uses the noise artifacts caused by the CFA interpolation algorithm to authenticate the source of a digital image. Choi K S, Lam E Y, Wong K K Y, "Source camera identification using footprints from lens aberration", Proceedings of SPIE, vol. 6069, 2006, exploits the fact that aberration noise produced by lens distortion affects the statistical characteristics of an image to judge its source. Dirik A E, Sencar H T, Memon N, "Digital single lens reflex camera identification from traces of sensor dust", IEEE Transactions on Information Forensics and Security, 2008, 3(3):539-552, observes that digital single-lens reflex cameras accumulate sensor dust because of their interchangeable lenses; dust particles deposited in front of the imaging sensor form a persistent pattern in all captured images, and this sensor dust pattern is extracted to detect the source of an image.
A good digital image/video source detection algorithm must be robust and achieve a high detection rate. Although the above methods can detect the source of an image to a certain extent, they suffer from high computational complexity and poor recognition performance. For example, CFA-interpolation-based identification fails when cameras of the same model are involved, since such cameras apply the same CFA interpolation. Similarly, sensor dust identification is difficult for new cameras, which have accumulated little sensor dust. With deeper understanding and research, a markedly more effective multimedia source detection approach has been proposed: digital image and video source detection based on sensor pattern noise.
The sensor is one of the key components of camera imaging. Because of defects in manufacturing materials and production processes, it produces unique noise artifacts during imaging; even cameras of the same model produce different artifacts, and researchers call this unique artifact sensor pattern noise. Since the main component of sensor pattern noise is photo response non-uniformity (PRNU) noise, PRNU is often treated as the sensor pattern noise itself and can be used as a camera fingerprint for source detection of unknown images and videos.
As researchers at home and abroad explored the extraction of sensor pattern noise in depth, filtering algorithms were proposed to extract it. For example, Lukas J, Fridrich J, Goljan M, "Digital camera identification from sensor pattern noise", IEEE Transactions on Information Forensics and Security, 2006, 1(2):205-214, extracts the noise residual with a wavelet-domain denoising filter. Conotter V, Boato G, "Analysis of sensor fingerprint for source camera identification", Electronics Letters, 2011, 47(25):1366-1367, introduces the more complex BM3D (block matching and 3D filtering) filter to extract the noise residual from an image; BM3D has proven effective at extracting noise by identifying similar blocks in the image and grouping them together. Kang X, Chen J, Lin K, et al., "A context-adaptive SPN predictor for source camera identification", EURASIP Journal on Image and Video Processing, 2014(1):1-11, and Zeng H, Kang X, "Fast source camera identification using content adaptive guided image filter", Journal of Forensic Sciences, 2016, 61(2):520-526, show that content-adaptive algorithms can further improve the extracted noise residual. Subsequently, Lawgaly A, Khelifi F, "Sensor pattern noise estimation based on improved locally adaptive DCT filtering and weighted averaging for source camera identification and verification", IEEE Transactions on Information Forensics and Security, 2016, 12(2):392-404, and work by Zeng H, Wan Y, Deng K, et al. combining the dual-tree complex wavelet transform with locally adaptive window wiener filtering, proposed improved locally adaptive discrete cosine transform filtering and complex-wavelet-based filtering, which obtained better filtering results than the earlier methods. After obtaining the noise residual by filtering, some researchers found that it contains not only sensor pattern noise but also other non-unique components, such as CFA interpolation noise, speckle noise and image texture detail, and these non-unique components become more and more complex in compressed video. Chen M, Fridrich J, Goljan M, et al., "Determining image origin and integrity using sensor noise", IEEE Transactions on Information Forensics and Security, 2008, 3(1):74-90, proposes estimating a single sensor pattern noise from 50 noise residual images with a maximum likelihood estimator and further removing other noise from the estimate by zero-averaging combined with frequency-domain wiener filtering.
Kang X, Li Y, Qu Z, et al., "Enhancing source camera identification performance with a camera reference phase sensor pattern noise", IEEE Transactions on Information Forensics and Security, 2011, 7(2):393-402, proposes keeping only the phase information of the noise residual in the Fourier domain, and this refined sensor pattern noise is then used for matching. Many other ways of extracting sensor pattern noise have since emerged; for example, Lin X, Li C T, "Preprocessing reference sensor pattern noise via spectrum equalization", IEEE Transactions on Information Forensics and Security, 2015, 11(1):126-140, equalizes the spectrum of the reference sensor pattern noise to suppress periodic interference. Most of the above algorithms can improve the quality of the sensor pattern noise of images. However, because a video is compressed many times during Internet transmission, its noise is more complex and the sensor pattern noise is strongly suppressed, so traditional image-oriented extraction algorithms have difficulty extracting it from compressed video; consequently, the recognition performance of these algorithms on compressed video is extremely limited.
For the above reasons, in order to extract more sensor pattern noise from compressed video and improve its recognition, it is necessary to study a source detection algorithm designed specifically for compressed video.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering, which can extract more sufficient sensor pattern noise from a compressed video, effectively improves the identification of compressed video, and is more robust when identifying shorter compressed videos.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
the compressed video source detection method based on the multi-scale transform domain self-adaptive wiener filtering comprises the following steps:
S1, obtaining the sensor pattern noise of a plurality of reference compressed videos and storing it in a sensor pattern noise database;
S2, obtaining the sensor pattern noise of the compressed video under test;
S3, computing the correlation between the sensor pattern noise of the compressed video under test and the sensor pattern noise of each reference compressed video in the sensor pattern noise database using the signed peak correlation energy (SPCE); if the SPCE is greater than or equal to a set threshold, the compressed video under test is judged to come from the camera that recorded the reference compressed video, otherwise it is judged not to come from that camera (an overall sketch of this decision is given after this list).
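For illustration only, the overall decision of steps S1-S3 can be sketched as follows in Python. The helper names `extract_fingerprint` and `spce`, as well as the example threshold, are hypothetical placeholders for the sensor pattern noise extraction (steps A1-A5 below) and the signed peak correlation energy of equation (5); the patent does not fix a numerical threshold.

```python
# Minimal sketch of the S1-S3 decision pipeline (assumed helper functions).
from typing import Callable, Dict, Optional

def build_reference_database(reference_videos: Dict[str, str],
                             extract_fingerprint: Callable) -> Dict:
    """S1: fingerprint every reference compressed video, keyed by camera id."""
    return {cam: extract_fingerprint(path) for cam, path in reference_videos.items()}

def identify_source(test_video: str, database: Dict, extract_fingerprint: Callable,
                    spce: Callable, threshold: float = 60.0) -> Optional[str]:
    """S2 + S3: fingerprint the video under test and compare it with the database."""
    q = extract_fingerprint(test_video)                         # S2
    scores = {cam: spce(r, q) for cam, r in database.items()}   # S3
    best_cam, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_cam if best_score >= threshold else None        # None: no match
```

In practice the threshold would be chosen empirically, for example from the SPCE score distribution of known matching and non-matching video pairs.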
Further, in the steps S1 and S2, the specific steps of determining the sensor pattern noise are as follows:
A1, re-parsing the video into bitstream data, intervening in the decoding process, outputting all video frames before the loop filtering module of the codec, and simultaneously obtaining the macroblock quantization parameter QP value of each corresponding frame to form a QP matrix;
A2, obtaining G video frames, numbered I_1, I_2, ..., I_G, and performing a multi-level double-density dual-tree complex wavelet decomposition on each video frame;
A3, applying a locally adaptive threshold window wiener filtering method to filter all high-frequency subbands, obtaining the filtered wavelet subbands; performing the inverse double-density dual-tree complex wavelet transform to obtain the denoised video frame; and subtracting the denoised video frame from the input video frame to obtain the noise residual of each video frame;
A4, estimating the set of noise residuals of I_1, I_2, ..., I_G with a maximum likelihood estimation algorithm weighted by the QP matrix to obtain the multiplicative factor K of the preliminary sensor pattern noise of the video;
A5, removing the CFA interpolation artifacts from the multiplicative factor K by a zero-averaging operation to obtain a factor K free of CFA interpolation artifacts, then filtering this factor with a frequency-domain wiener filtering algorithm to further remove other non-unique noise components, multiplying the factor by each input video frame, and averaging to obtain the sensor pattern noise.
Further, in the step A2, the specific process of the double-density dual-tree complex wavelet decomposition is as follows:
performing a dual-tree decomposition of each video frame into two trees, and applying two one-dimensional wavelet decompositions to the one-dimensional data formed by each row of each tree to obtain high-frequency, sub-high-frequency and low-frequency parts;
applying two one-dimensional wavelet decompositions to the one-dimensional data formed by each column of the high-frequency, sub-high-frequency and low-frequency information obtained above, finally yielding nine subband images, namely the eight high-frequency subbands L0H1, L0H2, H1L0, H1H1, H1H2, H2L0, H2H1, H2H2 and the low-frequency subband L0L0; each high-frequency subband is divided into a real part and an imaginary part, and each decomposition level yields 32 high-frequency components.
Further, in the step A3, the locally adaptive threshold window wiener filtering method is given by equation (1):

$$W_{out}(u,v) = W_{in}(u,v)\,\frac{\hat{\sigma}^{2}(u,v)}{\hat{\sigma}^{2}(u,v)+\hat{\sigma}_{0}^{2}} \qquad (1)$$

in equation (1), $W_{in}$ denotes the wavelet coefficients before filtering and $W_{out}$ denotes the filtered wavelet coefficients; the noise variance estimate $\hat{\sigma}_{0}^{2}$ and the subband variance $\hat{\sigma}^{2}(u,v)$ are obtained from equations (2) and (3):

$$\hat{\sigma}_{0}^{2} = \left(\frac{\mathrm{median}\left(\left|W_{temp}(u,v)\right|\right)}{0.6745}\right)^{2} \qquad (2)$$

in equation (2), median() denotes the median estimator and $W_{temp}$ denotes the first high-frequency subband of the first decomposition level;

$$\hat{\sigma}^{2}(u,v) = \min_{h}\,\max\!\left(0,\;\frac{1}{h^{2}}\sum_{(s,t)\in N_{h}} W_{in}^{2}(s,t) - \hat{\sigma}_{0}^{2}\right) \qquad (3)$$

in equation (3), $N_{h}$ is a local window of size h×h centered at (u,v); max() takes the maximum of 0 and the local variance estimate, and min() takes the minimum over the estimates obtained with all window sizes.
Further, in the step A4, the QP-weighted maximum likelihood estimation algorithm is expressed by equation (4):

$$K = \frac{\sum_{z=1}^{G} W_{QP}\, N_{z}\, I_{z}}{\sum_{z=1}^{G} I_{z}^{2} + \delta} \qquad (4)$$

in equation (4), G is the number of video frames of a single video used to estimate the sensor pattern noise factor K, $N_{z}$ denotes the noise residual of the z-th video frame, $I_{z}$ denotes the z-th video frame of the video, and $W_{QP}$ is a weight matrix obtained by computing the correlation under different quantization parameters QP, plotting the resulting QP-SPCE curve, and using this curve relation to assign the weights; the operations are performed element-wise, and δ is a small set value that prevents the denominator from being 0.
Further, in the step A5,
the zero-averaging process is to subtract the column average from each pixel in the column and then the row average from each pixel in the row;
the frequency domain wiener filtering process is to transform the sensor pattern noise without the CFA interpolation artifacts into the frequency domain, and then estimate it by using a wiener filtering operation.
Further, in the step S3, the signed peak correlation energy is expressed by equation (5):

$$\mathrm{SPCE}(R,Q) = \frac{\mathrm{sign}\!\left(C_{RQ}(0,0)\right)\, C_{RQ}^{2}(0,0)}{\dfrac{1}{MN-\lvert\beta\rvert}\sum_{(a,b)\notin\beta} C_{RQ}^{2}(a,b)} \qquad (5)$$

in equation (5), sign() is the sign function, $C_{RQ}(a,b)$ is the two-dimensional circular cross-correlation between the sensor pattern noise R of the reference compressed video and the sensor pattern noise Q of the compressed video under test, β is a small area around (0, 0), |β| is the product of the dimensions of this area, and MN is the product of the dimensions of the matched sensor pattern noise.
Compared with the prior art, the principle and the advantages of the scheme are as follows:
1) The loop filtering module of the codec filters out part of the sensor pattern noise in the video frames. The scheme modifies the decoding process and takes out the video frames before they reach the loop filtering module, so that more of the sensor pattern noise in the video frames is preserved.
2) Sensor pattern noise is medium-to-high-frequency noise, and wavelet decomposition separates the high- and low-frequency information of an image signal well. The double-density dual-tree complex wavelet transform combines the advantages of the double-density wavelet transform and the dual-tree complex wavelet transform: it provides signals in 16 principal directions, each represented by two wavelets (a real wavelet and an imaginary wavelet). Compared with the ordinary wavelet transform, the dual-tree complex wavelet transform and the double-density wavelet transform, it further improves the decomposition and reconstruction accuracy of video frames.
3) The noise variance used by ordinary wiener filtering is a fixed value, which is difficult to estimate accurately and effectively for video frames with different degrees of compression. Here the noise variance of the wavelet subband is estimated with a median estimator. In the image denoising field the highest-frequency wavelet subband is usually chosen for noise variance estimation; since this scheme aims to extract medium-to-high-frequency sensor pattern noise, the medium-to-high-frequency subband (LH1) is chosen for the noise variance estimation, which gives better extraction results.
4) Compared with the original maximum likelihood estimation, the QP-value-weighted maximum likelihood estimation assigns each pixel value a different weight according to its degree of compression, better suppresses the artifacts produced by more complex compression, and effectively improves the quality of the sensor pattern noise.
5) Compared with existing sensor pattern noise extraction algorithms, the scheme retains more sensor pattern noise from the input video frames, in particular during the filtering stage, and the subsequent processing effectively suppresses the other noise components mixed into the sensor pattern noise, so the scheme achieves better identification performance and robustness.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a compressed video source detection method based on multi-scale transform domain adaptive wiener filtering according to the present invention (step S1 is omitted);
FIG. 2 is a schematic flow chart of obtaining the sensor pattern noise in the compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering according to the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples:
as shown in fig. 1, the method for detecting a source of a compressed video based on multi-scale transform domain adaptive wiener filtering according to this embodiment includes the following steps:
S1, obtaining the sensor pattern noise of a plurality of reference compressed videos and storing it in a sensor pattern noise database;
S2, obtaining the sensor pattern noise of the compressed video under test;
as shown in fig. 2, the specific steps of determining the sensor pattern noise in steps S1 and S2 are as follows:
A1, re-parsing the video into bitstream data, intervening in the decoding process, outputting all video frames before the loop filtering module of the codec, and obtaining the macroblock quantization parameter QP values of the corresponding frames to form a QP matrix (a rough approximation of this step is sketched below);
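Taking frames before the loop filtering module requires instrumenting the decoder itself. As a loose approximation only, the sketch below (Python invoking an ffmpeg binary, assumed to be installed) disables in-loop deblocking during decoding via ffmpeg's `skip_loop_filter` decoder option; this is not equivalent to tapping the decoder before its loop filter, and it does not export the per-macroblock QP matrix, which in practice still needs decoder instrumentation.

```python
# Rough approximation of step A1 (assumes an ffmpeg binary on PATH).
# Skipping the in-loop deblocking filter during decoding is NOT the same as
# reading the frames before the loop filter inside the decoder, and the QP
# matrix is not produced here.
import subprocess
from pathlib import Path

def dump_frames_without_loop_filter(video_path: str, out_dir: str) -> None:
    """Decode `video_path` with the in-loop filter skipped and dump PNG frames."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "ffmpeg",
            "-skip_loop_filter", "all",   # decoder option: skip in-loop deblocking
            "-i", video_path,
            "-vsync", "0",                # pass frames through without drop/duplicate
            f"{out_dir}/frame_%06d.png",
        ],
        check=True,
    )

# dump_frames_without_loop_filter("reference.mp4", "frames/reference")
```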
A2, obtaining G video frames, numbered I_1, I_2, ..., I_G, and performing a 4-level double-density dual-tree complex wavelet decomposition on each video frame;
The specific process of the double-density dual-tree complex wavelet decomposition is as follows (a simplified stand-in is sketched after this description):
performing a dual-tree decomposition of each video frame into two trees, and applying two one-dimensional wavelet decompositions to the one-dimensional data formed by each row of each tree to obtain high-frequency, sub-high-frequency and low-frequency parts;
applying two one-dimensional wavelet decompositions to the one-dimensional data formed by each column of the high-frequency, sub-high-frequency and low-frequency information obtained above, finally yielding nine subband images, namely the eight high-frequency subbands L0H1, L0H2, H1L0, H1H1, H1H2, H2L0, H2H1, H2H2 and the low-frequency subband L0L0; each high-frequency subband is divided into a real part and an imaginary part, and each decomposition level yields 32 high-frequency components.
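The double-density dual-tree complex wavelet transform is not available in common Python libraries, so the sketch below uses PyWavelets' ordinary 2-D discrete wavelet transform as a stand-in to illustrate the multi-scale decomposition and the grouping of high-frequency subbands; the actual method would substitute the 4-level double-density dual-tree transform (32 high-frequency components per level) at this point.

```python
# Stand-in for step A2: multi-scale decomposition of a grayscale video frame.
# The standard DWT is used only to illustrate the structure; the patent uses a
# 4-level double-density dual-tree complex wavelet transform instead.
import numpy as np
import pywt

def decompose_frame(frame: np.ndarray, levels: int = 4, wavelet: str = "db8"):
    """Return (lowpass, list of per-level high-frequency subband tuples)."""
    coeffs = pywt.wavedec2(frame.astype(np.float64), wavelet, level=levels)
    lowpass, highpass_levels = coeffs[0], coeffs[1:]   # each level: (LH, HL, HH)
    return lowpass, highpass_levels

def reconstruct_frame(lowpass, highpass_levels, wavelet: str = "db8") -> np.ndarray:
    """Inverse transform after the high-frequency subbands have been filtered."""
    return pywt.waverec2([lowpass] + list(highpass_levels), wavelet)
```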
A3, applying the locally adaptive threshold window wiener filtering method to filter all high-frequency subbands, obtaining the filtered wavelet subbands; performing the inverse double-density dual-tree complex wavelet transform to obtain the denoised video frame; and subtracting the denoised video frame from the input video frame to obtain the noise residual of each video frame;
The locally adaptive threshold window wiener filtering method is given by equation (1):

$$W_{out}(u,v) = W_{in}(u,v)\,\frac{\hat{\sigma}^{2}(u,v)}{\hat{\sigma}^{2}(u,v)+\hat{\sigma}_{0}^{2}} \qquad (1)$$

in equation (1), $W_{in}$ denotes the wavelet coefficients before filtering and $W_{out}$ denotes the filtered wavelet coefficients; the noise variance estimate $\hat{\sigma}_{0}^{2}$ and the subband variance $\hat{\sigma}^{2}(u,v)$ are obtained from equations (2) and (3):

$$\hat{\sigma}_{0}^{2} = \left(\frac{\mathrm{median}\left(\left|W_{temp}(u,v)\right|\right)}{0.6745}\right)^{2} \qquad (2)$$

in equation (2), median() denotes the median estimator and $W_{temp}$ denotes the first high-frequency subband of the first decomposition level;

$$\hat{\sigma}^{2}(u,v) = \min_{h}\,\max\!\left(0,\;\frac{1}{h^{2}}\sum_{(s,t)\in N_{h}} W_{in}^{2}(s,t) - \hat{\sigma}_{0}^{2}\right) \qquad (3)$$

in equation (3), $N_{h}$ is a local window of size h×h centered at (u,v); max() takes the maximum of 0 and the local variance estimate, and min() takes the minimum over the estimates obtained with all window sizes.
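A minimal NumPy sketch of equations (1)-(3) follows, operating on a single subband. The window sizes {3, 5, 7, 9} are an assumption commonly used with this kind of local wiener shrinkage; the patent only specifies an h×h local window.

```python
# Sketch of step A3 / equations (1)-(3): locally adaptive window wiener
# filtering of one wavelet subband. Window sizes {3,5,7,9} are an assumption.
import numpy as np
from scipy.ndimage import uniform_filter

def noise_variance_estimate(subband_lh1: np.ndarray) -> float:
    """Equation (2): robust median estimator on a mid/high-frequency subband."""
    return (np.median(np.abs(subband_lh1)) / 0.6745) ** 2

def adaptive_wiener(w_in: np.ndarray, sigma0_sq: float,
                    windows=(3, 5, 7, 9)) -> np.ndarray:
    """Equations (1) and (3): shrink coefficients by the local signal variance."""
    local_var = np.full(w_in.shape, np.inf)
    for h in windows:
        # E[W^2] over an h x h window, minus the noise variance, clipped at 0.
        est = np.maximum(uniform_filter(w_in ** 2, size=h) - sigma0_sq, 0.0)
        local_var = np.minimum(local_var, est)           # min over all windows
    return w_in * local_var / (local_var + sigma0_sq)    # equation (1)
```

The noise residual of a frame is then the input frame minus the frame reconstructed from the filtered subbands.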
A4, estimating the set of noise residuals of I_1, I_2, ..., I_G with the maximum likelihood estimation algorithm weighted by the QP matrix to obtain the multiplicative factor K of the preliminary sensor pattern noise of the video;
The QP-weighted maximum likelihood estimation algorithm is expressed by equation (4):

$$K = \frac{\sum_{z=1}^{G} W_{QP}\, N_{z}\, I_{z}}{\sum_{z=1}^{G} I_{z}^{2} + \delta} \qquad (4)$$

in equation (4), G is the number of video frames of a single video used to estimate the sensor pattern noise factor K, $N_{z}$ denotes the noise residual of the z-th video frame, $I_{z}$ denotes the z-th video frame of the video, and $W_{QP}$ is a weight matrix obtained by computing the correlation under different quantization parameters QP, plotting the resulting QP-SPCE curve, and using this curve relation to assign the weights; the operations are performed element-wise, and δ is a small set value that prevents the denominator from being 0.
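A sketch of equation (4) under stated assumptions: the QP weight matrices are taken as externally supplied per-frame arrays (the QP-to-weight mapping comes from the QP-SPCE curve, which is not given numerically in the patent), and all products are element-wise.

```python
# Sketch of step A4 / equation (4): QP-weighted maximum likelihood estimation
# of the multiplicative PRNU factor K. `residuals`, `frames` and `qp_weights`
# are lists of equally sized 2-D arrays; the QP->weight mapping is assumed given.
import numpy as np

def weighted_mle_prnu(residuals, frames, qp_weights, delta: float = 1e-6) -> np.ndarray:
    """K = sum_z(W_QP * N_z * I_z) / (sum_z(I_z^2) + delta), element-wise."""
    numerator = np.zeros_like(frames[0], dtype=np.float64)
    denominator = np.zeros_like(frames[0], dtype=np.float64)
    for n_z, i_z, w_z in zip(residuals, frames, qp_weights):
        numerator += w_z * n_z * i_z
        denominator += i_z.astype(np.float64) ** 2
    return numerator / (denominator + delta)   # delta keeps the denominator non-zero
```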
A5, removing the CFA interpolation artifacts from the multiplicative factor K by a zero-averaging operation to obtain a factor K free of CFA interpolation artifacts, then filtering this factor with a frequency-domain wiener filtering algorithm to further remove other non-unique noise components, multiplying the factor by each input video frame, and averaging to obtain the sensor pattern noise;
wherein the zero-averaging process is to subtract the column average from each pixel in the column and then the row average from each pixel in the row;
and the frequency domain wiener filtering process is to transform the sensor pattern noise without the CFA interpolation artifacts into the frequency domain, and then estimate it by using a wiener filtering operation.
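A sketch of the zero-averaging operation of step A5 follows; the frequency-domain wiener step is indicated only schematically, since the patent does not give its spectral noise-variance parameter, which is treated as an assumed input here.

```python
# Sketch of step A5: zero-averaging removes column/row averages (linear CFA-like
# artifacts); the frequency-domain wiener step is schematic, with `sigma_sq`
# (the spectral noise variance) as an assumed input.
import numpy as np

def zero_average(k: np.ndarray) -> np.ndarray:
    """Subtract each column's average from its pixels, then each row's average."""
    k = k - k.mean(axis=0, keepdims=True)   # column averages
    k = k - k.mean(axis=1, keepdims=True)   # row averages
    return k

def frequency_domain_wiener(k: np.ndarray, sigma_sq: float) -> np.ndarray:
    """Attenuate strong periodic components of K in the Fourier domain."""
    F = np.fft.fft2(k)
    power = np.abs(F) ** 2 / k.size
    F_filtered = F * power / (power + sigma_sq)   # wiener-style spectral shrinkage
    return np.real(np.fft.ifft2(F_filtered))
```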
S3, computing the correlation between the sensor pattern noise of the compressed video under test and the sensor pattern noise of each reference compressed video in the sensor pattern noise database using the signed peak correlation energy; if the signed peak correlation energy is greater than or equal to a set threshold, the compressed video under test is judged to come from the camera that recorded the reference compressed video, otherwise it is judged not to come from that camera.
In this step, the signed peak correlation energy is expressed by equation (5):

$$\mathrm{SPCE}(R,Q) = \frac{\mathrm{sign}\!\left(C_{RQ}(0,0)\right)\, C_{RQ}^{2}(0,0)}{\dfrac{1}{MN-\lvert\beta\rvert}\sum_{(a,b)\notin\beta} C_{RQ}^{2}(a,b)} \qquad (5)$$

in equation (5), sign() is the sign function, $C_{RQ}(a,b)$ is the two-dimensional circular cross-correlation between the sensor pattern noise R of the reference compressed video and the sensor pattern noise Q of the compressed video under test, β is a small area around (0, 0), |β| is the product of the dimensions of this area, and MN is the product of the dimensions of the matched sensor pattern noise.
The above-mentioned embodiments are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereto; all changes made according to the shape and principle of the present invention shall be covered by the protection scope of the present invention.

Claims (7)

1. The compressed video source detection method based on the multi-scale transform domain self-adaptive wiener filtering is characterized by comprising the following steps of:
S1, obtaining the sensor pattern noise of a plurality of reference compressed videos and storing it in a sensor pattern noise database;
S2, obtaining the sensor pattern noise of the compressed video under test;
S3, computing the correlation between the sensor pattern noise of the compressed video under test and the sensor pattern noise of each reference compressed video in the sensor pattern noise database using the signed peak correlation energy; if the signed peak correlation energy is greater than or equal to a set threshold, the compressed video under test is judged to come from the camera that recorded the reference compressed video, otherwise it is judged not to come from the camera that recorded the reference compressed video.
2. The method for detecting the source of a compressed video based on multi-scale transform domain adaptive wiener filtering as claimed in claim 1, wherein in the steps S1 and S2, the specific steps of determining the sensor pattern noise are as follows:
A1, re-parsing the video into bitstream data, intervening in the decoding process, outputting all video frames before the loop filtering module of the codec, and simultaneously obtaining the macroblock quantization parameter QP value of each corresponding frame to form a QP matrix;
A2, obtaining G video frames, numbered I_1, I_2, ..., I_G, and performing a multi-level double-density dual-tree complex wavelet decomposition on each video frame;
A3, applying a locally adaptive threshold window wiener filtering method to filter all high-frequency subbands, obtaining the filtered wavelet subbands; performing the inverse double-density dual-tree complex wavelet transform to obtain the denoised video frame; and subtracting the denoised video frame from the input video frame to obtain the noise residual of each video frame;
A4, estimating the set of noise residuals of I_1, I_2, ..., I_G with a maximum likelihood estimation algorithm weighted by the QP matrix to obtain the multiplicative factor K of the preliminary sensor pattern noise of the video;
A5, removing the CFA interpolation artifacts from the multiplicative factor K by a zero-averaging operation to obtain a factor K free of CFA interpolation artifacts, then filtering this factor with a frequency-domain wiener filtering algorithm to further remove other non-unique noise components, multiplying the factor by each input video frame, and averaging to obtain the sensor pattern noise.
3. The method for detecting the source of a compressed video based on multi-scale transform domain adaptive wiener filtering as claimed in claim 2, wherein in the step A2, the specific process of the double-density dual-tree complex wavelet decomposition is as follows:
performing a dual-tree decomposition of each video frame into two trees, and applying two one-dimensional wavelet decompositions to the one-dimensional data formed by each row of each tree to obtain high-frequency, sub-high-frequency and low-frequency parts;
applying two one-dimensional wavelet decompositions to the one-dimensional data formed by each column of the high-frequency, sub-high-frequency and low-frequency information obtained above, finally yielding nine subband images, namely the eight high-frequency subbands L0H1, L0H2, H1L0, H1H1, H1H2, H2L0, H2H1, H2H2 and the low-frequency subband L0L0; each high-frequency subband is divided into a real part and an imaginary part, and each decomposition level yields 32 high-frequency components.
4. The method for detecting the source of a compressed video based on multi-scale transform domain adaptive wiener filtering as claimed in claim 2, wherein in the step A3, the locally adaptive threshold window wiener filtering method is given by equation (1):

$$W_{out}(u,v) = W_{in}(u,v)\,\frac{\hat{\sigma}^{2}(u,v)}{\hat{\sigma}^{2}(u,v)+\hat{\sigma}_{0}^{2}} \qquad (1)$$

in equation (1), $W_{in}$ denotes the wavelet coefficients before filtering and $W_{out}$ denotes the filtered wavelet coefficients; the noise variance estimate $\hat{\sigma}_{0}^{2}$ and the subband variance $\hat{\sigma}^{2}(u,v)$ are obtained from equations (2) and (3):

$$\hat{\sigma}_{0}^{2} = \left(\frac{\mathrm{median}\left(\left|W_{temp}(u,v)\right|\right)}{0.6745}\right)^{2} \qquad (2)$$

in equation (2), median() denotes the median estimator and $W_{temp}$ denotes the first high-frequency subband of the first decomposition level;

$$\hat{\sigma}^{2}(u,v) = \min_{h}\,\max\!\left(0,\;\frac{1}{h^{2}}\sum_{(s,t)\in N_{h}} W_{in}^{2}(s,t) - \hat{\sigma}_{0}^{2}\right) \qquad (3)$$

in equation (3), $N_{h}$ is a local window of size h×h centered at (u,v); max() takes the maximum of 0 and the local variance estimate, and min() takes the minimum over the estimates obtained with all window sizes.
5. The method for detecting the source of a compressed video based on multi-scale transform domain adaptive wiener filtering according to claim 2, wherein in the step A4, the QP-weighted maximum likelihood estimation algorithm is expressed by equation (4):

$$K = \frac{\sum_{z=1}^{G} W_{QP}\, N_{z}\, I_{z}}{\sum_{z=1}^{G} I_{z}^{2} + \delta} \qquad (4)$$

in equation (4), G is the number of video frames of a single video used to estimate the sensor pattern noise factor K, $N_{z}$ denotes the noise residual of the z-th video frame, $I_{z}$ denotes the z-th video frame of the video, and $W_{QP}$ is a weight matrix obtained by computing the correlation under different quantization parameters QP, plotting the resulting QP-SPCE curve, and using this curve relation to assign the weights; the operations are performed element-wise, and δ is a small set value that prevents the denominator from being 0.
6. The method for detecting the source of a compressed video based on multi-scale transform domain adaptive wiener filtering according to claim 2, wherein in the step A5,
the zero-averaging process is to subtract the column average from each pixel in the column and then the row average from each pixel in the row;
and the frequency domain wiener filtering process is to transform the sensor pattern noise without the CFA interpolation artifacts into the frequency domain, and then estimate it by using a wiener filtering operation.
7. The method for detecting the source of a compressed video based on multi-scale transform domain adaptive wiener filtering as claimed in claim 1, wherein in the step S3, the signed peak correlation energy is expressed by equation (5):

$$\mathrm{SPCE}(R,Q) = \frac{\mathrm{sign}\!\left(C_{RQ}(0,0)\right)\, C_{RQ}^{2}(0,0)}{\dfrac{1}{MN-\lvert\beta\rvert}\sum_{(a,b)\notin\beta} C_{RQ}^{2}(a,b)} \qquad (5)$$

in equation (5), sign() is the sign function, $C_{RQ}(a,b)$ is the two-dimensional circular cross-correlation between the sensor pattern noise R of the reference compressed video and the sensor pattern noise Q of the compressed video under test, β is a small area around (0, 0), |β| is the product of the dimensions of this area, and MN is the product of the dimensions of the matched sensor pattern noise.
CN202210048894.6A 2022-01-17 2022-01-17 Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering Active CN114554227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210048894.6A CN114554227B (en) 2022-01-17 2022-01-17 Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210048894.6A CN114554227B (en) 2022-01-17 2022-01-17 Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Publications (2)

Publication Number Publication Date
CN114554227A (en) 2022-05-27
CN114554227B (en) 2023-05-23

Family

ID=81671282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210048894.6A Active CN114554227B (en) 2022-01-17 2022-01-17 Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering

Country Status (1)

Country Link
CN (1) CN114554227B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114757242A (en) * 2022-06-16 2022-07-15 中国空气动力研究与发展中心低速空气动力研究所 Helicopter noise enhancement method and detection method based on cyclic wiener filtering
CN116385316A (en) * 2023-06-01 2023-07-04 深圳市嘉润原新显科技有限公司 Multi-target image dynamic capturing method and related device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090257671A1 (en) * 2005-12-16 2009-10-15 The Research Foundation Of State University Of New York Method and apparatus for identifying an imaging device
CN102395041A (en) * 2011-08-05 2012-03-28 中山大学 Imaging equipment source detection method based on pattern noise phase
CN105120294A (en) * 2015-06-26 2015-12-02 中国电子科技集团公司第二十八研究所 JPEG format image source identification method
CN108259791A (en) * 2018-01-05 2018-07-06 南京信息工程大学 A kind of method that PRNU noises are extracted from video file
CN111860337A (en) * 2020-07-22 2020-10-30 齐鲁工业大学 Left-right handedness judgment method and system based on PRNU mode noise
CN112367457A (en) * 2020-04-08 2021-02-12 齐鲁工业大学 Video PRNU noise extraction method and camera source detection method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090257671A1 (en) * 2005-12-16 2009-10-15 The Research Foundation Of State University Of New York Method and apparatus for identifying an imaging device
CN102395041A (en) * 2011-08-05 2012-03-28 中山大学 Imaging equipment source detection method based on pattern noise phase
CN105120294A (en) * 2015-06-26 2015-12-02 中国电子科技集团公司第二十八研究所 JPEG format image source identification method
CN108259791A (en) * 2018-01-05 2018-07-06 南京信息工程大学 A kind of method that PRNU noises are extracted from video file
CN112367457A (en) * 2020-04-08 2021-02-12 齐鲁工业大学 Video PRNU noise extraction method and camera source detection method
CN111860337A (en) * 2020-07-22 2020-10-30 齐鲁工业大学 Left-right handedness judgment method and system based on PRNU mode noise

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tian Nili (田妮莉), "Research on Video Source Camera Identification" (视频源相机识别研究) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114757242A (en) * 2022-06-16 2022-07-15 中国空气动力研究与发展中心低速空气动力研究所 Helicopter noise enhancement method and detection method based on cyclic wiener filtering
CN114757242B (en) * 2022-06-16 2022-09-23 中国空气动力研究与发展中心低速空气动力研究所 Helicopter noise enhancement method and detection method based on cyclic wiener filtering
CN116385316A (en) * 2023-06-01 2023-07-04 深圳市嘉润原新显科技有限公司 Multi-target image dynamic capturing method and related device
CN116385316B (en) * 2023-06-01 2023-08-08 深圳市嘉润原新显科技有限公司 Multi-target image dynamic capturing method and related device

Also Published As

Publication number Publication date
CN114554227B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Lawgaly et al. Sensor pattern noise estimation based on improved locally adaptive DCT filtering and weighted averaging for source camera identification and verification
Lin et al. Preprocessing reference sensor pattern noise via spectrum equalization
Cooper Improved photo response non-uniformity (PRNU) based source camera identification
Wu et al. A context adaptive predictor of sensor pattern noise for camera source identification
CN114554227B (en) Compressed video source detection method based on multi-scale transform domain self-adaptive wiener filtering
Al-Ani et al. On the SPN estimation in image forensics: a systematic empirical evaluation
Lin et al. A passive-blind forgery detection scheme based on content-adaptive quantization table estimation
CN105120294B (en) A kind of jpeg format image sources discrimination method
Gupta et al. Improving source camera identification performance using DCT based image frequency components dependent sensor pattern noise extraction method
Lawgaly et al. Image sharpening for efficient source camera identification based on sensor pattern noise estimation
CN108830829B (en) Non-reference quality evaluation algorithm combining multiple edge detection operators
Chetty et al. Digital video tamper detection based on multimodal fusion of residue features
CN108259791A (en) A kind of method that PRNU noises are extracted from video file
Yang et al. Estimation of signal-dependent noise level function in transform domain via a sparse recovery model
CN111709930A (en) Pattern noise based picture provenance and tampering identification method
Peng et al. Identifying photographic images and photorealistic computer graphics using multifractal spectrum features of PRNU
CN106851140A (en) A kind of digital photo images source title method of use spatial domain smothing filtering
Dixit et al. Image de-noising by non-local means algorithm
Pandey et al. A passive forensic method for video: Exposing dynamic object removal and frame duplication in the digital video using sensor noise features
Lawgaly et al. Three dimensional denoising filter for effective source smartphone video identification and verification
CN107451990B (en) A kind of photograph image altering detecting method using non-linear guiding filtering
Su et al. Multimedia source identification using an improved weight photo response non-uniformity noise extraction model in short compressed videos
Balamurugan et al. Source camera identification using SPN with PRNU estimation and enhancement
CN117544819A (en) Video source identification method based on self-adaptive Bayesian threshold estimation and bivariate shrinkage
Chetouani et al. A reduced reference image quality metric based on feature fusion and neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant