CN112233035A

CN112233035A - Image PRNU noise purification method and system based on sample mismatch training

Info

Publication number: CN112233035A
Application number: CN202011132031.4A
Authority: CN
Inventors: 田华伟; 郝昕泽; 肖延辉; 唐云祁
Original assignee: PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Current assignee: PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2021-01-15

Abstract

The invention discloses a method and a system for purifying image PRNU noise based on sample mismatch training. The method comprises the following steps: S100, acquiring a plurality of noise images of the same digital image sensor to form a training set for a deep stacked self-encoder; S200, training a deep stacked self-encoder model based on the training set of the deep stacked self-encoder, a sample mismatch training technique and an artificial neural network; S300, acquiring a noise image of a digital image sensor as the PRNU noise image to be purified, and inputting it into the deep stacked self-encoder model for purification to obtain purified, high-quality PRNU noise. The method further purifies the original image sensor noise to obtain PRNU noise of higher quality.

Description

Image PRNU noise purification method and system based on sample mismatch training

Technical Field

The invention relates to the field of image processing, and in particular to a method and a system for purifying image PRNU (Photo-Response Non-Uniformity) noise based on sample mismatch training.

Background

The number of videos and images generated and shared daily through social media is increasing rapidly, driven mainly by convenient audiovisual recording technology and mobile internet technology. These technologies also make it easier for lawbreakers to shoot and spread images and videos containing illegal information. Digital image forensics must therefore solve problems such as how to verify the authenticity and integrity of a digital image and how to determine the imaging device of an image, i.e., identify the camera that shot it.

In recent years, researchers in China and abroad have studied digital image source forensics intensively and obtained a number of results. Current digital image source forensics techniques fall mainly into active forensics and passive forensics. Active forensics requires identification information to be added to a digital image in advance and then verified. Passive forensics relies on the fact that imaging is influenced by the software inside the device (such as denoising, enhancement and compression), by its hardware (such as the sensor characteristics of a digital camera or video recorder and the CFA structure) and by the natural scene (such as the strong correlation between adjacent image pixels or image frames), so that original, unprocessed digital images exhibit inherent statistical regularities that differ from one imaging device to another. These regularities can therefore serve as a natural "watermark" of a digital image for tasks such as source forensics. Active forensics needs information such as a digital watermark to be added to the image in advance; that information is easily altered, and the technique is severely limited when no such information was added beforehand. Passive forensics is therefore the future direction and research focus of digital image source forensics.

Among passive techniques, forensics based on the Sensor Pattern Noise (SPN) of digital imaging has been successful. Its principle is that an image sensor inevitably generates sensor pattern noise during imaging. The SPN consists mainly of Photo-Response Non-Uniformity (PRNU) noise, which is caused by defects in the sensor manufacturing process and by non-uniformity of the silicon wafer. PRNU noise is therefore unique to each imaging device and is not affected by the external environment, which makes it well suited to serve as the inherent "fingerprint" of the imaging device in digital image forensics.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a method and a system for purifying image PRNU noise based on sample mismatch training, which are used for further purifying the original image imaging sensor noise to obtain PRNU noise with higher quality.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

An image PRNU noise purification method based on sample mismatch training comprises the following steps:

S100, acquiring a plurality of noise images of the same digital image sensor to form a training set for a deep stacked self-encoder;

S200, training a deep stacked self-encoder model based on the training set of the deep stacked self-encoder, a sample mismatch training technique and an artificial neural network;

S300, acquiring a noise image of a certain digital image sensor as the PRNU noise image to be purified, inputting it into the deep stacked self-encoder model for purification, and obtaining a purified high-quality PRNU noise image.

Further, in the method for purifying the PRNU noise based on the sample mismatch training, step S100 includes:

S101, performing noise reduction on a plurality of images shot by the same digital image sensor, and calculating the noise residuals of the plurality of images;

S102, calculating, based on the noise residuals of the plurality of images and a maximum likelihood estimation method, a maximum likelihood estimate of the noise of each image as the corresponding noise image.

Further, in the image PRNU noise purification method based on sample mismatch training as described above, in step S101, the noise residuals of the plurality of images are calculated by the following formula:

$$W^i = I^i - F(I^i), \quad i = 1, \dots, N$$

where $I^i$ is the i-th image, $F(I^i)$ is the i-th image after noise reduction, $W^i$ is the noise residual of the i-th image, and i and N are positive integers.

Further, in the image PRNU noise purification method based on sample mismatch training as described above, in step S102, the maximum likelihood estimate of the noise K of the digital image sensor is calculated by the following formula:

$$\hat{K} = \frac{\sum_{i=1}^{N} W^i I^i}{\sum_{i=1}^{N} (I^i)^2}$$

where $I^i$ is the i-th image, $W^i$ is the noise residual of the i-th image, and i and N are positive integers.

Further, in the image PRNU noise purification method based on sample mismatch training as described above, in step S200, the deep stacked self-encoder model includes a plurality of modules, each module includes two self-encoders with the same network structure, a single self-encoder is a three-layer neural network consisting of an input layer, a hidden layer and an output layer, and the training set of the deep stacked self-encoder is used as the input of the first self-encoder in the first module,

the training process of the first module is as follows:

the input of the first self-encoder in the first module is kept identical to its expected output, and unsupervised learning is performed by minimizing the reconstruction error; after training, the real output of the first self-encoder in the first module is extracted as the input of the second self-encoder in the first module, and the original input of the first self-encoder in the first module is shuffled to form the expected output of the second self-encoder in the first module; unsupervised learning is performed by minimizing the reconstruction error, and after training the real output of the second self-encoder in the first module is obtained, which completes the training of the first module;

the training process of the second module is as follows:

extracting a hidden layer output of a first self-encoder in the first module as an input of the first self-encoder in the second module, performing unsupervised learning by minimizing a reconstruction error, after the training is completed, extracting a real output of the first self-encoder in the second module as an input of a second self-encoder in the second module, extracting a hidden layer output of the second self-encoder in the first module as an expected output of the second self-encoder in the second module, and performing unsupervised learning by minimizing the reconstruction error to complete the training of the second module;

and sequentially training each module in the plurality of modules according to the training process of the first module and the second module, after the training is finished, extracting a hidden layer in each module as an encoder part of the deep stacking self-encoder model, extracting an output layer in each module as a decoder part of the deep stacking self-encoder model, inputting the training set of the deep stacking self-encoder into the deep stacking self-encoder model again, and performing unsupervised learning by minimizing a reconstruction error to finish the fine tuning process of the deep stacking self-encoder model.

Further, the method for image PRNU noise purification based on sample mismatch training as described above, further comprising:

s400, acquiring a PRNU noise image to be purified of a certain digital image sensor based on the method of S100, obtaining a high-quality PRNU noise image through the acquired PRNU noise image by the method of S300, respectively calculating peak correlation energy ratios of the PRNU noise image to be purified and the high-quality PRNU noise image, comparing the two peak correlation energy ratios, and verifying the purification effect of the depth stack self-encoder model.

Further, in the image PRNU noise purification method based on sample mismatch training as described above, in step S400, the peak correlation energy ratio (PCE) of the PRNU noise image to be purified or of the high-quality PRNU noise image is calculated by the following formula:

$$\mathrm{PCE}(X, Y) = \frac{\rho(s_{peak}; X, Y)^2}{\frac{1}{mn - |\mathcal{N}|} \sum_{s \notin \mathcal{N}} \rho(s; X, Y)^2} \cdot \mathrm{sign}(\rho(s_{peak}; X, Y))$$

where $X = I \cdot K$, $Y = W$, X and Y have dimensions m × n, I is a real image captured by a digital image sensor, K is the PRNU noise image to be purified or the high-quality PRNU noise image, W is the noise residual of the real image, $\rho(s; X, Y)$ is the correlation function between X and Y, $\rho(s_{peak}; X, Y)$ is the maximum value of the correlation function, $\mathrm{sign}(\rho(s_{peak}; X, Y))$ is the sign of the correlation function where the maximum is taken, and $\mathcal{N}$ is a small region centered at $s_{peak}$.

An image PRNU noise purification system based on sample mismatch training, comprising:

the acquisition module is used for acquiring a plurality of noise images of the same digital image sensor to form a training set of the depth stacking self-encoder;

the training module is used for training to obtain a deep stacking self-encoder model based on a training set of the deep stacking self-encoder, a sample mismatching training technology and an artificial neural network;

and the purification module is used for acquiring a noise image of a certain digital image sensor as a PRNU noise image to be purified, inputting the PRNU noise image to be purified into the depth stack self-encoder model for purification, and obtaining a purified high-quality PRNU noise image.

Further, the system for image PRNU noise refinement based on sample mismatch training as described above, the acquisition module is configured to:

carrying out noise reduction processing on a plurality of images shot by the same digital image sensor, and calculating to obtain noise residual errors of the plurality of images;

and calculating to obtain a maximum likelihood estimation value of the noise of each image as a corresponding noise image based on the noise residuals of the plurality of images and a maximum likelihood estimation method.

Further, an image PRNU noise purification system based on sample mismatch training as described above, the system further comprising:

and the verification module is used for acquiring a PRNU noise image to be purified of a certain digital image sensor based on the acquisition module, obtaining a high-quality PRNU noise image through the purification module, respectively calculating peak correlation energy ratios of the PRNU noise image to be purified and the high-quality PRNU noise image, comparing the two peak correlation energy ratios, and verifying the purification effect of the depth stack self-encoder model.

The invention has the beneficial effects that: according to the method and the system provided by the invention, the artificial neural network is adopted to carry out the purification training of the noise of the digital image imaging sensor, a sample mismatching strategy is adopted during the training, and finally the artificial neural network is trained into the purifier of the noise of the digital image imaging sensor, so that the further purification is realized on the basis of the noise of the original image imaging sensor, and the PRNU noise with higher quality is obtained.

Drawings

Fig. 1 is a schematic flowchart of an image PRNU noise purification method based on sample mismatch training according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the pre-training of the deep stacked self-encoder model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the fine-tuning of the deep stacked self-encoder model according to an embodiment of the present invention;

Fig. 4 is a schematic block diagram of an image PRNU noise purification method based on sample mismatch training according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an image PRNU noise purification system based on sample mismatch training according to an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and the detailed description.

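As shown in Fig. 1, a method for purifying image PRNU noise based on sample mismatch training includes:

S100, acquiring a plurality of noise images of the same digital image sensor to form a training set for the deep stacked self-encoder;

Step S100 includes:

S101, performing noise reduction on a plurality of images shot by the same digital image sensor, and calculating the noise residuals of the plurality of images;

Specifically, the noise residuals of the plurality of images may be calculated by:

$$W^i = I^i - F(I^i)$$

where $I^i$ is the i-th image, $F(I^i)$ is the i-th image after noise reduction, $W^i$ is the noise residual of the i-th image, i = 1, …, N, and i and N are positive integers.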

S102, calculating, based on the noise residuals of the plurality of images and a maximum likelihood estimation method, a maximum likelihood estimate of the noise of each image as the corresponding noise image;

Specifically, the maximum likelihood estimate of the noise K of the digital image sensor can be calculated by the following formula:

$$\hat{K} = \frac{\sum_{i=1}^{N} W^i I^i}{\sum_{i=1}^{N} (I^i)^2}$$

where $I^i$ is the i-th image, $W^i$ is the noise residual of the i-th image, i = 1, …, N, and i and N are positive integers.
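As a rough illustration of step S100, the sketch below computes noise residuals W = I − F(I) and then a per-pixel estimate of K. It assumes the usual maximum likelihood form, the sum of W·I over the images divided by the sum of I² over the images. The denoiser F is not specified in this text, so a simple mean filter stands in for it; the function names `box_denoise`, `noise_residual` and `estimate_prnu`, the `eps` guard and the synthetic images are all assumptions made for the example.

```python
import numpy as np

def box_denoise(image, radius=1):
    """Stand-in for the denoiser F: a mean filter (an assumption, not the patent's filter)."""
    img = np.asarray(image, dtype=np.float64)
    padded = np.pad(img, radius, mode="reflect")
    size = 2 * radius + 1
    out = np.zeros_like(img)
    # Sum the shifted copies of the image that make up each window.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (size * size)

def noise_residual(image, denoise=box_denoise):
    """Noise residual W = I - F(I) of a single image."""
    img = np.asarray(image, dtype=np.float64)
    return img - denoise(img)

def estimate_prnu(images, residuals, eps=1e-12):
    """Per-pixel estimate sum_i(W^i * I^i) / sum_i((I^i)^2); eps avoids division by zero."""
    I = np.asarray(images, dtype=np.float64)
    W = np.asarray(residuals, dtype=np.float64)
    return (W * I).sum(axis=0) / ((I * I).sum(axis=0) + eps)

rng = np.random.default_rng(0)
images = rng.uniform(50.0, 200.0, size=(5, 16, 16))  # synthetic stand-ins for photos
residuals = np.stack([noise_residual(im) for im in images])
k_hat = estimate_prnu(images, residuals)              # one noise image, shape (16, 16)
```

In practice the images would be real photographs from one sensor, and a stronger denoiser would normally replace the mean filter.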

S200, training a deep stacked self-encoder model based on the training set of the deep stacked self-encoder, a sample mismatch training technique and an artificial neural network;

The deep stacked self-encoder model comprises a plurality of modules, each module comprises two self-encoders with the same network structure, a single self-encoder is a three-layer neural network consisting of an input layer, a hidden layer and an output layer, and the training set of the deep stacked self-encoder is used as the input of the first self-encoder in the first module,

the training process of the first module is as follows:

the input of the first self-encoder in the first module is kept identical to its expected output, and unsupervised learning is performed by minimizing the reconstruction error; after training, the real output of the first self-encoder in the first module is extracted as the input of the second self-encoder in the first module, and the original input of the first self-encoder in the first module is shuffled to form the expected output of the second self-encoder in the first module; unsupervised learning is performed by minimizing the reconstruction error, and after training the real output of the second self-encoder in the first module is obtained, which completes the training of the first module;

the training process for the second module is as follows:

extracting the hidden layer output of a first self-encoder in a first module as the input of the first self-encoder in a second module, performing unsupervised learning by minimizing a reconstruction error, after the training is completed, extracting the real output of the first self-encoder in the second module as the input of a second self-encoder in the second module, extracting the hidden layer output of the second self-encoder in the first module as the expected output of the second self-encoder in the second module, performing unsupervised learning by minimizing the reconstruction error, and completing the training of the second module;

and training each module in the plurality of modules in sequence according to the training process of the first module and the second module, after the training is finished, extracting a hidden layer in each module to be used as an encoder part of the depth stacking self-encoder model, extracting an output layer in each module to be used as a decoder part of the depth stacking self-encoder model, inputting the training set of the depth stacking self-encoder into the depth stacking self-encoder model again, and performing unsupervised learning by minimizing a reconstruction error to finish the fine tuning process of the depth stacking self-encoder model.

The sample mismatch training technique treats two basic self-encoder models as one module. Each model within a module is trained independently, but once the modules are stacked in sequence the overall training process is linked.

Fig. 2 is a schematic diagram of the pre-training of the deep stacked self-encoder model, taking the training of two modules as an example, where Fig. 2(a) shows the training of the first module and Fig. 2(b) shows the training of the second module. As shown in the left half of Fig. 2(a), a single self-encoder model is a three-layer neural network consisting of an input layer, a hidden layer and an output layer, and its structure can be divided into two parts, an encoder and a decoder. The training process is as follows: $x^{(n)}$ denotes the input of the first self-encoder, which passes through the encoder part to give the hidden-layer output $X^{(n)}$, where $f_e$ denotes the processing of the encoder:

$$X = f_e(x) = \sigma_e(W_e \times x + b_e)$$

where x is the input of the self-encoder, $\sigma_e$ is the (non-linear) activation function of the hidden layer, $W_e$ and $b_e$ are respectively the weights and bias parameters connecting the input layer and the hidden layer of the encoder, and X is the output of the hidden layer. The same processing applies to the encoder of the j-th self-encoder in the i-th module, where i = 1, …, N and j = 1, 2.

$X^{(n)}$ then passes through the decoder part to give the real output $\hat{x}^{(n)}$ of the self-encoder, where $g_d$ denotes the processing of the decoder:

$$\hat{x} = g_d(X) = \sigma_d(W_d \times X + b_d)$$

where X is the input of the decoder, i.e., the output of the hidden layer of the encoder, $\sigma_d$ is the (non-linear) activation function of the output layer, $W_d$ and $b_d$ are respectively the weights and bias parameters connecting the hidden layer and the output layer, and $\hat{x}$ is the output of the decoder. The same processing applies to the decoder of the j-th self-encoder in the i-th module, where i = 1, …, N and j = 1, 2.

The goal of self-encoder training is to minimize the reconstruction error between the encoder input and the decoder output, and its loss function can be expressed as:

$$L = \frac{1}{n} \sum_{n} \left\| x^{(n)} - \hat{x}^{(n)} \right\|^2$$

where n is the number of training samples (the training set of the deep stacked self-encoder), the sum runs over all training samples, $x^{(n)}$ is the n-th input sample, $\hat{x}^{(n)}$ is the n-th output sample, and $\| \cdot \|^2$ is the squared error, so that L is the mean square error. During training, by optimizing this loss function, the self-encoder learns to represent the input x as the hidden-layer output X, and can then reconstruct from X a real output $\hat{x}$ that is similar to the input x.
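The encoder, decoder and reconstruction loss described above can be sketched in a few lines of NumPy. This is only an illustration: the activation functions are not named in the text, so `tanh` is assumed for both layers, and the layer sizes, the random weights and the function names `encode`, `decode` and `reconstruction_loss` are hypothetical.

```python
import numpy as np

def encode(x, W_e, b_e):
    """Hidden-layer output X = sigma_e(W_e x + b_e); tanh is an assumed activation."""
    return np.tanh(x @ W_e.T + b_e)

def decode(X, W_d, b_d):
    """Real output sigma_d(W_d X + b_d); tanh is an assumed activation."""
    return np.tanh(X @ W_d.T + b_d)

def reconstruction_loss(x, x_hat):
    """Mean over the samples of the squared error between input and output."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)
n, d, h = 8, 6, 3                                   # samples, input size, hidden size
x = rng.uniform(-0.5, 0.5, size=(n, d))             # rows are training samples
W_e, b_e = 0.1 * rng.standard_normal((h, d)), np.zeros(h)
W_d, b_d = 0.1 * rng.standard_normal((d, h)), np.zeros(d)

X = encode(x, W_e, b_e)                             # hidden-layer outputs, shape (n, h)
x_hat = decode(X, W_d, b_d)                         # real outputs, shape (n, d)
loss = reconstruction_loss(x, x_hat)
```

Training would adjust `W_e`, `b_e`, `W_d` and `b_d` to reduce `loss`; here the weights are untrained and serve only to show the data flow.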

The pre-training of the two modules of the deep stacked self-encoder model proceeds as follows. The first module is trained first. The input of the first self-encoder (the training set of the deep stacked self-encoder formed in step S100) is $x^{(n)}$, and its expected output is also $x^{(n)}$, so the two are kept consistent. Unsupervised learning is performed by minimizing the reconstruction error between $x^{(n)}$ and the real output $\hat{x}^{(n)}$. After training, the real output $\hat{x}^{(n)}$ of the first self-encoder is extracted as the input of the second self-encoder, and the original input $x^{(n)}$ (i.e., the input of the first self-encoder) is shuffled to form the expected output $x^{(m)}$ of the second self-encoder. Unsupervised learning is then performed by minimizing the reconstruction error between $x^{(m)}$ and the real output of the second self-encoder, which yields the real output of the second self-encoder, as shown in Fig. 2(a).
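The first-module procedure can be sketched as follows. The code is a toy illustration under several assumptions that do not come from the text: `tanh` activations, full-batch gradient descent, the learning rate and epoch count, the layer sizes and the synthetic data. The helper names `init_autoencoder`, `forward`, `fit` and `train_first_module` are hypothetical. What it is meant to show is the mismatch itself: the second self-encoder is trained to map the first self-encoder's real output onto a shuffled copy of the original samples.

```python
import numpy as np

def init_autoencoder(d, h, rng):
    return {"W1": 0.1 * rng.standard_normal((d, h)), "b1": np.zeros(h),
            "W2": 0.1 * rng.standard_normal((h, d)), "b2": np.zeros(d)}

def forward(p, x):
    hidden = np.tanh(x @ p["W1"] + p["b1"])        # hidden-layer output
    output = np.tanh(hidden @ p["W2"] + p["b2"])   # real output
    return hidden, output

def fit(x, target, h, rng, epochs=300, lr=0.05):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    p = init_autoencoder(x.shape[1], h, rng)
    history = []
    for _ in range(epochs):
        hidden, out = forward(p, x)
        history.append(float(np.mean(np.sum((out - target) ** 2, axis=1))))
        # Backpropagate through the two tanh layers.
        d_out = 2.0 * (out - target) / len(x) * (1.0 - out ** 2)
        d_hid = (d_out @ p["W2"].T) * (1.0 - hidden ** 2)
        p["W2"] -= lr * hidden.T @ d_out
        p["b2"] -= lr * d_out.sum(axis=0)
        p["W1"] -= lr * x.T @ d_hid
        p["b1"] -= lr * d_hid.sum(axis=0)
    return p, history

def train_first_module(x, h, rng):
    ae1, hist1 = fit(x, x, h, rng)                 # expected output equals the input
    _, x_hat = forward(ae1, x)                     # real output of the first self-encoder
    perm = rng.permutation(len(x))
    x_mismatch = x[perm]                           # shuffled samples: mismatched targets
    ae2, hist2 = fit(x_hat, x_mismatch, h, rng)    # second self-encoder, mismatched pairs
    return ae1, ae2, perm, hist1, hist2

rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=(32, 6))           # synthetic stand-in for noise samples
ae1, ae2, perm, hist1, hist2 = train_first_module(x, h=4, rng=rng)
```

Because every sample comes from the same sensor, shuffling keeps the sensor-specific component common to input and target while breaking the pairing of everything else, which is presumably why the mismatch is expected to suppress content unrelated to the PRNU.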

Next, the second module is trained. First, the hidden-layer output of the first self-encoder in the first module (i.e., $X^{(n)}$) is taken as the input of the first self-encoder in the second module, and unsupervised learning is performed on that self-encoder by minimizing the reconstruction error. After training, the real output of that self-encoder is extracted as the input of the second self-encoder in the second module, and the hidden-layer output of the second self-encoder in the first module (i.e., $X^{(m)}$) is taken as the expected output of the second self-encoder in the second module. Unsupervised learning is then performed by minimizing the reconstruction error, which completes the training of the second module, as shown in Fig. 2(b). Note that a single self-encoder model is a three-layer neural network consisting of an input layer, a hidden layer and an output layer. The expected output is the output the self-encoder would produce under ideal conditions, and the real output is the output the self-encoder actually produces after training; an error always exists between the expected output and the real output. The real output is the output of the output layer, the hidden output is the output of the hidden layer, and the two are related.
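The data wiring of the second module can be summarized in a short sketch. It deliberately leaves the training routine abstract: `fit(inputs, targets)` and `real_output(model, inputs)` are hypothetical callables supplied by the caller, and the function name `train_second_module` is likewise an assumption. Only the choice of inputs and expected outputs follows the description above.

```python
def train_second_module(h1_first, h1_second, fit, real_output):
    """Wire up the second module from the first module's hidden-layer outputs.

    h1_first  : hidden-layer output of the first self-encoder in module 1
    h1_second : hidden-layer output of the second self-encoder in module 1
    fit(inputs, targets) -> model and real_output(model, inputs) -> outputs
    are hypothetical callables provided by the caller.
    """
    # First self-encoder of module 2: input and expected output are the same.
    ae1 = fit(h1_first, h1_first)
    # Its real output becomes the input of the second self-encoder.
    out1 = real_output(ae1, h1_first)
    # Second self-encoder: the expected output is the mismatched hidden output.
    ae2 = fit(out1, h1_second)
    return ae1, ae2
```

Any trainer that accepts an input set and a separate target set can be plugged in for `fit`.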

Fig. 3 is a schematic diagram of the fine-tuning of the deep stacked self-encoder model. After the i-th module (i = 1, …, N) has been trained in the pre-training stage, the hidden layers and output layers of the individual self-encoders are taken to form the deep stacked self-encoder model, and the model is fine-tuned by training it again. Taking the two modules of Fig. 2 as an example, the hidden layer of the first self-encoder in the first module and the hidden layer of the first self-encoder in the second module are taken in sequence to form the encoder part of the deep stacked self-encoder model; the output layers of the second self-encoders in the first and second modules are then taken in sequence to form the decoder part. That is, the output layers of the previously trained single self-encoders are stacked in sequence to form the decoder part of the deep stacked self-encoder. The original input $x^{(n)}$ is then fed into the deep stacked self-encoder network again, and the network is trained by minimizing the reconstruction error, which completes the fine-tuning of the deep stacked self-encoder model.
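A forward pass through such a stacked model might look like the sketch below. It is illustrative only: the randomly initialized layers stand in for the pre-trained hidden and output layers, `tanh` is an assumed activation, the layer sizes (6, 4, 3, 4, 6) are invented, and the order in which the decoder layers are applied is an assumption. The fine-tuning updates themselves are omitted; they would minimize the reconstruction error between `x` and `x_hat`.

```python
import numpy as np

def stacked_forward(x, encoder_layers, decoder_layers):
    """Pass samples through the stacked encoder layers, then the stacked decoder layers."""
    a = np.asarray(x, dtype=np.float64)
    for W, b in encoder_layers:          # hidden layers taken from the pre-trained modules
        a = np.tanh(a @ W + b)
    for W, b in decoder_layers:          # output layers taken from the pre-trained modules
        a = np.tanh(a @ W + b)
    return a

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    # Random weights used here only as placeholders for pre-trained parameters.
    return 0.1 * rng.standard_normal((n_in, n_out)), np.zeros(n_out)

encoder_layers = [layer(6, 4), layer(4, 3)]
decoder_layers = [layer(3, 4), layer(4, 6)]
x = rng.uniform(-0.5, 0.5, size=(10, 6))
x_hat = stacked_forward(x, encoder_layers, decoder_layers)
```

Once fine-tuned, the same forward pass applied to a noise image (flattened to a vector) would yield its purified counterpart.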

S300, acquiring a noise image of a digital image sensor as a PRNU noise image to be purified, inputting the PRNU noise image to be purified into a depth stack self-encoder model for purification, and obtaining a purified high-quality PRNU noise image.

A plurality of real images are shot with a digital image sensor and processed to obtain a plurality of noise images, which form the set of PRNU noise images to be purified, i.e., the test sample set. The test sample set is input into the purification model (i.e., the deep stacked self-encoder model) to obtain a test sample set composed of purified high-quality PRNU noise, as shown in Fig. 4. Specifically, the high-quality PRNU noise image is the output obtained when the PRNU noise image to be purified is passed through the deep stacked self-encoder model.

After the purified high-quality PRNU noise is obtained, it can be compared with the PRNU noise image to be purified in order to verify the purification effect of the deep stacked self-encoder model. Specifically, the verification is performed as follows:

s400, acquiring a PRNU noise image to be purified of a certain digital image sensor based on the method of S100, obtaining a high-quality PRNU noise image through the acquired PRNU noise image in S300, respectively calculating peak correlation energy ratios of the PRNU noise image to be purified and the high-quality PRNU noise image, comparing the two peak correlation energy ratios, and verifying the purification effect of the depth stack self-encoder model.

During verification, the real images underlying the selected PRNU noise images to be purified come from the corresponding digital image sensor, so their peak correlation energy ratios are all greater than the preset threshold. Each PRNU noise image to be purified is passed through the purification module to obtain a high-quality PRNU noise image; the peak correlation energy ratios of the PRNU noise image to be purified and of the high-quality PRNU noise image are then calculated separately and compared. A larger peak correlation energy ratio indicates a higher probability that the image comes from the corresponding digital image sensor. A higher ratio after purification therefore verifies that the deep stacked self-encoder model has effectively removed irrelevant noise from the PRNU noise image to be purified and produced high-quality PRNU noise.

Specifically, the peak correlation energy ratio (PCE) of the PRNU noise image to be purified or of the high-quality PRNU noise image may be calculated by the following formula:

$$\mathrm{PCE}(X, Y) = \frac{\rho(s_{peak}; X, Y)^2}{\frac{1}{mn - |\mathcal{N}|} \sum_{s \notin \mathcal{N}} \rho(s; X, Y)^2} \cdot \mathrm{sign}(\rho(s_{peak}; X, Y))$$

where $X = I \cdot K$, $Y = W$, X and Y have dimensions m × n, I is a real image captured by a digital image sensor, K is the PRNU noise image to be purified or the high-quality PRNU noise image, W is the noise residual of the real image, $\rho(s; X, Y)$ is the correlation function between X and Y, $\rho(s_{peak}; X, Y)$ is the maximum value of the correlation function, $\mathrm{sign}(\rho(s_{peak}; X, Y))$ is the sign of the correlation function where the maximum is taken, and $\mathcal{N}$ is a small block region (size: 11 × 11 pixels) centered at $s_{peak}$.
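A PCE computation along these lines can be sketched with NumPy's FFT routines. Several choices here are assumptions rather than details taken from the text: the correlation is computed as a circular cross-correlation of the mean-removed arrays, the peak is taken where the absolute correlation is largest, and the excluded neighborhood wraps around the array borders. The names `cross_correlation` and `pce` are hypothetical.

```python
import numpy as np

def cross_correlation(X, Y):
    """Normalized circular cross-correlation of two equally sized arrays, for every shift."""
    Xc = np.asarray(X, dtype=np.float64) - np.mean(X)
    Yc = np.asarray(Y, dtype=np.float64) - np.mean(Y)
    corr = np.real(np.fft.ifft2(np.fft.fft2(Xc) * np.conj(np.fft.fft2(Yc))))
    return corr / (np.linalg.norm(Xc) * np.linalg.norm(Yc))

def pce(X, Y, radius=5):
    """Signed peak-to-correlation-energy ratio; radius=5 gives an 11 x 11 excluded block."""
    rho = cross_correlation(X, Y)
    m, n = rho.shape
    peak_idx = np.unravel_index(np.argmax(np.abs(rho)), rho.shape)
    peak = rho[peak_idx]
    # Exclude the block centered on the peak (with wrap-around) from the energy term.
    rows = (np.arange(-radius, radius + 1) + peak_idx[0]) % m
    cols = (np.arange(-radius, radius + 1) + peak_idx[1]) % n
    mask = np.ones(rho.shape, dtype=bool)
    mask[np.ix_(rows, cols)] = False
    energy = np.mean(rho[mask] ** 2)
    return float(np.sign(peak) * peak ** 2 / energy)
```

With the notation of the text, a call would look like `pce(I * K, W)`, where `I` is the real image, `K` the PRNU noise image and `W` the noise residual of the real image; the arrays must be larger than the excluded block.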

The method uses the peak-to-correlation energy ratio (PCE) as the basis for checking whether an image under test comes from a specific digital image sensor. After the PCE value is calculated, it is compared with a threshold; if the value is greater than or equal to the threshold, the image under test is considered to come from the digital image sensor that has that PRNU noise. A large number of experiments show that a PCE threshold of 60 judges well whether an image under test comes from a specific camera. To verify the effectiveness of the invention, 74 × 50 images shot by 74 cameras (50 images per camera) were used for testing. For each camera, its own 50 images serve as same-class samples, and the 3650 images from the remaining 73 cameras in the verification set serve as different-class samples. Over all cameras this gives 3700 same-class PCE values and 270100 different-class PCE values. Evaluating these PCE values under different thresholds gives the true positive rate and the false positive rate, from which a ROC curve is drawn. The test results show that the compression-decompression process of the deep stacked self-encoder model further removes irrelevant noise from the camera sensor noise and yields purified PRNU noise. The method provided by the invention therefore purifies the noise of digital image sensors.
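The sample counts and the threshold sweep described above can be checked with a short sketch. The function `roc_points` is a hypothetical helper; in a real evaluation the two lists passed to it would be the same-class and different-class PCE values measured on the cameras.

```python
import numpy as np

def roc_points(same_pce, diff_pce, thresholds):
    """Return (false positive rate, true positive rate) for each PCE threshold."""
    same = np.asarray(same_pce, dtype=np.float64)
    diff = np.asarray(diff_pce, dtype=np.float64)
    points = []
    for t in thresholds:
        tpr = float(np.mean(same >= t))   # same-class samples accepted at this threshold
        fpr = float(np.mean(diff >= t))   # different-class samples wrongly accepted
        points.append((fpr, tpr))
    return points

n_cameras, per_camera = 74, 50
n_same = n_cameras * per_camera                      # 3700 same-class PCE values
n_diff = n_cameras * (n_cameras - 1) * per_camera    # 270100 different-class PCE values
```

Sweeping `thresholds` over a range of values and plotting the resulting points gives the ROC curve; the threshold of 60 mentioned in the text is one point on that curve.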

As shown in fig. 5, an image PRNU noise purification system based on sample mismatch training includes:

an acquisition module 100, configured to acquire multiple noise images of the same digital image sensor, and form a training set of a depth stack self-encoder;

the training module 200 is used for training to obtain a deep stacking self-encoder model based on a training set of the deep stacking self-encoder, a training technology of sample mismatching and an artificial neural network;

and the purification module 300 is configured to acquire a noise image of a digital image sensor as a PRNU noise image to be purified, and input the PRNU noise image to be purified into a depth stack self-encoder model for purification, so as to obtain a purified high-quality PRNU noise image.

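The acquisition module 100 is configured to:

perform noise reduction on a plurality of images shot by the same digital image sensor, and calculate the noise residuals of the plurality of images;

calculate, based on the noise residuals of the plurality of images and a maximum likelihood estimation method, a maximum likelihood estimate of the noise of each image as the corresponding noise image.

The system further comprises: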

the verification module 400 obtains a PRNU noise image to be purified of a certain digital image sensor based on the obtaining module 100, obtains a high-quality PRNU noise image through the purification module 300, respectively calculates a peak correlation energy ratio of the PRNU noise image to be purified and the high-quality PRNU noise image, compares the two peak correlation energy ratios, and verifies a purification effect of the depth stack self-encoder model.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include such modifications and variations.

Claims

1. An image PRNU noise purification method based on sample mismatch training is characterized by comprising the following steps:

s300, acquiring a noise image of a digital image sensor as a PRNU noise image to be purified, inputting the PRNU noise image to be purified into the depth stack self-encoder model for purification, and obtaining a purified high-quality PRNU noise image.

2. The method for image PRNU noise refinement based on sample mismatch training according to claim 1, wherein step S100 comprises:

3. The method according to claim 2, wherein in step S101, the noise residuals of the plurality of images are calculated by:

Wⁱ＝Iⁱ-F(Iⁱ)(i＝1,…,N)

4. The method for image PRNU noise refinement based on sample mismatch training as claimed in claim 2, wherein in step S102, the maximum likelihood estimation value of the noise K of the digital image sensor is calculated by the following formula

5. The method for PRNU noise refinement on the basis of sample mismatch training as claimed in claim 1, wherein in step S200, the depth-stacked self-encoder model comprises a plurality of modules, each module comprises two self-encoders with the same network structure, a single self-encoder comprises three layers of neural networks of an input layer, a hidden layer and an output layer, the training set of the depth-stacked self-encoder is used as the input of the first self-encoder in the first module,

the training process of the first module is as follows:

the training process of the second module is as follows:

6. The method of sample mismatch training-based image PRNU noise purification according to claim 1, further comprising:

7. The method of claim 6, wherein in step S400, the PCE value is calculated as the peak correlation energy ratio of the PRNU noise image to be refined or the high quality PRNU noise image by the following equation:

wherein the content of the first and second substances,

8. An image PRNU noise purification system based on sample mismatch training, comprising:

9. The sample mismatch training-based image PRNU noise purification system of claim 8, wherein the acquisition module is to:

10. The sample mismatch training-based image PRNU noise purification system of claim 8, further comprising: