CN111462085A - Digital image local filtering evidence obtaining method based on convolutional neural network - Google Patents


Info

Publication number
CN111462085A
CN111462085A (application CN202010246245.8A)
Authority
CN
China
Prior art keywords
network
image
neural network
filtering
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010246245.8A
Other languages
Chinese (zh)
Other versions
CN111462085B (en)
Inventor
冯国瑞 (Feng Guorui)
李雪梅 (Li Xuemei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN202010246245.8A priority Critical patent/CN111462085B/en
Publication of CN111462085A publication Critical patent/CN111462085A/en
Application granted granted Critical
Publication of CN111462085B publication Critical patent/CN111462085B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20024 Filtering details
    • G06T2207/20032 Median filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a digital image local filtering evidence obtaining method based on a convolutional neural network. The method comprises the following operation steps: first, the images of an image set are cropped, and five filtering operations are applied separately to the cropped small-size images; second, the six classes of pictures, including the originals, are divided into a training set and a test set; third, a training set and a test set are divided for the VGG network that trains the filters; fourth, a VGG network is constructed to train eight prefilters; fifth, a specific dense network with the prefilters is constructed; sixth, the training-set data of the six classes of pictures is input into the dense network for neural network training; seventh, the trained network's output on the test set is taken as the small-size image classification result, and in actual detection the classification results of the small-size images are integrated into the final detection result for the original image. The invention can effectively and conveniently solve the problem of detecting locally filtered images.

Description

Digital image local filtering evidence obtaining method based on convolutional neural network
Technical Field
The invention relates to a digital image local filtering evidence obtaining method based on a convolutional neural network, and belongs to the technical field of blind evidence obtaining.
Background
With the popularization of image-editing software, malicious actors can easily tamper with pictures to achieve deniable purposes, causing not only large economic and labor losses but also social panic. Filtering is an operation frequently used in image tampering, so blind detection of filtering operations is necessary: identifying whether an image has undergone filtering provides an important criterion for judging whether it has been tampered with. Traditional filtering-detection techniques mainly design suitable hand-crafted features based on distribution regularities in the image frequency domain and classify them with a support vector machine. These methods perform very well in discriminating operations such as median filtering and mean filtering; however, a feature designed for one specific filter is often effective only for that filter and performs extremely poorly on other filtering operations. A convolutional neural network (CNN) extracts features automatically, without manual feature design, which makes it well suited to this problem. Among the outstanding network structures, DenseNet absorbs the essence of ResNet and adds several innovations, further improving network performance. Moreover, in practice most forgers tamper with only part of an image, which places higher demands on the performance of filtering detection for small-size images.
Blind forensics distinguishes natural images from tampered images using only the acquired digital image content or file-header information. Many existing detection methods assume that the forger applies the same processing to the whole image, whereas in practice many forged images are processed only locally, so locally tampered images cannot be detected accurately. We therefore propose adding several filters of different scales at the front end of the DenseNet; the initial values of these filters are obtained by training a network heterogeneous to the DenseNet, and the filters then participate in the subsequent DenseNet training. This supplements high-dimensional features that a standalone DenseNet does not learn, improves the classification accuracy after feature extraction by the subsequent DenseNet, and brings the detection performance on small-size images up to requirement. Compared with traditional schemes, the new scheme has a wider application range and higher accuracy.
Disclosure of Invention
Aiming at the defects of existing detection methods, namely that they must extract features with a prior model and perform poorly on small-size pictures, the invention aims to provide a convolutional-neural-network-based digital image local filtering forensics method that remains robust even on extremely small pictures.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a digital image local filtering evidence obtaining method based on a convolutional neural network comprises the following specific operation steps:
(1) firstly, cropping the images in the original image set, and applying five filtering operations separately to the cropped small-size images;
(2) dividing the six classes of pictures, including the originals, into a training set and a test set for the main network;
(3) dividing a training set and a test set for the VGG network that trains the filters;
(4) constructing a VGG network to train eight prefilters;
(5) constructing a specific dense network with the prefilters;
(6) inputting the training-set data of the six classes of pictures into the dense network for neural network training;
(7) taking the trained network's output on the test set as the small-size image classification result and, in actual detection, integrating the classification results of the small-size images into the final detection result for the original image.
In the step (1):
1-1) the images in the original image set are 384 × 512 in size, and each image is cropped into small-size pictures of size 32 × 32;
1-2) the five filtering operations performed on the small-size images are: median filtering, mean filtering, Gaussian filtering, Laplacian filtering and unsharp filtering.
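As an illustrative sketch only (the patent uses Matlab), the cropping of step (1) and the five filtering operations can be reproduced in Python with SciPy; the 3 × 3 kernel sizes, the Gaussian sigma and the unsharp amount are assumptions not stated in the text. Note that a strict non-overlapping tiling of a 384 × 512 image yields 192 patches of 32 × 32, whereas the patent reports 196 per image, so its exact cropping scheme may differ:

```python
import numpy as np
from scipy import ndimage

def crop_patches(img, patch=32):
    """Crop an image into non-overlapping patch x patch blocks."""
    h, w = img.shape[:2]
    return [img[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]

def five_filters(p):
    """The five filtering operations of step (1); kernel sizes (3x3),
    sigma and unsharp amount are assumptions."""
    blur = ndimage.gaussian_filter(p, sigma=1.0)
    return {
        "median":  ndimage.median_filter(p, size=3),
        "mean":    ndimage.uniform_filter(p, size=3),
        "gauss":   blur,
        "laplace": ndimage.laplace(p),
        "unsharp": p + (p - blur),   # unsharp masking, amount = 1
    }

img = np.random.rand(384, 512)
patches = crop_patches(img)
print(len(patches))   # 192 non-overlapping 32x32 patches
```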
In the step (2), the six classes of pictures including the originals are divided into a training set and a test set: the data set D composed of the six classes of pictures is randomly divided into two mutually exclusive sets, one set as the training set S (92% of the total number of images) and the other as the test set T (8% of the total number of images), i.e. D = S ∪ T, S ∩ T = ∅.
After training the model on S, the accuracy of the model is estimated on T. We randomly pick 90% of the images in S as the actual training set and use the remaining images of S as a validation set to estimate the performance of the model.
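The mutually exclusive split D = S ∪ T, with the inner 90/10 validation hold-out, can be sketched as follows; the function name and the fixed seed are illustrative assumptions, not part of the claimed method:

```python
import random

def split_dataset(items, test_frac=0.08, val_frac=0.10, seed=0):
    """Randomly split into mutually exclusive train/val/test sets.
    Fractions follow the patent: 8% test, then 10% of the
    remaining training set held out for validation."""
    rng = random.Random(seed)
    items = items[:]
    rng.shuffle(items)                       # random, mutually exclusive split
    n_test = round(len(items) * test_frac)
    test, train = items[:n_test], items[n_test:]
    n_val = round(len(train) * val_frac)
    val, train = train[:n_val], train[n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))   # 828 92 80
```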
The step (3) divides the training set and test set of the VGG network that trains the filters, differently from the method in step (2): 25% of each of the four classes other than the originals and the Gaussian-filtered pictures is randomly extracted, and these samples are merged into one new class; the resulting three classes are then divided into a training set S′ and a test set T′ by the same method as step (2). After the model is trained on S′, its accuracy is estimated on T′. The rationale of this design is as follows: even without the prefilters, the subsequent network already classifies the classes other than the originals and Gaussian-filtered pictures with very high accuracy, and the main room for improvement lies in those two classes, so the prefilter training focuses on the originals, the Gaussian-filtered pictures and the remaining classes.
The step (4) constructs a VGG network to train the eight prefilters: we make some slight changes to the structure of the original VGG network to fit our 32 × 32 input images; the modified VGG network consists of 13 convolutional layers, 5 pooling layers and 3 fully connected layers. The prefilters sit at the first layer of the subsequent dense network, so they strongly influence the downstream structure and results, and thereby the network performance; even subtle differences in the prefilters are decisive for the accuracy of the network. The eight prefilters are trained by placing them at the first layer of this VGG network. The reason for choosing such filters is that filters of different scales extract different features from the pictures: common filters are generally 3 × 3 or 5 × 5, and choosing additional sizes such as 1 × 4 and 2 × 3 increases the diversity of the whole network and can further improve its performance.
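To illustrate why mixed filter scales add diversity, the sketch below runs kernels of several shapes over a 32 × 32 patch; only the 1 × 4 and 2 × 3 sizes are named in the text, so the remaining shapes are assumptions standing in for the eight prefilters of Fig. 3, and the random weights stand in for trained ones:

```python
import numpy as np
from scipy.signal import correlate2d

# Hypothetical prefilter shapes: 1x4 and 2x3 are named in the text,
# 3x3 and 5x5 are cited there as common sizes; the full set of eight
# is given only in Fig. 3.
shapes = [(1, 4), (2, 3), (3, 3), (5, 5)]
patch = np.random.rand(32, 32)
maps = {}
for h, w in shapes:
    kernel = np.random.randn(h, w)              # would be trained weights
    maps[(h, w)] = correlate2d(patch, kernel, mode="valid")
    print((h, w), "->", maps[(h, w)].shape)     # (32-h+1, 32-w+1)
```

Differently shaped kernels produce differently shaped feature maps, which is one concrete sense in which they "increase the difference of the whole network".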
And (5) constructing a specific dense network with the prefilters: its basic structure is a filter layer, a convolutional layer, 4 dense blocks, 5 pooling layers and 3 fully connected layers; the basic structure of each dense block is a convolutional layer, a BN layer and a ReLU layer, and all convolutional layers in the dense blocks follow the BN-ReLU-Conv structure.
And (6) inputting the training-set data of the six classes of pictures into the dense network for neural network training: the training set S divided in step (2) is input into the dense network and trained iteratively until the result converges.
And (7) taking the trained network's output on the test set as the small-size image classification result and, in actual detection, integrating the classification results of the small-size images into the final detection result for the original image: the test set divided in step (2) is input into the trained neural network, and the output is taken as the small-size picture classification result; when a picture needs to be detected, it is cropped and fed into the neural network to obtain an output for each small-size image, and finally the classification results of the small-size images are integrated into the classification result for the original image.
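A minimal sketch of the integration in step (7), assuming a simple majority vote over patch labels (the patent does not specify the integration rule, so this is an illustrative assumption); any patch not labelled as original additionally localizes the filtered region:

```python
from collections import Counter

def integrate(patch_labels):
    """Fuse per-patch predictions into an image-level result.
    Majority vote is an assumption; the patent only states that the
    small-size results are 'integrated'."""
    verdict, _ = Counter(patch_labels).most_common(1)[0]
    # Patches classified as filtered localize the tampered region.
    filtered = [i for i, lab in enumerate(patch_labels) if lab != "original"]
    return verdict, filtered

# Hypothetical 196-patch image: 150 original patches, 46 median-filtered.
labels = ["original"] * 150 + ["median"] * 46
verdict, region = integrate(labels)
print(verdict, len(region))   # original 46
```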
Compared with the prior art, the invention has the following advantages:
the method of the invention follows the idea of transfer learning, adds the prefilters with different scales trained by heterogeneous networks at the front end of the dense network, and improves the detection accuracy of the basic network. The neural network can output the classification result of the small-size images only by dividing the images to be detected into the small-size images and inputting the small-size images into the neural network, and the small-size image detection results are integrated to obtain whether the original images are filtered images or not and which filtering operation is performed. The invention has wider application range and higher accuracy.
Drawings
Fig. 1 is a block diagram of the operation steps of the digital image local filtering evidence obtaining method based on a convolutional neural network.
Fig. 2 is a structure diagram of the modified VGG network.
Fig. 3 shows the initial sizes and settings of the filters.
Fig. 4 is a structure diagram of the dense network.
Fig. 5 is a structure diagram of a dense block.
Fig. 6 is a structure diagram of a convolutional layer in a dense block.
Detailed Description
The following describes in further detail specific embodiments of the present invention with reference to the accompanying drawings.
VGG network: compared with AlexNet, VGG uses 3 × 3 convolution kernels instead of 7 × 7 convolution kernels and 2 × 3 convolution kernels instead of 5 × 5 convolution kernels, which results in an increase in the depth of the network under the same perceptual field. VGG demonstrates that increasing the depth of the network can improve network performance to some extent.
Dense network: dense networks are mainly composed of dense blocks, which are characterized by connecting all layers together, the input of each layer being combined from the outputs of the first few layers, so that the output information of each layer concerned can be utilized to the maximum possible extent. This may encourage feature reuse, enhance feature propagation, and may alleviate the gradient vanishing problem. Dense network training is easier and the depth of the network is greater.
As shown in fig. 1, a method for obtaining evidence of local filtering of a digital image based on a convolutional neural network includes the following specific operation steps:
(1) firstly, cropping the images in the original image set, and applying five filtering operations separately to the cropped small-size images;
(2) dividing the six classes of pictures, including the originals, into a training set and a test set for the main network;
(3) dividing a training set and a test set for the VGG network that trains the filters;
(4) constructing a VGG network to train eight prefilters;
(5) constructing a specific dense network with the prefilters;
(6) inputting the training-set data of the six classes of pictures into the dense network for neural network training;
(7) taking the trained network's output on the test set as the small-size image classification result and, in actual detection, integrating the classification results of the small-size images into the final detection result for the original image.
The image cropping and filtering of step (1): we selected the UCID image set, 1338 images in total. Each picture is 384 × 512 in size; cropping each picture yields 196 small-size images of size 32 × 32, for 262,248 small-size images in all. Matlab code applies median filtering, mean filtering, Gaussian filtering, Laplacian filtering and unsharp filtering to the pictures respectively. Thus, with the originals added, six classes of 262,248 pictures each are obtained.
In the step (2), the six classes of pictures including the originals are divided into a training set and a test set: for each class, we randomly choose 8% of the pictures using a Python program and pool all chosen pictures as the test set T; the remaining 92% of the pictures form the training set S. Within S, we randomly select 10% as the validation set during training.
The step (3) divides the training set and test set of the VGG network that trains the filters: using a Python program, 25% of each of the four classes other than the originals and the Gaussian-filtered pictures is randomly selected and merged into one new class. From each of these three classes, 8% are randomly selected and pooled as the test set T′; the remaining 92% form the training set S′. Within S′, we randomly select 10% as the validation set during training.
The step (4) constructs a VGG network to train the eight prefilters: we make some slight changes to the structure of the original VGG network to fit our 32 × 32 input images; the modified VGG structure is shown in Fig. 2, and the sizes and initial values of the eight prefilters are shown in Fig. 3. During training, the filters sit at the first layer of the whole network. We implement the network in Python with TensorFlow as the machine-learning library. The training set S′ divided in step (3) is input into the network and trained iteratively until the result converges, yielding the trained filters.
The step (5) constructs the specific dense network with the prefilters: the dense network structure is shown in Fig. 4, the dense block structure in Fig. 5, and the convolutional-layer structure within a dense block in Fig. 6. We implement the network in Python with TensorFlow as the machine-learning library.
And (6) inputting the training-set data of the six classes of pictures into the dense network for neural network training: the training set S divided in step (2) is input into the dense network and trained iteratively until the result converges, yielding the trained network.
And (7): the test set T divided in step (2) is input into the trained neural network, and the output is taken as the small-size picture classification result. When a picture needs to be detected, it is cropped and fed into the trained neural network to obtain an output for each small-size image; finally, the small-size classification results are integrated into the classification result for the original picture. On the UCID dataset, the classification accuracy on the cropped small-size images is 97.05%.

Claims (8)

1. A digital image local filtering evidence obtaining method based on a convolutional neural network is characterized by comprising the following specific operation steps:
(1) firstly, cropping the images in the original image set, and applying five filtering operations separately to the cropped small-size images;
(2) dividing the six classes of pictures, including the originals, into a training set and a test set for the main network;
(3) dividing a training set and a test set for the VGG network that trains the filters;
(4) constructing a VGG network to train eight prefilters;
(5) constructing a specific dense network with the prefilters;
(6) inputting the training-set data of the six classes of pictures into the dense network for neural network training;
(7) taking the trained network's output on the test set as the small-size image classification result and, in actual detection, integrating the classification results of the small-size images into the final detection result for the original image.
2. The convolutional neural network based digital image local filtering forensics method of claim 1, wherein in the step (1):
1-1) the images in the original image set are 384 × 512 in size, and each image is cropped into small-size pictures of size 32 × 32;
1-2) the five filtering operations performed on the small-size images are: median filtering, mean filtering, Gaussian filtering, Laplacian filtering and unsharp filtering.
3. The convolutional neural network-based digital image local filtering forensics method as claimed in claim 1, wherein the data set D formed by the six classes of pictures including the originals in step (2) is randomly divided into two mutually exclusive sets, one being the training set S and the other the test set T, i.e. D = S ∪ T, S ∩ T = ∅;
after the model is trained on the S, estimating the accuracy of the model on the T; we randomly picked only 90% of the images from S as the training set and the rest of the images of S as the validation set to estimate the performance of the model.
4. The convolutional neural network-based digital image local filtering forensics method of claim 1, wherein the training-set and test-set division in step (3) differs from the method in step (2): 25% of each of the four classes other than the originals and the Gaussian-filtered pictures is randomly extracted and merged into one new class, and the three classes are divided into a training set S′ and a test set T′ by the same method as step (2).
5. The convolutional neural network based digital image local filtering forensics method of claim 1, wherein in the step (4):
4-1) the basic structure of the VGG network comprises 13 convolutional layers, 5 pooling layers and 3 full-connection layers;
4-2) the prefilters sit at the first layer of the subsequent dense network and therefore strongly influence the downstream structure and results, thereby improving network performance; subtle differences in the prefilters are decisive for the accuracy of the network; the eight prefilters are trained with the filters placed at the first layer of the network.
6. The convolutional neural network-based digital image local filtering forensics method of claim 1, wherein the basic structure of the specific dense network in step (5) is a filter layer, a convolutional layer, 4 dense blocks, 5 pooling layers and 3 fully connected layers, wherein the basic structure of each dense block is a convolutional layer, a BN layer and a ReLU layer.
7. The convolutional neural network based digital image local filtering forensics method of claim 1, wherein the step (6) inputs the training set data divided by the step (2) into a dense network, and performs training and iteration until the result converges.
8. The convolution neural network-based digital image local filtering evidence obtaining method according to claim 1, characterized in that the step (7) inputs the test set divided in the step (2) into the trained neural network, and takes the output result as a small-size image classification result; when a picture needs to be detected, the picture is cut and input into a neural network to obtain an output result of each small-size image, and finally, the classification results of the small-size images are integrated to obtain a classification result of the original image.
CN202010246245.8A 2020-03-31 2020-03-31 Digital image local filtering evidence obtaining method based on convolutional neural network Active CN111462085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010246245.8A CN111462085B (en) 2020-03-31 2020-03-31 Digital image local filtering evidence obtaining method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010246245.8A CN111462085B (en) 2020-03-31 2020-03-31 Digital image local filtering evidence obtaining method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN111462085A true CN111462085A (en) 2020-07-28
CN111462085B CN111462085B (en) 2023-09-19

Family

ID=71683505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010246245.8A Active CN111462085B (en) 2020-03-31 2020-03-31 Digital image local filtering evidence obtaining method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111462085B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764270A * 2018-04-03 2018-11-06 Shanghai University Information hiding detection method using integrated convolutional neural networks
CN109829855A * 2019-01-23 2019-05-31 Nanjing University of Aeronautics and Astronautics Super-resolution reconstruction method based on fused multi-level feature maps
WO2019144575A1 * 2018-01-24 2019-08-01 Sun Yat-sen University Fast pedestrian detection method and device

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
WO2019144575A1 * 2018-01-24 2019-08-01 Sun Yat-sen University Fast pedestrian detection method and device
CN108764270A * 2018-04-03 2018-11-06 Shanghai University Information hiding detection method using integrated convolutional neural networks
CN109829855A * 2019-01-23 2019-05-31 Nanjing University of Aeronautics and Astronautics Super-resolution reconstruction method based on fused multi-level feature maps

Non-Patent Citations (1)

Title
Wang Shangli; Jin Gehui; Xu Liang; Jin Wei; Yin Caoqian; Fu Randi: "Lung nodule detection method based on a three-dimensional dense network", Chinese Journal of Biomedical Engineering *

Also Published As

Publication number Publication date
CN111462085B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
CN109377448B (en) Face image restoration method based on generation countermeasure network
CN112818862B (en) Face tampering detection method and system based on multi-source clues and mixed attention
Bayar et al. Augmented convolutional feature maps for robust cnn-based camera model identification
CN106530200B (en) Steganographic image detection method and system based on deep learning model
CN110349136A (en) A kind of tampered image detection method based on deep learning
CN108446700A (en) A kind of car plate attack generation method based on to attack resistance
CN111445454B (en) Image authenticity identification method and application thereof in license identification
CN110968845B (en) Detection method for LSB steganography based on convolutional neural network generation
AlSawadi et al. Copy-move image forgery detection using local binary pattern and neighborhood clustering
CN110457996B (en) Video moving object tampering evidence obtaining method based on VGG-11 convolutional neural network
Alamro et al. Copy-move forgery detection using integrated DWT and SURF
Agarwal et al. Image forgery detection and deep learning techniques: A review
CN110782385A (en) Image watermark removing method based on deep learning
Zhao et al. Detecting deepfake video by learning two-level features with two-stream convolutional neural network
CN111461135B (en) Digital image local filtering evidence obtaining method integrated by convolutional neural network
CN102592151B (en) Blind detection method for median filter in digital image
CN111462085A (en) Digital image local filtering evidence obtaining method based on convolutional neural network
CN113807237B (en) Training of in vivo detection model, in vivo detection method, computer device, and medium
CN115170933A (en) Digital image forged area positioning method based on double-current deep neural network
CN114723953A (en) Deep neural network for image source detection
CN113850284B (en) Multi-operation detection method based on multi-scale feature fusion and multi-branch prediction
Bertojo et al. A very fast copy-move forgery detection method for 4k ultra hd images
CN116935253A (en) Human face tampering detection method based on residual error network combined with space-time attention mechanism
Nithiya et al. Key point descriptor based copy and move image forgery detection system
Salehi et al. Discriminating original region from duplicated one in copy-move forgery

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant