CN111462085B - Digital image local filtering evidence obtaining method based on convolutional neural network - Google Patents
- Publication number
- CN111462085B (publication) · CN202010246245.8A (application)
- Authority
- CN
- China
- Prior art keywords
- network
- neural network
- filtering
- training
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20032—Median filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention relates to a digital image local filtering evidence obtaining method based on a convolutional neural network. The method comprises the following operation steps: 1. first, crop the images in the image set, and apply five filtering operations respectively to the cropped small-size images; 2. divide the six classes of pictures, including the originals, into a training set and a testing set; 3. divide a separate training set and testing set for the VGG network used to train the filters; 4. construct a VGG network and train eight pre-filters; 5. construct a specific dense network equipped with the pre-filters; 6. input the training-set data of the six picture classes into the dense network for neural network training; 7. take the trained neural network's output on the test set as the small-size picture classification result; in actual detection, the classification results of the small-size pictures are integrated into the final detection result for the original picture. The invention can effectively and conveniently solve the problem of detecting locally filtered images.
Description
Technical Field
The invention relates to a digital image local filtering evidence obtaining method based on a convolutional neural network, and belongs to the technical field of blind image forensics.
Background
With the popularization of image editing software, malicious actors can easily tamper with pictures and pass the results off as genuine, which not only causes huge economic and manpower losses but can also cause social panic. Because filtering is an operation frequently used in image tampering, blind detection of filtering operations is necessary. Identifying whether an image has undergone a filtering operation provides an important criterion for judging whether it has been tampered with. Traditional filtering detection techniques mainly design hand-crafted features based on distribution rules in the image frequency domain and classify them with a support vector machine. These methods perform excellently in discriminating median filtering, mean filtering and similar operations; however, a specific feature calculation method is often effective only for certain kinds of filters and performs very poorly on other filtering operations. A CNN (convolutional neural network) can extract features automatically, without manual feature design, which makes it well suited to this problem. Among the many excellent network structures, DenseNet absorbs the most essential ideas of ResNet and innovates on them, further improving network performance. In addition, in practice many tamperers only tamper with an image locally, so filtering detection on small-size images places higher demands on the performance of the detection method.
Blind forensics distinguishes a natural image from a tampered one using only the acquired digital image content or the file header information. At present, many detection methods assume that the tamperer has processed every part of the image identically, whereas in practice many tampered images are only partially processed, so partially tampered images cannot be detected accurately. We therefore propose: add filters of different scales to the front end of a DenseNet, obtain initial values for these filters by training a network heterogeneous to the DenseNet, and let the filters participate in the subsequent training of the DenseNet. This supplements high-dimensional features that a standalone DenseNet does not learn, improves the classification accuracy after the DenseNet extracts features, and keeps the detection performance on small-size images up to requirement. Compared with traditional schemes, the new scheme has a wider range of application and higher accuracy.
Disclosure of Invention
Aiming at the shortcomings of existing detection methods, which need a prior model to extract features and perform poorly on small-size pictures, the invention provides a digital image local filtering evidence obtaining method based on a convolutional neural network that remains robust even on very small pictures.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A digital image local filtering evidence obtaining method based on a convolutional neural network comprises the following specific operation steps:
(1) First, crop the images in the original image set, and perform five filtering operations respectively on the cropped small-size images;
(2) Divide the six classes of pictures, including the originals, into a training set and a testing set for the main network;
(3) Divide a training set and a testing set for the VGG network used to train the filters;
(4) Construct a VGG network and train eight pre-filters;
(5) Construct a specific dense network equipped with the pre-filters;
(6) Input the training-set data of the six picture classes into the dense network for neural network training;
(7) Take the trained neural network's output on the test set as the small-size picture classification result; in actual detection, integrate the classification results of the small-size pictures into the final detection result for the original picture.
In the step (1):
1-1) each image is cropped into small-size pictures of size 32 × 32; the images in the original image set are 384 × 512;
1-2) the five filtering operations performed on the small-size images are: median filtering, mean filtering, Gaussian filtering, Laplacian filtering, and unsharp filtering.
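As an illustration of step (1), a minimal NumPy sketch of the cropping and of one of the five operations (a 3 × 3 median filter) might look as follows. The function names and the toy 64 × 64 ramp image are illustrative choices of ours, not part of the patent, and the remaining four filters are omitted (the embodiment implements all five in Matlab):

```python
import numpy as np

def crop_patches(img, size=32):
    """Split an H x W image into non-overlapping size x size patches."""
    h, w = img.shape
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def median_filter3(img):
    """3 x 3 median filter; border pixels are left unchanged for brevity."""
    out = img.copy()
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            out[r, c] = np.median(img[r - 1:r + 2, c - 1:c + 2])
    return out

img = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)  # toy ramp image
patches = crop_patches(img)                  # 2 x 2 = 4 patches of 32 x 32
filtered = [median_filter3(p) for p in patches]
```

In practice a library routine such as a dedicated median filter would be used; the loop form above only makes the per-pixel operation explicit.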
In the step (2), the six classes of pictures including the originals are divided into a training set and a testing set: the data set D consisting of the six picture classes is randomly divided into two mutually exclusive sets, one as the training set S (92% of the total number of pictures) and the other as the test set T (8%), i.e. D = S ∪ T. After the model is trained on S, its accuracy is estimated on T. We randomly pick 90% of the images in S as the actual training set, and use the remaining images of S as a validation set to estimate the performance of the model.
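The split described above can be sketched in plain Python; the function name, the fixed seed and the toy list of 1000 item ids are illustrative assumptions:

```python
import random

def split_dataset(items, test_frac=0.08, val_frac=0.10, seed=0):
    """92/8 train-test split, then a 90/10 train-validation split inside S."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]   # T and S
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]             # validation inside S
    return train, val, test

train, val, test = split_dataset(list(range(1000)))     # 1000 toy item ids
```

For 1000 items this yields 80 test, 92 validation and 828 training items, and the three sets are mutually exclusive as required.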
The training set and testing set for the VGG network used to train the filters are divided in the step (3), differently from the step (2): 25% of each of the four picture classes other than the originals and the Gaussian-filtered pictures are randomly extracted and combined into a new class, and for the resulting three classes a training set S' and a test set T' are divided by the same method as in the step (2). After the model is trained on S', its accuracy is estimated on T'. The idea behind this design is: even without the pre-filters, the classification accuracy of the subsequent network is already very high for all classes other than the originals and Gaussian filtering; the main room for improvement lies in these two classes, so the training of the pre-filters must focus on separating the originals, Gaussian filtering, and the remaining classes.
The step (4) constructs a VGG network to train the eight pre-filters: some minor changes are made to the original VGG structure to accommodate our 32 × 32 input images; the modified VGG network consists of 13 convolutional layers, 5 pooling layers and 3 fully connected layers. The pre-filters form the first layer of the subsequent dense network, so they strongly influence everything after them, and small differences in the pre-filters are decisive for the accuracy of the network. While the eight pre-filters are being trained, they likewise sit in the first layer of the network. The reason for choosing filters of these shapes is: filters of different scales extract different picture features; commonly used filters are 3 × 3 or 5 × 5, and the additional shapes we select, such as 1 × 4 and 2 × 3, increase the diversity of the whole network and can further improve its performance.
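To illustrate how filter shape matters at the first layer, here is a small NumPy sketch of 'valid' cross-correlation with some of the kernel shapes mentioned (1 × 4, 2 × 3, 3 × 3, 5 × 5; the patent's exact eight shapes and initial values are given in Fig. 3). The averaging kernels are placeholders for the trained values:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D cross-correlation, enough to show how the kernel
    shape changes the output map."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

img = np.random.rand(32, 32)                 # one 32 x 32 input patch
shapes = [(1, 4), (2, 3), (3, 3), (5, 5)]    # a few of the scales mentioned
outs = [conv2d_valid(img, np.ones(s) / (s[0] * s[1])) for s in shapes]
```

Each shape sweeps a differently proportioned neighbourhood, which is why non-square filters pick up features the usual square ones miss.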
Said step (5) builds the specific dense network with pre-filters: its basic structure is a filtering layer, a convolutional layer, 4 dense blocks, 5 pooling layers and 3 fully connected layers. The basic structure of a dense block is a convolutional layer, a BN layer and a ReLU layer; all convolutional layers within a dense block use the BN-ReLU-Conv ordering.
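A toy NumPy sketch of the BN-ReLU-Conv ordering used inside the dense blocks; the 1 × 1 convolution, the omission of learned scale/shift parameters, and all tensor sizes are simplifications of ours, not the patent's:

```python
import numpy as np

def bn_relu(x, eps=1e-5):
    """Per-channel batch normalisation (no learned scale/shift, for brevity)
    followed by ReLU -- the BN-ReLU part of BN-ReLU-Conv."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return np.maximum((x - mean) / np.sqrt(var + eps), 0.0)

def conv1x1(x, w):
    """1 x 1 convolution as per-pixel channel mixing (stands in for Conv)."""
    return x @ w                     # (H, W, Cin) @ (Cin, Cout)

x = np.random.rand(8, 8, 3)          # a toy (H, W, C) feature map
w = np.random.rand(3, 12)            # 1 x 1 kernel mapping 3 -> 12 channels
y = conv1x1(bn_relu(x), w)           # one BN-ReLU-Conv pass
```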
The step (6) inputs the training-set data of the six picture classes into the dense network for neural network training: the training set data S divided in the step (2) are input into the dense network and trained iteratively until the result converges.
The step (7) takes the trained neural network's output on the test set as the small-size picture classification result and, in actual detection, integrates the small-size classification results into the final detection result for the original picture: the test set divided in the step (2) is input into the trained neural network and the output is taken as the small-size picture classification result; when a picture needs to be detected, it is cropped and input into the neural network to obtain an output for each small-size image, and finally the classification results of the small-size images are integrated to obtain the classification result of the original image.
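One plausible reading of "integrating" the per-patch results is a majority vote over the patch labels; the patent does not spell out the exact fusion rule, so the sketch below is an assumption, with hypothetical label names:

```python
from collections import Counter

def integrate(patch_labels):
    """Fuse per-patch predictions into one image-level decision by majority
    vote; ties resolve by first-encountered label (Counter behaviour)."""
    return Counter(patch_labels).most_common(1)[0][0]

# hypothetical predictions for the 196 patches of one picture
preds = ["median"] * 120 + ["original"] * 50 + ["gaussian"] * 26
image_label = integrate(preds)
```

Since "median" dominates the patch votes here, the whole picture would be reported as median-filtered.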
Compared with the prior art, the invention has the following advantages:
The method follows the idea of transfer learning: pre-filters of different scales, trained by a heterogeneous network, are added at the front end of the dense network, which improves the detection accuracy of the base network. To use it, the image to be detected only needs to be divided into small-size images and input into the neural network; the network outputs the small-size classification results, and integrating them reveals whether the original image is a filtered image and which filtering operation it underwent. The invention has a wider range of application and higher accuracy.
Drawings
FIG. 1 is a block diagram of the digital image local filtering evidence obtaining method based on a convolutional neural network.
Fig. 2 is a diagram of the modified VGG network structure.
Fig. 3 shows the sizes and initial values of the pre-filters.
Fig. 4 is a diagram of the dense network structure.
Fig. 5 is a diagram of the dense block structure.
Fig. 6 shows the convolutional layer structure within a dense block.
Detailed Description
Specific embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
VGG network: compared with AlexNet, VGG uses three 3 × 3 convolution kernels in place of a 7 × 7 kernel and two 3 × 3 kernels in place of a 5 × 5 kernel, which increases network depth while keeping the same receptive field. VGG demonstrates that increasing network depth can, to some extent, improve network performance.
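The receptive-field equivalence claimed here can be checked with simple arithmetic: stacking two 3 × 3 'valid' convolutions shrinks a feature map exactly as one 5 × 5 convolution does, i.e. they cover the same 5 × 5 receptive field, while using fewer weights:

```python
def valid_out(size, k):
    """Spatial size after one k x k 'valid' convolution."""
    return size - k + 1

# Two stacked 3 x 3 layers and one 5 x 5 layer both map 32 -> 28:
two_3x3 = valid_out(valid_out(32, 3), 3)
one_5x5 = valid_out(32, 5)
# Parameter count (single channel): 2 * 3 * 3 = 18 weights versus 5 * 5 = 25.
```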
Dense network: the dense network mainly consists of dense blocks, which are characterized in that all layers are connected, and the input of each layer is obtained by combining the outputs of the previous layers, so that the output information of each layer can be utilized to the greatest extent. Doing so may encourage feature reuse, enhance feature propagation, and may alleviate the gradient vanishing problem. Dense network training is easier and network depth is greater.
As shown in Fig. 1, a digital image local filtering evidence obtaining method based on a convolutional neural network comprises the following specific operation steps:
(1) First, crop the images in the original image set, and perform five filtering operations respectively on the cropped small-size images;
(2) Divide the six classes of pictures, including the originals, into a training set and a testing set for the main network;
(3) Divide a training set and a testing set for the VGG network used to train the filters;
(4) Construct a VGG network and train eight pre-filters;
(5) Construct a specific dense network equipped with the pre-filters;
(6) Input the training-set data of the six picture classes into the dense network for neural network training;
(7) Take the trained neural network's output on the test set as the small-size picture classification result; in actual detection, integrate the classification results of the small-size pictures into the final detection result for the original picture.
The images in the step (1) are cropped and filtered as follows: we select the UCID image set, 1338 images in total. Each picture is 384 × 512; after cropping, each picture yields 196 small-size images of size 32 × 32, for 262248 small-size images in all. Code written in Matlab applies median filtering, mean filtering, Gaussian filtering, Laplacian filtering and unsharp filtering to the pictures, so that, together with the originals, six classes of 262248 pictures each are obtained.
In the step (2), the six classes of pictures including the originals are divided into a training set and a testing set: for each class we randomly select 8% of the pictures using a Python program, and all selected pictures together form the test set T. The remaining 92% form the training set S. Within S, we randomly select 10% as the validation set used during training.
The step (3) divides the training set and testing set for the VGG network used to train the filters: using a Python program, we randomly select 25% of each of the four classes other than the originals and the Gaussian-filtered pictures and combine them into a new class. For the resulting three classes, 8% of the pictures are randomly selected to form the test set T', and the remaining 92% form the training set S'. Within S', we randomly select 10% as the validation set used during training.
The step (4) constructs a VGG network to train the eight pre-filters: minor changes are made to the original VGG structure to accommodate the 32 × 32 input images; the modified VGG network structure is shown in Fig. 2, and the sizes and initial values of the eight pre-filters are shown in Fig. 3. During training the filters sit in the first layer of the overall network. We use Python as the programming language and TensorFlow as the machine-learning library to implement the network. The training set S' divided in the step (3) is input into the network and trained iteratively until the result converges, yielding the trained filters.
Said step (5) builds the specific dense network with pre-filters: the specific dense network structure is shown in Fig. 4, the dense block structure in Fig. 5, and the convolutional layer structure within a dense block in Fig. 6. We use Python as the programming language and TensorFlow as the machine-learning library to implement the network.
The step (6) inputs the training-set data of the six picture classes into the dense network for neural network training: the training set S divided in the step (2) is input into the dense network and trained iteratively until the result converges, yielding the trained network.
The step (7) takes the trained neural network's output on the test set as the small-size picture classification result and, in actual detection, integrates the small-size classification results into the final detection result for the original picture: the test set T divided in the step (2) is input into the trained neural network and the output is taken as the small-size picture classification result. When a picture needs to be detected, it is cropped and input into the trained neural network to obtain an output for each small-size image; finally the classification results of the small-size images are integrated to obtain the classification result of the original image. On the UCID data set, the classification accuracy on the cropped small-size images is 97.05%.
Claims (5)
1. A digital image local filtering evidence obtaining method based on a convolutional neural network, characterized by comprising the following specific operation steps:
(1) First, crop the images in the original image set, and perform five filtering operations respectively on the cropped small-size images;
(2) Divide the six classes of pictures, including the originals, into a training set and a testing set for the convolutional neural network;
(3) Divide a training set and a testing set for the VGG network used to train the filters;
(4) Construct a VGG network and train eight pre-filters;
(5) Construct a specific dense network equipped with the pre-filters;
(6) Input the training-set data of the six picture classes into the dense network for convolutional neural network training;
(7) Take the trained convolutional neural network's output on the test set as the small-size picture classification result, and in actual detection integrate the classification results of the small-size pictures into the final detection result for the original picture;
in the step (2), the data set D consisting of the six classes of pictures including the originals is randomly divided into two mutually exclusive sets, one as the training set S and the other as the test set T, i.e. D = S ∪ T; after the model is trained on S, its accuracy is estimated on T; 90% of the images in S are randomly selected as the actual training set, and the remaining images of S serve as a validation set to estimate the performance of the model;
in the step (4), constructing the VGG network to train the eight pre-filters includes:
4-1) the basic VGG network structure is 13 convolutional layers, 5 pooling layers and 3 fully connected layers;
4-2) the pre-filters form the first layer of the dense network of the subsequent step (5); while the eight pre-filters are being trained, they form the first layer of the convolutional neural network;
the basic structure of the specific dense network in the step (5) is a filtering layer, a convolutional layer, 4 dense blocks, 5 pooling layers and 3 fully connected layers; the basic structure of a dense block is a convolutional layer, a BN layer and a ReLU layer.
2. The digital image local filtering evidence obtaining method based on a convolutional neural network according to claim 1, wherein in the step (1):
1-1) each image is cropped into small-size pictures of size 32 × 32, the images in the original image set being 384 × 512;
1-2) the five filtering operations performed on the small-size images are: median filtering, mean filtering, Gaussian filtering, Laplacian filtering, and unsharp filtering.
3. The digital image local filtering evidence obtaining method based on a convolutional neural network according to claim 1, wherein the division of the training set and testing set in the step (3) differs from that in the step (2): 25% of each of the four picture classes other than the originals and the Gaussian-filtered pictures are randomly extracted and combined into a new class, and for the resulting three classes a training set S' and a test set T' are divided by the same method as in the step (2).
4. The digital image local filtering evidence obtaining method based on a convolutional neural network according to claim 1, wherein in the step (6) the training set data divided in the step (2) are input into the dense network and trained iteratively until the result converges.
5. The digital image local filtering evidence obtaining method based on a convolutional neural network according to claim 1, wherein in the step (7) the test set divided in the step (2) is input into the trained convolutional neural network and the output is taken as the small-size picture classification result; when a picture needs to be detected, it is cropped and input into the convolutional neural network to obtain an output for each small-size image, and finally the classification results of the small-size images are integrated to obtain the classification result of the original image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010246245.8A CN111462085B (en) | 2020-03-31 | 2020-03-31 | Digital image local filtering evidence obtaining method based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010246245.8A CN111462085B (en) | 2020-03-31 | 2020-03-31 | Digital image local filtering evidence obtaining method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111462085A CN111462085A (en) | 2020-07-28 |
CN111462085B true CN111462085B (en) | 2023-09-19 |
Family
ID=71683505
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010246245.8A Active CN111462085B (en) | 2020-03-31 | 2020-03-31 | Digital image local filtering evidence obtaining method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111462085B (en) |
-
2020
- 2020-03-31 CN CN202010246245.8A patent/CN111462085B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019144575A1 (en) * | 2018-01-24 | 2019-08-01 | 中山大学 | Fast pedestrian detection method and device |
CN108764270A (en) * | 2018-04-03 | 2018-11-06 | 上海大学 | A kind of Information Hiding & Detecting method integrated using convolutional neural networks |
CN109829855A (en) * | 2019-01-23 | 2019-05-31 | 南京航空航天大学 | A kind of super resolution ratio reconstruction method based on fusion multi-level features figure |
Non-Patent Citations (1)
Title |
---|
Pulmonary nodule detection method based on a three-dimensional dense network; Wang Shangli; Jin Gehui; Xu Liang; Jin Wei; Yin Caoqian; Fu Randi; Chinese Journal of Biomedical Engineering; Vol. 39, No. 01; pp. 8-18 *
Also Published As
Publication number | Publication date |
---|---|
CN111462085A (en) | 2020-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818862B (en) | Face tampering detection method and system based on multi-source clues and mixed attention | |
CN110349136A (en) | A kind of tampered image detection method based on deep learning | |
CN104063706B (en) | Video fingerprint extraction method based on SURF algorithm | |
Muzaffer et al. | A new deep learning-based method to detection of copy-move forgery in digital images | |
Mushtaq et al. | Digital image forgeries and passive image authentication techniques: a survey | |
WO2019214557A1 (en) | Method and system for detecting face image generated by deep network | |
CN108446700A (en) | A kind of car plate attack generation method based on to attack resistance | |
CN104239420B (en) | A kind of video Similarity Match Method based on video finger print | |
CN112907598B (en) | Method for detecting falsification of document and certificate images based on attention CNN | |
CN109543674B (en) | Image copy detection method based on generation countermeasure network | |
CN106530200A (en) | Deep-learning-model-based steganography image detection method and system | |
CN111445454A (en) | Image authenticity identification method and application thereof in license identification | |
CN106650617A (en) | Pedestrian abnormity identification method based on probabilistic latent semantic analysis | |
CN110084149A (en) | A kind of face verification method based on difficult sample four-tuple dynamic boundary loss function | |
CN101866478A (en) | Method for embedding and extracting watermark in digital image | |
CN103279744A (en) | Multi-scale tri-mode texture feature-based method and system for detecting counterfeit fingerprints | |
CN110457996A (en) | Moving Objects in Video Sequences based on VGG-11 convolutional neural networks distorts evidence collecting method | |
CN111461135B (en) | Digital image local filtering evidence obtaining method integrated by convolutional neural network | |
CN111737688B (en) | Attack defense system based on user portrait | |
CN111462085B (en) | Digital image local filtering evidence obtaining method based on convolutional neural network | |
Ananthi et al. | A secure model on Advanced Fake Image-Feature Network (AFIFN) based on deep learning for image forgery detection | |
Yohannan et al. | Detection of copy-move forgery based on Gabor filter | |
Chen et al. | Identification of image global processing operator chain based on feature decoupling | |
Golubev et al. | Validation of Real Estate Ads based on the Identification of Identical Images | |
CN105138984A (en) | Sharpened image identification method based on multi-resolution overshoot effect measurement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||