CN111415323A - Image detection method and device and neural network training method and device

Publication number
CN111415323A
Authority
CN
China
Prior art keywords: image, block, inter, intra, training sample
Legal status: Granted
Application number
CN201910007257.2A
Other languages: Chinese (zh)
Other versions: CN111415323B (en)
Inventors
李斌
张浩鑫
罗瑚
刘永亮
黄继武
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201910007257.2A
Publication of CN111415323A
Application granted
Publication of CN111415323B
Current status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20021 - Dividing image into blocks, subimages or windows
    • G06T 2207/20081 - Training; Learning
    • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image detection method and device, a neural network model training method and device, a computer storage medium, and an electronic device. The image detection method comprises: acquiring an image to be detected; obtaining at least two types of characteristic images by processing the image to be detected; inputting the information of the at least two types of characteristic images into a neural network having at least two paths for recognition, to obtain recognition and classification results; and determining whether the image to be detected belongs to a double-compression image according to the recognition and classification results. The convolutional neural network model thus achieves higher accuracy in image detection and stronger resistance to anti-forensics.

Description

Image detection method and device and neural network training method and device
Technical Field
The application relates to the field of machine learning, in particular to a method and a device for detecting an image. The application also relates to a training method and device of the neural network, a computer storage medium and an electronic device.
Background
Image compression techniques are widely used because they reduce a large amount of redundant information while maintaining a good visual effect of the image.
However, with the continuous development of the Internet, images, as an effective medium for storing and transmitting information, bring convenience but also carry great potential safety hazards. Because a tampered image can be compressed into a new image, the tampered content is not easily recognized. In general, an image that has been tampered with will have undergone double compression; double-compression detection is therefore of great significance in image forensics, because it can reveal whether a JPEG image in its storage format has been tampered with and can possibly locate the tampered area.
Disclosure of Invention
The application provides an image detection method and device, which aim to solve the problem of inaccurate detection results in the prior art. The application additionally provides a neural network training method and apparatus, as well as a computer storage medium and an electronic device.
The application provides an image detection method, which comprises the following steps:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
In some embodiments, the obtaining at least two types of feature images based on the processing of the image to be detected includes:
and segmenting the image to be detected to obtain at least two types of characteristic images.
In some embodiments, the segmenting the image to be detected to obtain at least two types of feature images includes:
dividing the obtained pixel matrix of the image to be detected to obtain image blocks;
selecting pixels of a region adjacent to the central position of the image block and pixels of a region adjacent to the segmentation intersection position of the image block;
arranging and combining pixels in an area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
arranging and combining pixels in the areas adjacent to the segmentation intersection positions of the image blocks according to the block dividing sequence of the image to be detected to obtain an inter-block characteristic image;
and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
In some embodiments, the dividing the acquired image to be detected to obtain image blocks includes:
and dividing the image to be detected from left to right and from top to bottom to obtain image blocks.
In some embodiments, inputting information of the at least two types of feature images into a neural network having at least two paths for recognition to obtain recognition classification results, including:
inputting the intra-block feature image into a convolution group with an intra-block path in the neural network, determining an image feature of the intra-block feature image;
inputting the inter-block feature images into convolution groups with inter-block paths in the neural network, and determining image features of the inter-block feature images;
inputting the image features of the intra-block feature images and the image features of the inter-block feature images into a dimension reduction layer at the end of the intra-block path and a dimension reduction layer at the end of the inter-block path respectively, and determining the main image features of the intra-block feature images and the main image features of the inter-block feature images;
combining the main image features of the intra-block feature images and the main image features of the inter-block feature images to obtain the feature vectors of the images to be detected;
and transmitting the characteristic vectors to a full connection layer of the neural network, and determining the identification and classification result of the image to be detected.
In some embodiments, further comprising:
before determining the image features of the intra-block feature images and the image features of the inter-block feature images, respectively performing filtering processing on the intra-block feature images and the inter-block feature images to obtain filtered intra-block feature images and inter-block feature images.
In some embodiments, said inputting said intra-block feature image into a convolution group with intra-block paths in said neural network, determining image features of said intra-block feature image, comprises:
inputting the intra-block feature image into a convolution group of the intra-block path in the neural network for convolution processing to obtain a processed intra-block feature image;
inputting the processed intra-block characteristic image into a pooling layer for processing to obtain the image characteristics of the intra-block characteristic image;
the inputting the inter-block feature image into a convolution group with an inter-block path in the neural network, and extracting the image feature of the inter-block feature image includes:
inputting the inter-block feature images into a convolution group of an inter-block path in the neural network for convolution processing to obtain processed inter-block feature images;
inputting the processed inter-block feature images into a pooling layer for processing to obtain image features of the inter-block feature images;
wherein the number of convolution groups of the intra-block path is at least two, and the pooling layer of the intra-block path is located between the convolution groups of the intra-block path; the number of the convolution groups of the inter-block paths is at least two, and the pooling layer of the inter-block paths is positioned between the convolution groups of the inter-block paths.
In some embodiments, inputting the image features of the intra-block feature image and the image features of the inter-block feature image to a dimension reduction layer at an end of an intra-block path and a dimension reduction layer at an end of an inter-block path, respectively, and determining the main image features of the intra-block feature image and the main image features of the inter-block feature image, comprises:
pooling the image features of the intra-block feature images in the intra-block path to obtain pooled intra-block feature images;
performing convolution processing on the pooled intra-block feature images to obtain main image features of the intra-block feature images;
performing convolution processing on the image features of the inter-block feature images in the inter-block path to obtain convolved inter-block feature images;
and performing pooling processing on the convolved inter-block feature images to obtain main image features of the inter-block feature images.
In some embodiments, merging the main image features of the intra-block feature image and the main image features of the inter-block feature image to obtain a feature vector of the image to be detected includes:
respectively converting the main image features of the intra-block feature images and the main image features of the inter-block feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and combining the intra-block one-dimensional characteristic vector and the inter-block one-dimensional characteristic vector according to the structural sequence of the corresponding intra-block path and inter-block path respectively to obtain the characteristic vector of the image to be detected.
In some embodiments, the determining whether the acquired image to be detected belongs to a dual compression image according to the recognition and classification result includes:
and comparing the single-compression classification probability value of the identification classification result of the image to be detected with the double-compression classification probability value, and if the double-compression classification probability value is greater than the single-compression classification probability value, determining the acquired image to be detected as a double-compression image.
In some embodiments, further comprising:
carrying out normalization preprocessing on the acquired image to be detected to obtain a normalized image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including:
and segmenting the image to be detected after the normalization processing to obtain at least two types of characteristic images.
In some embodiments, the neural network having at least two paths is a convolutional neural network.
The present application also provides an image detection apparatus, including:
the acquisition unit is used for acquiring an image to be detected;
the processing unit is used for obtaining at least two types of characteristic images based on the processing of the image to be detected;
the classification and identification unit is used for inputting the information of the at least two types of characteristic images into a neural network with at least two paths for identification to obtain an identification and classification result;
and the determining unit is used for determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
The application also provides a training method of the neural network model, which comprises the following steps:
acquiring a training sample image;
obtaining at least two types of training sample characteristic images based on the processing of the training sample images;
inputting the at least two types of training sample characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
determining the weight of the neural network according to the recognition and classification result;
and updating the weight of the neural network according to the determined weight of the neural network to obtain a trained neural network model.
In some embodiments, the obtaining at least two types of training sample feature image information based on the processing of the training sample images includes:
and segmenting the training sample image to obtain at least two types of characteristic images.
In some embodiments, the segmenting the training sample image to obtain at least two types of feature images includes:
dividing the pixel matrix of the obtained training sample image to obtain training sample image blocks;
selecting pixels of a region adjacent to the center position of the training sample image blocks and selecting pixels of a region adjacent to the segmentation intersection position of the training sample image blocks;
arranging and combining the pixels of the area adjacent to the center position of each training sample image block according to the block dividing sequence of the training sample image to obtain an intra-block training sample feature image;
arranging and combining the pixels of the areas adjacent to the segmentation intersection positions of the training sample image blocks according to the block dividing sequence of the training sample image to obtain an inter-block training sample feature image;
the intra-block training sample feature images and the inter-block training sample feature images are at least two types of training sample feature images obtained.
In some embodiments, the dividing the acquired training sample image to obtain training sample image blocks includes:
and dividing the training sample image from left to right and from top to bottom to obtain training sample image blocks.
In some embodiments, inputting information of the at least two types of training sample feature images into a neural network with at least two paths for recognition, and obtaining a recognition classification result, including:
inputting the intra-block training sample feature images into convolution groups with intra-block paths in the neural network, and determining image features of the intra-block training sample feature images;
inputting the inter-block training sample feature images into a convolution group with inter-block paths in the neural network, and determining image features of the inter-block training sample feature images;
respectively inputting the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images into a dimensionality reduction layer at the end of the intra-block path and a dimensionality reduction layer at the end of the inter-block path, and determining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images;
combining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images to obtain feature vectors of the training sample images;
and transmitting the feature vectors to a full connection layer of the neural network, and determining the identification classification result of the training sample image.
In some embodiments, further comprising:
before determining the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images, respectively performing filtering processing on the intra-block training sample feature images and the inter-block training sample feature images to obtain filtered intra-block training sample feature images and inter-block training sample feature images.
In some embodiments, the inputting the intra-block training sample feature image into a convolution group with an intra-block path in the neural network, determining the image features of the intra-block training sample feature image, comprises:
inputting the intra-block training sample feature image into a convolution group of the intra-block path in the neural network for convolution processing to obtain a processed intra-block training sample feature image;
inputting the processed intra-block training sample characteristic image into a pooling layer for processing to obtain the image characteristics of the intra-block training sample characteristic image;
the inputting the inter-block training sample feature images into a convolution group with inter-block paths in the neural network, and extracting the image features of the inter-block training sample feature images includes:
inputting the inter-block training sample feature image into a convolution group of the inter-block path in the neural network for convolution processing to obtain a processed inter-block training sample feature image;
inputting the processed inter-block training sample characteristic images into a pooling layer for processing to obtain image characteristics of the inter-block training sample characteristic images;
wherein the number of convolution groups of the intra-block path is at least two, and the pooling layer of the intra-block path is located between the convolution groups of the intra-block path; the number of the convolution groups of the inter-block paths is at least two, and the pooling layer of the inter-block paths is positioned between the convolution groups of the inter-block paths.
In some embodiments, the inputting the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images to a dimension reduction layer at an end of an intra-block path and a dimension reduction layer at an end of an inter-block path, respectively, and the determining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images includes:
pooling the image features of the intra-block training sample feature images in the intra-block path to obtain pooled intra-block training sample feature images;
performing convolution processing on the pooled intra-block training sample characteristic images to obtain main image characteristics of the intra-block training sample characteristic images;
performing convolution processing on the image features of the inter-block training sample feature images in the inter-block path to obtain convolved inter-block training sample feature images;
and performing pooling treatment on the convolved inter-block training sample characteristic images to obtain main image characteristics of the inter-block training sample characteristic images.
In some embodiments, the merging the main image features of the intra-block training sample feature image and the main image features of the inter-block training sample feature image to obtain the feature vector of the training sample image includes:
respectively converting the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and combining the intra-block one-dimensional characteristic vector and the inter-block one-dimensional characteristic vector according to the structural sequence of the corresponding intra-block path and inter-block path respectively to obtain the characteristic vector of the training sample image.
In some embodiments, further comprising:
carrying out normalization preprocessing on the acquired training sample image to obtain a normalized training sample image;
the obtaining at least two types of training sample feature images based on the processing of the training sample images comprises:
and segmenting the training sample image after the normalization processing to obtain at least two types of training sample characteristic images.
In some embodiments, the determining the weight of the neural network according to the recognition and classification result includes:
calculating a loss value from the classification label value given by the recognition and classification result and the real label value of the training sample image;
and determining the weight of the neural network according to the loss value.
In some embodiments, the updating the weights of the neural network according to the determined weights of the neural network to obtain the trained neural network model includes:
and updating the old weights of the neural network with the determined weights as new weights by means of back-propagation, to obtain a trained neural network model having two paths.
The present application further provides a training apparatus for a neural network model, including:
an acquisition unit for acquiring a training sample image;
the processing unit is used for obtaining at least two types of training sample characteristic images based on the processing of the training sample images;
the recognition unit is used for inputting the information of the characteristic images of the at least two types of training samples into a neural network with at least two paths for recognition to obtain recognition and classification results;
the determining unit is used for determining the weight of the neural network according to the recognition and classification result;
and the updating unit is used for updating the weight of the neural network according to the determined weight of the neural network to obtain the trained neural network model.
The application also provides a computer storage medium for storing data generated by a network platform and a program for processing the data generated by the network platform;
when read and executed by a processor, the program performs the following operations:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a program for processing network platform generated data, the program when read and executed by the processor performing the following operations:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
Compared with the prior art, the method has the following advantages:
the method comprises the steps of obtaining at least two types of characteristic images based on processing of an image to be detected, inputting the two types of characteristic images into a neural network with at least two channels respectively for recognition, obtaining recognition and classification results, and determining whether the image to be detected belongs to a double-compression image or not according to the recognition and classification results, so that the pixel characteristics of the two types of characteristic images can be mined, the recognition of the features of JPEG double compression is facilitated, and attack on the convolutional neural network due to hiding or removing of compression traces of the double-compression image by anti-forensics is better resisted.
In addition, the at least two types of characteristic images obtained after segmentation are filtered, which further improves the recognition accuracy for double-compression images and the resistance to anti-forensics.
As described above, the present application provides a training method for a neural network model. At least two types of training sample feature images are obtained by processing the acquired training sample images; the at least two types of training sample feature images are then input into a neural network having at least two paths for recognition, to obtain recognition and classification results; the weights of the convolutional neural network are determined according to the recognition and classification results; and the weights of the convolutional neural network are updated according to the determined weights (i.e., an iterative training process) to obtain a trained convolutional neural network model. The convolutional neural network model thus has high accuracy and strong resistance to anti-forensics when detecting images.
Drawings
FIG. 1 is a schematic diagram of the neural network of the third prior-art method;
FIG. 2 is a schematic structural diagram of the spatial-domain neural network involved in the fourth prior-art method;
FIG. 3 is a schematic structural diagram of the convolutional neural network obtained by combining the frequency-domain and spatial-domain neural networks in the fourth prior-art method;
FIG. 4 is a flowchart of an embodiment of an image detection method provided by the present application;
fig. 5 is a schematic structural diagram of an embodiment of segmenting an image to be detected in an image detection method provided by the present application;
fig. 6 is a schematic view of a visual effect obtained after an image to be detected is segmented in an image detection method provided by the present application;
FIG. 7 is a schematic structural diagram of a convolutional neural network for detecting an image in an image detection method provided in the present application;
FIG. 8 is a schematic structural diagram of a convolution block in a convolution neural network for detecting an image in an image detection method provided by the present application;
fig. 9 is a schematic diagram illustrating a convolution operation performed on image feature data input into a convolution block in an image detection method provided in the present application;
FIG. 10 is a schematic structural diagram of an embodiment of an image detection apparatus provided in the present application;
FIG. 11 is a flow chart of an embodiment of a training method for a neural network model provided herein;
FIG. 12 is a schematic diagram of an embodiment of a convolutional neural network model training apparatus provided in the present application;
FIG. 13 is a diagram of a common scenario for embedding anti-forensic techniques in a dual JPEG compression process;
fig. 14 is a schematic view of the detection effect of the network of fig. 1 on images processed by anti-forensics.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to limit the application. Terms used in this application and in the appended claims, such as "a," "an," "first," and "second," do not necessarily denote a quantity or an order, but rather are used to distinguish one type of information from another.
The detection of the double JPEG compressed image can be carried out by the following methods:
1. Double JPEG image compression detection method based on first-digit features
In the prior art, Benford's Law describes a pattern in naturally generated numbers: among numbers arising in nature, the probability that a digit (1-9) appears as the first digit decreases as the value of that digit increases. Detection of double JPEG compression is realized on this basis, as shown in the following formula:

P(d) = log10(1 + 1/d),  d = 1, 2, ..., 9
Based on this finding, a simple and efficient first-digit feature extraction method (MBFDF, mode-based first digit features) was proposed to detect double JPEG compression.
The method comprises the following three steps:
1) First, the first 20 AC coefficient sub-bands are selected in zig-zag order. A sub-band refers to the one-dimensional vector formed by concatenating the values at the same position across all blocks of the image;
2) Then, for each sub-band, first-digit features are extracted, i.e., the frequency with which each of the digits 1-9 appears as the leading digit in the sub-band is computed, so each sub-band yields a 9-dimensional sub-feature;
3) Finally, the 20 sub-features are combined into a 180-dimensional feature vector, which is fed into an FLD (Fisher Linear Discriminant) classifier for training and testing to obtain the detection result for double JPEG compressed images.
2. Double JPEG image compression detection method based on statistical histograms
The method uses a histogram statistical approach to detect double JPEG compression; it is simple and effective, and mainly comprises the following three steps:
1) First, the first 9 sub-bands are extracted in zig-zag order, and the number of occurrences of the values 0-15 in each sub-band is counted, i.e., x = {h_{i,j}(0), h_{i,j}(1), ..., h_{i,j}(15)}, where (i, j) represents the spatial-domain coordinate index;
2) Then, the feature x is normalized so that it gives the frequency of each of the values 0-15 among all 16 counts, i.e., x' = {h_{i,j}(0), h_{i,j}(1), ..., h_{i,j}(15)} / C_{i,j}, where C_{i,j} is the total count of the 16 values;
3) Finally, the x' of all sub-bands are combined to obtain a 144-dimensional feature vector, which is fed into a support vector machine (SVM) for training and testing to obtain the detection result for double JPEG compressed images.
3. Double JPEG image compression detection method based on frequency domain convolutional neural network
The method takes the DCT coefficient histogram of a JPEG image as input and establishes a frequency-domain one-dimensional convolutional neural network, whose structure is shown in FIG. 1. The convolutional neural network is mainly divided into three parts: the input, the convolution modules, and the fully connected layers. The specific detection process is as follows:
(1) A histogram over the value range [-5, 5] is extracted from each DCT AC coefficient sub-band, so each sub-band yields an 11-dimensional sub-feature; taking the sub-features of the first 9 AC coefficient sub-bands in zig-zag order gives the input of the whole network, a one-dimensional feature vector of size 99 × 1;
(2) The network has two convolution modules, each consisting of a one-dimensional convolution layer and a one-dimensional max-pooling layer, generating 100 feature maps; the convolution kernel size is 3 × 1 with stride 1, and the pooling kernel size is 3 × 1 with stride 2. In addition, the network uses a ReLU activation function between layers and does not use any regularization;
(3) Fully connected layers: the network has three fully connected layers, with 1000, 1000 and 2 neurons respectively. At the second fully connected layer, a Softmax function (normalized exponential function, see equation (1.2) below) is used to calculate the probability of each class. Assuming the total number of output classes is C and the output of the m-th neuron is a_m, the probability of each class is:

p_m = exp(a_m) / Σ_{c=1}^{C} exp(a_c)        (1.2)
4. Double JPEG image compression detection method based on a combined-domain convolutional neural network
The method adopts a one-dimensional convolutional neural network for the frequency domain (frequency-domain CNN). It differs from the previous one-dimensional network in two respects: first, the value range of the input histogram is expanded from [-5, 5] to [-50, 50]; in other words, the input dimension of the frequency-domain network becomes 909 × 1; second, a ReLU activation function is added to the fully connected layers together with random dropout (Drop Out). For the spatial domain, a two-dimensional convolutional neural network (spatial-domain CNN) is adopted, as shown in FIG. 2. The spatial-domain CNN is likewise divided into three parts: the input layer, the convolution modules, and the fully connected layers:
(1) First, the JPEG image is decompressed to the spatial domain, and the pixel-value matrix of the full image (normalized to the range 0-1) is taken as input, so the dimension of the input data is 64 × 64 × 1;
(2) There are 4 convolution modules, each consisting of a two-dimensional convolution layer and a ReLU activation function; the convolution kernel size is 3 × 3, and the numbers of output feature maps are 32, 64 and 64 respectively. In the second and fourth convolution modules, a max-pooling layer with pooling kernel size 2 × 2 and a ReLU activation function are also used;
(3) Fully connected layers: the network has two fully connected layers, with 256 and 9 neurons respectively. The first fully connected layer uses a ReLU activation function and random dropout (Drop Out) to make the learned features more robust, and the second fully connected layer likewise uses the Softmax function for classification.
Finally, the outputs of the first fully connected layers of the frequency-domain and spatial-domain networks are taken as merged features, forming a convolutional neural network based on the combined domain (multi-domain CNN), as shown in fig. 3.
Detection tests show that in the double JPEG image compression detection method based on the convolution neural network of the combined domain, the performance of the three neural networks is ordered as follows: spatial domain network (SD-CNN) < frequency domain network (FD-CNN) < combined domain network (MD-CNN).
The above four detection methods all perform double JPEG compression detection based on frequency-domain DCT coefficient histogram features, and they can detect it successfully. However, when anti-forensics techniques are introduced to attack the frequency-domain DCT coefficient histogram features, the detection accuracy of the above four methods drops significantly.
In view of the above, the image detection method provided by the application uses a neural network as the detection carrier and examines the acquired image to be detected to determine whether it is a doubly compressed image.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of an image detection method provided in the present application, where the method includes:
step S401: and acquiring an image to be detected.
In step S401, the image to be detected may be an image that needs to be identified as double-compressed or not; it may be a separately stored static JPEG image, a JPEG-format image stored within dynamic content, or a JPEG image captured from data such as dynamic video or moving images.
JPEG is the abbreviation of Joint Photographic Experts Group, and its file extension is ".jpg" or ".jpeg". It is the most common image file format, established by a joint software development association; it is a lossy compression format that can compress images into a small storage space.
In the present embodiment, the detection target is to determine whether the image to be detected is a double-compression JPEG image or a single-compression JPEG image; therefore, the image acquired in step S401 may be an image that has undergone compression processing. Of course, since the purpose of acquiring the image is to detect whether it is a double-compression image, the range of images that may be acquired is in fact not limited to double-compression or single-compression JPEG images.
Step S402: and acquiring at least two types of characteristic images based on the processing of the image to be detected.
In this embodiment, at least two types of feature images may be obtained by segmenting the image to be detected. In other words, when obtaining the at least two types of feature images, the processing of the image to be detected may include segmentation; in other embodiments, the processing may further include extracting image content from the image to be detected, where the image content may include image elements such as color and contour.
In this embodiment, the division manner is described as the manner of image processing, so the specific implementation process of step S402 may include the following steps:
step S402-a: and dividing the acquired pixel matrix of the image to be detected to obtain image blocks.
Referring to fig. 5, fig. 5 is a schematic structural diagram illustrating an embodiment of segmenting an image to be detected in an image detection method provided by the present application.
In general, JPEG image compression and decompression both use 8 × 8 block processing, so step S402-a can be exemplified by dividing the image to be detected into 8 × 8 blocks, although other forms of division are not excluded.
If the pixel matrix of the image to be detected is 32 × 32, the blocks divided in 8 × 8 form are separated by the thick black solid lines in fig. 5, i.e., the image is divided into 16 blocks, and these 16 blocks are the image blocks. It should be noted that an image with a 32 × 32 pixel matrix is used here only for ease of understanding; the technical solution of the present application does not limit the size of the image to be detected, and the pixel matrix may also be 256 × 256 or otherwise.
Step S402-b: and selecting pixels of the area adjacent to the central position of the image block and pixels of the area adjacent to the segmentation intersection position of the image block.
As shown in fig. 5, for the 8 × 8 block example adopted in the step S402-a, the pixels in the area near the center of the image block are selected as 4 × 4 pixels with the center of the 8 × 8 block as the center point, as shown by the diagonal squares in fig. 5, and the pixels in the area near the segmentation intersection of the image block are selected as 4 × 4 pixels with the segmentation intersection as the center point, as shown by the dotted squares in fig. 5.
Step S402-c: and arranging and combining pixels in a region adjacent to the central position of the image Block according to the Block dividing sequence of the image to be detected to obtain an Intra-Block (Intra-Block) characteristic image.
Step S402-c arranges and combines the diagonal squares shown in fig. 5 in the dividing order of the blocks to form a new two-dimensional pixel matrix, which is determined as the intra-block feature image. From the block-division example in step S402-a, the image size of this two-dimensional pixel matrix is 16 × 16.
Step S402-d: arranging and combining the pixels in the areas adjacent to the segmentation intersection positions of the image blocks according to the block dividing sequence of the image to be detected to obtain an Inter-Block feature image.
Step S402-d arranges and combines the dotted squares shown in fig. 5 in the dividing order of the blocks to form a new two-dimensional pixel matrix, which is determined as the inter-block feature image. From the block-division example in step S402-a, the image size of this two-dimensional pixel matrix is 12 × 12.
As can be seen from the steps S402-c and S402-d, when the image to be detected is divided, the number of pixels in the area adjacent to the dividing intersection position of the image blocks is less than the number of pixels in the area adjacent to the central position of the image blocks, and therefore, the size of the intra-block feature image obtained after arrangement and combination is larger than the size of the inter-block feature image.
To understand more intuitively how the pixels adjacent to the block centers (step S402-c) and the pixels adjacent to the block intersections (step S402-d) are arranged and combined in the block dividing order to obtain the intra-block and inter-block feature images, please refer to fig. 6, which shows the visual effect after an image to be detected is segmented by the image detection method provided in the present application. In fig. 6, the middle is the original image to be detected, of size 256 × 256; the left is the intra-block feature image, of size 128 × 128; and the right is the inter-block feature image, of size 124 × 124.
Step S402-e: and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
Based on the above-mentioned division result of the image to be detected, the intra-block feature image and the inter-block feature image obtained after the division need to be input to the convolutional neural network for identification, and therefore, the process proceeds to step S403.
Step S403: inputting the information of the at least two types of characteristic images into a neural network with at least two channels for identification to obtain an identification classification result;
First, it should be noted that a neural network is a computational model formed by a large number of interconnected nodes (or neurons). Each node represents a particular output function, called an activation function.
Neural networks include: BP (back-propagation) neural networks, radial basis function (RBF) neural networks, perceptron neural networks, linear neural networks, self-organizing neural networks, feedback neural networks, convolutional neural networks (CNN), and the like. In the present embodiment, the image detection method is described mainly by taking a convolutional neural network as an example.
The specific implementation process of step S403 may include five steps, which are described below in sequence:
step S403-a: inputting the intra-block feature image into a convolution group with an intra-block path in the convolutional neural network, determining an image feature of the intra-block feature image;
step S403-b: inputting the inter-block feature image into a convolution group with an inter-block path in the convolutional neural network, and determining an image feature of the inter-block feature image.
The above-mentioned steps S403-a and S403-b are respectively explained for the convolution of intra-block paths and the convolution of inter-block paths. In the present embodiment, as shown in fig. 7, fig. 7 is a schematic structural diagram of a convolutional neural network for detecting an image in an image detection method provided by the present application; a convolutional neural network for detecting images includes intra-block paths and inter-block paths, both of which include: a convolution layer, a dimensionality reduction layer, a merging layer and a full connection layer. Based on the implementation procedure of the above steps, first, the structure of the convolutional layer in the convolutional neural network for detecting an image will be described.
The convolutional neural network mentioned in this embodiment detects images based on the spatial domain and has at least two paths. The spatial domain refers to the space composed of image pixels; that is, the data of the at least two types of feature images obtained in steps S401 and S402 are the pixels of the feature images. The two paths perform convolution operations (or detection operations) corresponding to the intra-block feature image and the inter-block feature image respectively; thus they may be called the intra-block path, which convolves the intra-block feature image, and the inter-block path, which convolves the inter-block feature image.
Based on the example provided in step S402-c (fig. 6), the intra-block feature image input to the intra-block path has size 128 × 128 × 1, and the inter-block feature image input to the inter-block path has size 124 × 124 × 1. The convolutional layer may comprise a plurality of convolution groups; in this embodiment, the intra-block path and the inter-block path each contain four convolution groups, each convolution group includes at least two convolution blocks whose kernels are of size 3 × 3, and the number of output feature images is 32, because 32 convolution kernels are used as an example in this embodiment and each convolution kernel outputs one feature image.
It should be noted that the number of convolution groups is not limited to the number provided in the present embodiment, and the number may be determined according to the size of the actual image to be detected. Similarly, the number of convolution blocks in each convolution group can also be determined by the actual size of the image to be detected.
Referring to fig. 8 in conjunction with fig. 7, fig. 8 is a schematic structural diagram of the convolution blocks in the convolutional neural network for detecting an image; each convolution block includes a sub-convolution layer, a batch normalization layer, and an activation function layer. To reduce computational complexity and find the best balance between memory efficiency and memory capacity, the convolutional neural network divides the intra-block feature image data input to the intra-block path and the inter-block feature image data input to the inter-block path into mini-batches, determining the number of feature images processed at a time; when the input intra-block and inter-block feature image data cannot be processed by the neural network in one pass (i.e., the input feature image data is large), the data is divided into several batches for detection.
It is understood that when the data of the input feature image is small, the whole batch may be used.
The mini-batched feature image data is convolved in the convolution blocks, and the feature images obtained after convolution are batch-normalized. The aim of batch normalization is to keep the mini-batched intra-block feature image data in the same distribution as the originally input intra-block feature image data, and likewise the mini-batched inter-block feature image data in the same distribution as the originally input inter-block feature image data, by learning a scale parameter γ and a shift parameter β in formula set (5.1):

x̂_i = (x_i - μ) / sqrt(σ² + ε),    y_i = γ·x̂_i + β        (5.1)

where x_i and x̂_i are the original value and the batch-normalized value of the mini-batch input data, μ and σ² are the mean and variance of the mini-batch, ε is a small constant for numerical stability, and y_i is the output. In testing, assuming the input data is divided into R mini-batches, the mean and variance of the r-th mini-batch are used to update the running statistics in formula set (5.2):

μ ← ρ·μ + (1 - ρ)·μ_r,    σ² ← ρ·σ² + (1 - ρ)·σ_r²        (5.2)

where ρ is a momentum that keeps the updates of adjacent batches balanced.
In this embodiment, the sub-convolution layer operation of the convolution blocks in each convolution group uses a "zero-padding convolution" approach, i.e., the output feature image keeps the same size as the input data. The batch normalization operation uses equations (5.1) and (5.2) for the update, namely: the original activation x of a neuron is converted by subtracting the mean of the m activations obtained from the m instances in the mini-batch and dividing by the resulting standard deviation, which ensures that the feature images keep the same pixel distribution characteristics as the original image to be detected.
Effective features of the feature images are obtained by passing the batch-normalized feature images in the convolution blocks (or, directly, the convolved feature images) through the activation function layer. In some embodiments, given input data I and a convolution kernel K, a feature map F is generated using a two-dimensional convolution operation, as shown in equation (5.3):

F(m, n) = (I * K)(m, n) = Σ_{u=1}^{W} Σ_{v=1}^{H} I(m - u, n - v) · K(u, v) + b_c        (5.3)

where * denotes the convolution operation in information science, W and H denote the width and height of the convolution kernel K, m and n index the value F(m, n) currently being computed, u and v are the indices of the summation, and b_c denotes the bias in the convolution operation.
Referring to fig. 9, fig. 9 is a schematic diagram illustrating a convolution operation performed on image feature data input into a convolution block.
Here, the convolution kernel K is 3 × 3 (its values are shown in the original figure), so W = 3 and H = 3. The input image feature data I is likewise shown in the original figure.
The process is as follows:
The convolution kernel K is rotated by 180° to obtain K'. The center of the rotated K' is aligned with I(2, 2), and corresponding elements are multiplied and summed, with W = 3, H = 3, u = 2, v = 2, and m, n ranging from 0 to 2: F(2, 2) = 1×1 + 0×1 + 1×1 + 0×0 + 1×1 + 0×1 + 1×0 + 0×0 + 1×1 = 4. The kernel K' then traverses the elements of I in order, repeating the computation of F(2, 2) to obtain F(2, 3), and so on; by repeatedly translating and sliding the kernel and convolving, the final values of F are obtained.
In neural networks, commonly used activation functions include Sigmoid, TanH, ReLU, Leaky ReLU, ELU, and the like. According to the characteristics of the different input data, TanH and ReLU are adopted here as the activation functions, to better learn deep image features within intricate data and to learn valid or key image features from sparse data x_sp. The TanH and ReLU activation functions may use the following equations (5.4) and (5.5), respectively:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))        (5.4)

ReLU(x_sp) = max(0, x_sp)        (5.5)
In formula (5.5) above, when x_sp > 0, the parameters describing the image features participate more in the gradient update; when x_sp ≤ 0, the outputs of some neurons are zero or below zero, which yields sparsity in the network, reduces the interdependence among the image feature parameters, and avoids overfitting.
As shown in fig. 8, in this embodiment, the intra-block path and the inter-block path may each include four convolution groups; since the convolution groups have no dimension-reduction capability, the convolution groups are connected by pooling layers so as to reduce the size (i.e., the number of pixels) of the output feature images.
Structure of the intra-block path:
The first convolution group in the intra-block path convolves the 128 × 128 × 1 intra-block feature image and inputs the resulting 124 × 124 × 32 feature image into the first pooling layer; the first pooling layer pools it and inputs the resulting 62 × 62 × 32 feature image into the second convolution group; the second convolution group convolves it and inputs the resulting 58 × 58 × 32 feature image into the second pooling layer; the second pooling layer pools it and inputs the resulting 29 × 29 × 32 feature image into the third convolution group; the third convolution group convolves it and inputs the resulting 25 × 25 × 32 feature image into the third pooling layer; the third pooling layer pools it and inputs the resulting 13 × 13 × 32 feature image into the fourth convolution group; the fourth convolution group convolves it and outputs the resulting feature image to the dimension-reduction layer at the end of the intra-block path.
Structure of inter-block path:
The first convolution group performs convolution processing on the inter-block feature image and inputs the obtained 122 × 122 × 32 feature image into the first pooling layer; the first pooling layer pools the input 122 × 122 × 32 feature image and inputs the obtained 60 × 60 × 32 feature image into the second convolution group; the second convolution group convolves the input 60 × 60 × 32 feature image and inputs the obtained 56 × 56 × 32 feature image into the second pooling layer; the second pooling layer pools the input 56 × 56 × 32 feature image and inputs the obtained 28 × 28 × 32 feature image into the third convolution group; the third convolution group convolves the input 28 × 28 × 32 feature image and inputs the obtained 24 × 24 × 32 feature image into the third pooling layer; the third pooling layer pools the input 24 × 24 × 32 feature image and inputs the obtained 12 × 12 × 32 feature image into the fourth convolution group; the fourth convolution group convolves the input 12 × 12 × 32 feature image and outputs the image features of the inter-block feature image.
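The following is a minimal PyTorch sketch of one such path (four convolution groups alternating with pooling layers), offered only as an illustration; the composition of each convolution group, the kernel sizes, and the pooling parameters are assumptions, since the text fixes only the group/pooling alternation and the 32-channel width:

```python
import torch
import torch.nn as nn

def conv_group(in_ch: int, out_ch: int) -> nn.Sequential:
    # Assumed: a "convolution group" is two 3x3 convolutions with ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3), nn.ReLU(),
    )

path = nn.Sequential(
    conv_group(1, 32),                      # first convolution group
    nn.MaxPool2d(kernel_size=2, stride=2),  # first pooling layer
    conv_group(32, 32),                     # second convolution group
    nn.MaxPool2d(kernel_size=2, stride=2),  # second pooling layer
    conv_group(32, 32),                     # third convolution group
    nn.MaxPool2d(kernel_size=2, stride=2),  # third pooling layer
    conv_group(32, 32),                     # fourth convolution group
)

x = torch.randn(1, 1, 128, 128)  # e.g. an intra-block feature image
print(path(x).shape)             # the feature maps shrink at each stage
```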
As can be understood from the above description of the convolutional layers in the convolutional neural network structure with at least two paths, the intra-block path and the inter-block path are convolved by their respective convolution groups, which output the image features of the intra-block feature image and of the inter-block feature image. After these image features are obtained, a dimensionality reduction operation needs to be performed on each of them, so as to extract the main image features of the intra-block feature image and of the inter-block feature image while avoiding overfitting; therefore, the method proceeds to step S403-c.
Step S403-c: inputting the image features of the intra-block feature image and the image features of the inter-block feature image into the dimension reduction layer at the end of the intra-block path and the dimension reduction layer at the end of the inter-block path, respectively, and determining the main image features of the intra-block feature image and the main image features of the inter-block feature image.
The specific implementation of this step is also described with reference to the dimension reduction layer structure in the convolutional neural network in fig. 7.
The structure of the dimensionality reduction layer at the end of a via within a block is explained:
The dimensionality reduction layer at the end of the intra-block path comprises: a dimension-reduction pooling layer and a dimension-reduction convolution layer arranged at the end of the convolutional layers, namely at the tail of the fourth convolution group; the specific dimension reduction process comprises the following steps:
pooling image features of intra-block feature images output by the convolution layers in the intra-block passage to obtain pooled intra-block feature images;
performing convolution processing on the pooled intra-block feature images to obtain main image features of the intra-block feature images;
In this embodiment, max pooling is selected as the operation of the pooling layer: since the spatial-domain pixel values are normalized to [0, 1] and the differences between pixel values at adjacent positions are small, max pooling is the more reasonable choice. The pooling process may include:
respectively determining the size of the pooling window of the corresponding intra-block pooling layer;

determining the pooling-window characteristic value of the intra-block pooling layer as the maximum of the image feature values covered by the pooling window;

and combining all the pooling-window characteristic values of the intra-block pooling layer to obtain the image features of the pooled intra-block feature image.
In this embodiment, the pooling window size of the pooling layer is 3 × 3, and the step size is 2.
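A minimal sketch of this max pooling (3 × 3 window, stride 2, assuming no padding), in which each output value is the maximum image feature value covered by the pooling window:

```python
import numpy as np

def max_pool2d(x, window=3, stride=2):
    out_h = (x.shape[0] - window) // stride + 1
    out_w = (x.shape[1] - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r+window, c:c+window].max()  # window maximum
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
print(max_pool2d(x))  # 3x3 output for a 7x7 input
```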
After the image features of the pooled intra-block feature image are obtained, convolution processing is further performed; that is, the image features of the pooled intra-block feature image are input into the end convolution sublayer of the dimensionality reduction layer, and the end convolution sublayer convolves them with a 2 × 2 convolution kernel to obtain the main image features of the intra-block feature image.
Following the example in step S403 described above, the intra-block feature image size after max pooling is 5 × 5 × 32, and the main image feature size of the intra-block feature image after the convolution is 4 × 4 × 32.
The structure of the dimensionality reduction layer at the end of the inter-block path is explained:
The dimensionality reduction layer at the end of the inter-block path comprises: a dimension-reduction convolution layer and a dimension-reduction pooling layer arranged at the end of the convolutional layers, namely at the tail of the fourth convolution group. It differs from the dimensionality reduction layer at the end of the intra-block path only in the order of the dimension-reduction convolution layer and the dimension-reduction pooling layer: in the inter-block path the dimension-reduction convolution layer precedes the dimension-reduction pooling layer, while in the intra-block path the order is reversed. Specifically, the dimension reduction process of the dimensionality reduction layer at the end of the inter-block path comprises the following steps:
performing convolution processing on the image characteristics of the inter-block characteristic images in the inter-block passage to obtain convolved inter-block characteristic images;
and performing pooling processing on the convolved inter-block feature images to obtain main image features of the inter-block feature images.
The size of the convolution kernel adopted in the convolution processing is the same as that of the convolution kernel of the dimensionality reduction layer at the tail end of the path in the block.
Following the example in step S403, the inter-block feature image after the dimension-reduction convolution is 7 × 7 × 32, and the main image features of the inter-block feature image after max pooling are 4 × 4 × 32. It should be noted that the convolution sublayers in the dimensionality reduction layers at the ends of both the intra-block path and the inter-block path use valid ("effective") convolution, that is, no zero padding is applied at the edges of the feature maps. The dimensionality reduction layers at the ends of the two paths ensure that the main image features of the intra-block feature image and of the inter-block feature image output by the two paths have the same number and the same size of feature images.
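As a quick illustration of valid ("effective") convolution: with no zero padding, a k × k kernel shrinks an n × n input to (n − k + 1) × (n − k + 1), which is why the 2 × 2 dimension-reduction convolution turns 5 × 5 into 4 × 4 in the intra-block path, and an assumed 8 × 8 group output into 7 × 7 in the inter-block path:

```python
def valid_conv_size(n: int, k: int) -> int:
    # output side length of valid convolution (no zero padding)
    return n - k + 1

print(valid_conv_size(5, 2))  # 4 (intra-block path)
print(valid_conv_size(8, 2))  # 7 (inter-block path)
```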
Based on the above, the main image features of the two paths are obtained; the main image features of the two paths then need to be merged to obtain a one-dimensional feature vector, so the method proceeds to step S403-d, which corresponds to the merging layer in the convolutional neural network structure of fig. 7.
Step S403-d: combining the main image features of the intra-block feature images and the main image features of the inter-block feature images to obtain the feature vectors of the images to be detected;
In the specific implementation of step S403-d, through the above step S403-c, main image features with the same number of feature maps (32) and the same size (4 × 4) are obtained in the intra-block path and in the inter-block path.
Firstly, respectively converting the main image features of the intra-block feature images and the main image features of the inter-block feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and then, combining the intra-block one-dimensional characteristic vector and the inter-block one-dimensional characteristic vector according to the structure sequence of the corresponding intra-block passage and inter-block passage respectively to obtain the characteristic vector of the image to be detected.
In the above example, the main image features in each of the two paths are 4 × 4 × 32; each is converted into a one-dimensional feature vector of 512 × 1, and the two vectors are then combined in the order of the path structure to obtain a feature vector with a dimension of 1024 × 1. The path order may be: the one-dimensional feature vector of the main image features of the intra-block path first and that of the inter-block path second, so that the front 512 × 1 of the combined 1024 × 1 feature vector is the intra-block one-dimensional feature vector and the rear 512 × 1 is the inter-block one-dimensional feature vector; or the inter-block one-dimensional feature vector first and the intra-block one-dimensional feature vector second, in which case the front 512 × 1 of the combined 1024 × 1 feature vector is the inter-block one-dimensional feature vector.
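A minimal sketch of this merging step, flattening each path's 4 × 4 × 32 main image features into a 512-dimensional vector and concatenating them (intra-block first, one of the two orders described above):

```python
import numpy as np

intra_feats = np.random.rand(4, 4, 32)  # main features from the intra-block path
inter_feats = np.random.rand(4, 4, 32)  # main features from the inter-block path

intra_vec = intra_feats.reshape(-1)     # 512-dimensional vector
inter_vec = inter_feats.reshape(-1)     # 512-dimensional vector
merged = np.concatenate([intra_vec, inter_vec])
print(merged.shape)                     # (1024,)
```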
Step S403-e: and transmitting the characteristic vectors to a full connection layer of the convolutional neural network, and determining the identification and classification result of the image to be detected.
The specific implementation process of steps S403-e can be described with reference to fig. 7 regarding the structure of the fully-connected layer in the convolutional neural network.
The fully connected layer of step S403-e performs recognition and classification according to the one-dimensional feature vector of the input image features; that is, the merged 1024 × 1 one-dimensional feature vector is taken as the output of the first-layer neurons of the fully connected layer and is densely connected to the 30 neurons of the second layer to obtain the outputs of the second-layer neurons.
The output of a second-layer neuron may be obtained as follows. Let a_s^(l) denote the output of the s-th neuron in the l-th layer of the network; then the output a_t^(l+1) of the t-th neuron in the (l + 1)-th layer is formulated as:

a_t^(l+1) = f( Σ_s w_st^(l) · a_s^(l) + b_t^(l) )

where w_st^(l) and b_t^(l) are the connection weight and the bias between the two layers, and f(·) represents the activation function.
A neuron may be understood as a matrix pixel (which may also be referred to as a matrix element) in the input image features; the number of neurons in each layer of the fully connected layer can be determined according to the matrix pixels of the compressed training sample data used in training.
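A minimal sketch of the fully connected computation above, assuming ReLU for f(·) and random weights purely for illustration:

```python
import numpy as np

def dense(a_prev, W, b, f=lambda x: np.maximum(0.0, x)):
    # a_prev: outputs a_s of layer l; W: (out, in) weights w_st; b: biases b_t
    return f(W @ a_prev + b)

rng = np.random.default_rng(0)
a1 = rng.random(1024)            # merged one-dimensional feature vector
W = rng.normal(size=(30, 1024))  # weights to the 30 second-layer neurons
b = np.zeros(30)                 # biases
print(dense(a1, W, b).shape)     # (30,)
```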
In order to counteract the decrease in double-compression recognition accuracy caused by the introduction of anti-forensics methods, and to enhance the representation of pixels in the spatial domain, the image detection method provided by the present application further includes:
before determining the image features of the intra-block feature images and the image features of the inter-block feature images, respectively performing filtering processing on the intra-block feature images and the inter-block feature images to obtain filtered intra-block feature images and inter-block feature images.
Since the filtering process can be implemented by a differential filter, as shown in fig. 7, the intra-block path and the inter-block path of the convolutional neural network provided in this embodiment further include a differential filter layer, which may also be referred to as an intra-block differential filter layer and an inter-block differential filter layer, respectively. The intra-block differential filter layer is located at an initial end of an intra-block path, and the inter-block differential filter layer is located at an initial end of an inter-block path.
The differential filter layer performs convolution in the "zero padding convolution" mode; that is, 2 convolution kernels with a size of 2 × 2 are used to convolve each feature map respectively, and the parameters of the 2 convolution kernels are shown in formula (5.7):

[formula (5.7): the parameters of the two 2 × 2 difference kernels, given as an image in the original]
The differential filter layer can enhance the signal-to-noise ratio of the segmented intra-block feature image and inter-block feature image, strengthening both the information left in the spatial domain by JPEG compression and the traces left by anti-forensics operations, which is beneficial to improving the detection accuracy for double-compressed images.
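A minimal sketch of such a differential filter layer, assuming horizontal and vertical first-difference kernels purely for illustration (the actual kernel parameters of formula (5.7) are given as an image in the original and are not reproduced here):

```python
import numpy as np
from scipy.signal import convolve2d

D_H = np.array([[1, -1],
                [0,  0]])   # assumed horizontal difference kernel
D_V = np.array([[1,  0],
                [-1, 0]])   # assumed vertical difference kernel

feature = np.random.rand(8, 8)
# "zero padding convolution": the output keeps the input size
res_h = convolve2d(feature, D_H, mode="same")
res_v = convolve2d(feature, D_V, mode="same")
print(res_h.shape, res_v.shape)  # (8, 8) (8, 8)
```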
Step S404: and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
The specific implementation process of step S404 is as follows: the Softmax classifier in step S403-e calculates a classification result according to the connection relationship between the merged one-dimensional feature vector and the neurons in each network layer of the fully connected layer. The classification result may be a two-dimensional vector (x, y) with x + y = 1; in this embodiment, x represents the probability that the input image to be detected is a double-compressed image and y the probability that it is a single-compressed image. Equivalently, the difference between the merged one-dimensional feature vector and the one-dimensional vector features of the fully connected layers is used to determine whether the image to be detected belongs to double compression, and this difference may be expressed as the two-dimensional vector. Therefore, the determining process in step S404 may include:
comparing the single-compression classification probability value with the double-compression classification probability value in the recognition classification result of the image to be detected; if the double-compression classification probability value is greater than the single-compression classification probability value, the image to be detected is determined to be a double-compressed image; otherwise, it is a single-compressed image.
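A minimal sketch of this decision rule, with hypothetical logits from the last fully connected layer passed through Softmax:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.3, -0.2])  # hypothetical two-class output
x, y = softmax(logits)          # x: double-compressed, y: single-compressed
print("double-compressed" if x > y else "single-compressed")
```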
Based on the above, before the processing based on the obtained image to be detected, the method may further include:
carrying out normalization preprocessing on the acquired image to be detected to obtain a normalized image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including:
and segmenting the image to be detected after the normalization processing to obtain at least two types of characteristic images.
Given that the input image to be detected is a grayscale image I_S with pixel values in the range [0, 255], the normalization preprocessing of the image to be detected may use the following formula:

I_S = I_S / 255
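A minimal sketch of this normalization, scaling 8-bit grayscale pixel values from [0, 255] into [0, 1]:

```python
import numpy as np

I_S = np.array([[0, 128], [255, 64]], dtype=np.float64)
I_S = I_S / 255.0
print(I_S)  # all values now lie in [0, 1]
```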
The above is a description of an embodiment of the image detection method provided in the present application. Corresponding to the foregoing embodiment of the image detection method, the present application further discloses an embodiment of an image detection apparatus; please refer to fig. 10. Since the apparatus embodiment is basically similar to the method embodiment, the description is relatively simple, and for related points reference may be made to the partial description of the method embodiment. The apparatus embodiments described below are merely illustrative.
As shown in fig. 10, fig. 10 is a schematic structural diagram of an embodiment of an image detection apparatus provided by the present application, and the apparatus includes:
an acquiring unit 901, configured to acquire an image to be detected.
A processing unit 902, configured to obtain at least two types of feature images based on the processing on the image to be detected.
The processing unit 902 is specifically configured to segment the image to be detected to obtain at least two types of feature images, and may include:
the dividing subunit is used for dividing the acquired pixel matrix of the image to be detected to obtain image blocks;
the selecting subunit is used for selecting pixels of the area adjacent to the central position of each image block obtained by the dividing subunit, and selecting pixels of the areas adjacent to the block-division intersection positions of the image blocks;
the intra-block obtaining subunit is used for arranging and combining the pixels of the area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
and the inter-block obtaining subunit is used for arranging and combining the pixels of the areas adjacent to the block-division intersection positions of the image blocks according to the block-division sequence of the image to be detected to obtain the inter-block feature image.
And the type determining subunit is used for determining the intra-block characteristic image and the inter-block characteristic image as the obtained at least two types of characteristic images.
A classification recognition unit 903, configured to input information of the at least two types of feature images into a neural network with at least two paths for recognition, so as to obtain a recognition classification result;
the classification identifying unit 903 may include:
an intra-block image feature determination subunit configured to input the intra-block feature image into a convolution group with an intra-block path in the convolution neural network, and determine an image feature of the intra-block feature image;
an inter-block image feature determination subunit, configured to input the inter-block feature image into a convolution group with an inter-block path in the convolution neural network, and determine an image feature of the inter-block feature image;
a main image feature determination subunit, configured to input the image features of the intra-block feature image and the image features of the inter-block feature image to a dimension reduction layer at an end of an intra-block path and a dimension reduction layer at an end of an inter-block path, respectively, and determine main image features of the intra-block feature image and main image features of the inter-block feature image;
a feature vector obtaining subunit, configured to combine the main image features of the intra-block feature image and the main image features of the inter-block feature image to obtain a feature vector of the image to be detected;
and the classification result determining subunit is used for transmitting the feature vectors to a full connection layer of the convolutional neural network and determining the identification and classification result of the image to be detected.
The above is only a general description, and the specific process refers to the description of the image detection method, which is not repeated herein.
It can be understood that, in order to counteract the decrease in double-compression recognition accuracy caused by anti-forensics methods and to enhance the representation of pixels in the spatial domain, the image detection apparatus provided by the present application further comprises:
and the filtering unit is used for respectively carrying out filtering processing on the intra-block characteristic image and the inter-block characteristic image before determining the image characteristics of the intra-block characteristic image and the image characteristics of the inter-block characteristic image to obtain the filtered intra-block characteristic image and inter-block characteristic image.
Since the filtering process can be implemented by a differential filter, as shown in fig. 7, the intra-block path and the inter-block path of the convolutional neural network provided in this embodiment further include a differential filter layer, which may also be referred to as an intra-block differential filter layer and an inter-block differential filter layer, respectively. The intra-block differential filter layer is located at an initial end of an intra-block path, and the inter-block differential filter layer is located at an initial end of an inter-block path.
The differential filter layer can enhance the signal-to-noise ratio of the segmented intra-block feature image and inter-block feature image, strengthening both the information left in the spatial domain by JPEG compression and the traces left by anti-forensics operations, which is beneficial to improving the detection accuracy for double-compressed images.
A determining unit 904, configured to determine whether the image to be detected belongs to the dual compressed image information according to the recognition and classification result.
The determining unit 904 may include:
and the comparison determining subunit is used for comparing the single-compression classification probability value of the identification classification result of the image to be detected with the double-compression classification probability value, and if the double-compression classification probability value is greater than the single-compression classification probability value, determining the acquired image to be detected as double-compression image information.
Further, the image detection apparatus provided by the present application may further include:
and the preprocessing unit is used for carrying out normalization preprocessing on the acquired image to be detected to obtain a preprocessed image to be detected.
The obtaining unit 901 is specifically configured to perform segmentation according to the obtained preprocessed image to be detected, so as to obtain at least two types of feature images.
In view of the above, the present application also provides a training method for a neural network model. Please refer to fig. 11, which shows a flowchart of an embodiment of the training method for a neural network model provided in the present application; the training method of the neural network model includes:
step S1001: training sample images are acquired.
The acquiring of the training sample images in step S1001 may be acquiring images determined to be double-compressed and/or single-compressed, where an image may be a separately stored static JPEG image, a JPEG image saved in JPEG format from a moving image, or a JPEG image extracted from data such as motion video or animated images.
JPEG is the abbreviation of Joint Photographic Experts Group, with file extension ".jpg" or ".jpeg". It is the most common image file format, established by a software development association; it is a lossy compression format that can compress images into a small storage space.
Step S1002: and obtaining at least two types of training sample characteristic images based on the processing of the training sample images.
The specific implementation process of step S1002 may be to segment the training sample image to obtain at least two types of feature images. For a specific segmentation process, reference may be made to the description of step S402 in the image detection method, which is not described herein again.
Step S1003: and inputting the information of the at least two types of training sample characteristic images into a neural network with at least two paths for recognition to obtain recognition and classification results.
Similarly, for the specific implementation process of step S1003, reference may be made to the description of step S403 in the above image detection method, and details are not repeated here.
Step S1004: determining the weight of the convolutional neural network according to the recognition and classification result;
the specific implementation process of step S1004 may include:
Step S1004-a: calculating a loss value according to the classification result, taken as the classification label value, and the real label value of the training sample image;

Step S1004-b: determining the weight of the convolutional neural network according to the loss value.
Wherein step S1004-a may be calculated by a cross-entropy loss of the form:

L(w) = − Σ_n q_n log(p_n)

where p_n is the actual output probability result, q_n is the expected output result (i.e., the true label), and w represents the weights of the network.
Step S1005: and updating the weight of the convolutional neural network according to the determined weight of the convolutional neural network to obtain a trained convolutional neural network model.
The specific implementation process of step S1005 may be to update the network weights by back propagation. For example, given the learning rate α, the network weights in the t-th training iteration are:

w^(t+1) = w^(t) − α · ∂L(w)/∂w

where ∂L(w)/∂w denotes the derivative of the loss value L(w) with respect to the network weights w.
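A minimal sketch of the loss and the update step, with illustrative two-class probabilities and a hypothetical gradient:

```python
import numpy as np

def cross_entropy(p, q):
    # L(w) = -sum_n q_n * log(p_n); p: predicted probabilities, q: one-hot labels
    return -np.sum(q * np.log(p + 1e-12))

def sgd_step(w, grad, lr=0.01):
    # w_{t+1} = w_t - alpha * dL/dw (back propagation supplies the gradient)
    return w - lr * grad

p = np.array([0.7, 0.3])     # network output: (double, single) probabilities
q = np.array([1.0, 0.0])     # true label: double-compressed
print(cross_entropy(p, q))   # ~0.357

w = np.array([0.5, -0.2])    # a hypothetical weight vector
grad = np.array([0.1, -0.3]) # a hypothetical gradient dL/dw
print(sgd_step(w, grad))     # [ 0.499 -0.197]
```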
The above is a description of an embodiment of a training method for a convolutional neural network model provided in the present application. Corresponding to the embodiment of the training method of the convolutional neural network model provided above, the present application also discloses an embodiment of a training apparatus of the convolutional neural network model, please refer to fig. 12, since the apparatus embodiment is basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiment. The device embodiments described below are merely illustrative.
An acquisition unit 1101 for acquiring a training sample image;
a processing unit 1102, configured to obtain at least two types of training sample feature images based on processing of the training sample images;
the identification unit 1103 is configured to input information of the at least two types of training sample feature images into a neural network with at least two paths for identification, so as to obtain an identification classification result;
a determining unit 1104, configured to determine a weight of the convolutional neural network according to the recognition and classification result;
an updating unit 1105, configured to update the weights of the convolutional neural network according to the determined weights of the convolutional neural network, so as to obtain a trained convolutional neural network model.
According to the image detection method and the training method described above, a JPEG image to be detected is divided into 8 × 8 blocks by the dividing layer of the convolutional neural network; the 4 × 4 pixels at the center of each 8 × 8 block are selected to form the intra-block (Intra-Block) feature image, and the 4 × 4 pixels at each block-crossing position (i.e., straddling every four adjacent JPEG blocks) are selected to form the inter-block (Inter-Block) feature image; the two feature images serve as the inputs of the two convolution structures of the convolutional neural network, so that deep image features can be detected.

In addition, the signal-to-noise ratio of the image is enhanced through the intra-block difference filter and the inter-block difference filter, which strengthens the information left in the spatial domain by JPEG compression and the traces left by anti-forensics operations, thereby facilitating both double JPEG compression detection and anti-forensics operation detection.
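A minimal sketch of this segmentation, assuming a 256 × 256 grayscale input; the exact patch offsets within each 8 × 8 block are assumptions for illustration:

```python
import numpy as np

def intra_block_image(img):
    # central 4x4 pixels of each 8x8 JPEG block, tiled in block order
    h, w = img.shape[0] // 8, img.shape[1] // 8
    out = np.zeros((4 * h, 4 * w))
    for i in range(h):
        for j in range(w):
            out[4*i:4*i+4, 4*j:4*j+4] = img[8*i+2:8*i+6, 8*j+2:8*j+6]
    return out

def inter_block_image(img):
    # 4x4 pixels straddling each crossing of four adjacent 8x8 blocks
    h, w = img.shape[0] // 8 - 1, img.shape[1] // 8 - 1
    out = np.zeros((4 * h, 4 * w))
    for i in range(h):
        for j in range(w):
            out[4*i:4*i+4, 4*j:4*j+4] = img[8*i+6:8*i+10, 8*j+6:8*j+10]
    return out

img = np.random.rand(256, 256)
print(intra_block_image(img).shape)  # (128, 128)
print(inter_block_image(img).shape)  # (124, 124)
```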
Based on the above, the present application further provides a computer storage medium for storing a program for generating data by a network platform and processing the data generated by the network platform;
when read and executed by the processor, the program performs the following operations:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a program for processing network platform generated data, the program when read and executed by the processor performing the following operations:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
In order to verify the effectiveness of the dual-path convolutional neural network (SB-CNN) in detecting double JPEG compression and anti-forensics operations, it is compared with related detection schemes below, so that its technical effects can be understood more deeply.
The experiments cover two aspects: detection of double JPEG compression, and detection of double JPEG compression with embedded anti-forensics techniques. The deep-learning open-source tool TensorFlow is used as the experimental platform, and a GPU computing card of model NVIDIA Tesla P100 is used to complete the experiments; the widely used image library BOSSBase v1.01, containing 10000 grayscale images stored in uncompressed PGM format, is used as the source of experimental images.
1. Dual JPEG compression detection
This section of the experiment again selects 4 representative compression quality factor combinations as experimental objects to measure the performance of the network, i.e., (QF1, QF2) = (70, 75), (80, 75), (85, 70), (85, 90). The selection principle is: the compression quality is moderately high; both the QF1 > QF2 and QF1 < QF2 cases are present; and combinations sharing the same QF1 or the same QF2 with different counterparts are present.
Table 1 shows the performance of the dual-path convolutional neural network (SB-CNN) provided herein and of the comparative methods on double JPEG compression detection under the above compression quality factor combinations. DFSD is a method based on first significant digits, DAH is a method based on statistical histograms, and DFD-CNN and DMD-CNN are neural network methods based on the frequency domain and on dual domains, respectively. The highest detection accuracy value in each compression quality factor combination in Table 1 is shown in bold.
Table 1: double JPEG compression detection accuracy (%) comparison table
(QF1,QF2) (70,75) (80,75) (85,70) (85,90)
DFSD 99.22 99.36 97.70 99.98
DAH 98.73 99.02 98.99 99.91
DFD-CNN 98.74 99.00 97.36 99.96
DMD-CNN 95.78 99.04 98.54 99.99
DSB-CNN 98.70 97.58 99.49 99.82
From the above table, the following summary can be made:
The performance of DSB-CNN is at an upper-middle level; most detection accuracies in Table 1 are above 97% and even close to 100%, indicating that the methods all have strong and similar capabilities and that the room for further improvement in detection accuracy is limited.
2. Dual JPEG compression detection with embedded anti-forensics
The detection performance of the dual-path convolutional neural network (SB-CNN) provided by the present application on double JPEG compressed images embedded with different types of anti-forensics techniques is tested separately and compared with traditional handcrafted-feature methods. The selected comparison methods are the first-significant-digit-based method DFSD and the statistical-histogram-based method DAH.
(1) Application of anti-forensics techniques to scene classification
To verify that SB-CNN has the capability of detecting double JPEG compressed images attacked by anti-forensics techniques, the specific anti-forensics scenarios are first categorized so as to cover conventional image anti-forensics techniques.
Anti-forensics techniques study the distribution of DCT coefficients in JPEG-compressed images and deter forensics by fitting that distribution to the DCT coefficient distribution of uncompressed images. A comprehensive anti-forensics method was then proposed based on the blocking artifacts of JPEG images, and a four-step JPEG compression trace elimination method was proposed based on the optimal balance between undetectability and anti-forensics performance. The main characteristic of the above methods is that the DCT coefficients are first re-fitted and the spatial domain is then repaired to erase the compression traces of the JPEG image. As shown in fig. 13, fig. 13 is a schematic diagram of common scenarios of embedding anti-forensics techniques in the double JPEG compression process.
The images generated according to the scenarios provided in fig. 13 are denoted by JPEG(1)A, JPEG(2)B0, and so on; specifically:
(1) JPEG (1) A: JPEG is used for compressing the image once, and the quantization quality factor is QF;
(2) JPEG (2) B0: natural dual JPEG compressed image with first and second quantization quality factors of QF1 and QF 2;
(3) JPEG(2)B1: an anti-forensics double JPEG compressed image generated by adding attacks in the spatial domain when JPEG decompression writes the data back to the spatial domain, where the attacks include contrast enhancement, image size scaling, median filtering, and the like;

(4) JPEG(2)B2: the anti-forensics operation is performed on the DCT coefficients of the JPEG image, which is then decompressed back to the spatial domain; such anti-forensics methods are representative;

(5) JPEG(2)B3: anti-forensics operations are performed on the double JPEG compressed image both in the DCT domain of the JPEG-compressed image and in the spatial domain of the JPEG-decompressed image.
(2) Airspace anti-forensics image:
First, double JPEG compressed images in which the anti-forensics operation is performed only in the spatial domain of the JPEG-decompressed image, i.e., the third type of image JPEG(2)B1 in fig. 13, are detected. Taking the compression quality factor combination (QF1, QF2) = (70, 75) as an example, the attacked images are generated by the following common methods:
① Contrast Enhancement: the brightness parameters are (0.5, 0.6, ..., 1.4, excluding 1.0);

② Median Filtering: the filter window size s is 3 × 3 or 2 × 2, and the Gaussian white noise variance v is 2 or 3;

③ image size scaling (Resize) with return to the original size: the scaling ratios are (0.7, 0.8, ..., 1.3, excluding 1.0);

④ image size scaling without returning to the original size: the scaling ratios are (1.1, 1.2, 1.3, 1.4). All the above images are generated by MATLAB software; the commands used are "imadjust", "medfilt2", and "imresize" in sequence, applied at the "spatial-domain anti-forensics" stage in the generation process of the third type JPEG(2)B1 in fig. 13.
Referring to fig. 14, fig. 14 is a schematic diagram of the detection effect on images processed by the above anti-forensics methods. For convenience of representation, each image type is denoted by an English abbreviation plus its parameters, exemplified as follows:
② "CE — 0.5" indicates contrast enhancement operation using a brightness parameter of 0.5;
② "MFG _ s3v 2" indicates a window size of 3 × 3 for median filtering, noise method of 2;
③ "RSR — 0.8" indicates that the image scale is 0.8 and returns to the original size;
④ "RSC _ 1.2" indicates that the image magnification is 1.2 and does not return to original size;
⑤ "S/D" represents a natural double JPEG compressed image.
In addition, the detection methods are DFSD, DAH and DSB-CNN in the present application, respectively.
As can be observed from fig. 14, the detection accuracies of the three detection methods on natural double JPEG compression (S/D) are close to 100%. After spatial-domain anti-forensics techniques are applied, all three methods show a broadly similar downward trend, but with very different magnitudes: the first-significant-digit feature method (blue circular markers) and the statistical histogram method (black squares) drop sharply, while the network provided herein (red triangular markers) maintains higher detection accuracy and stability, remaining essentially in a high-accuracy state. For example, on the image types "CE_1.4" and "RSC_1.1", the accuracy of the dual-path convolutional neural network is 11% to 33% higher, demonstrating a strong ability to resist spatial-domain anti-forensics techniques.
(3) Comprehensive anti-forensic images:
The spatial-domain anti-forensics images undergo anti-forensics operations in the spatial domain before the second JPEG compression, which can only cover some spatial-domain blocking-artifact characteristics and does not remove compression traces according to the characteristics of the JPEG image. In comprehensive anti-forensics techniques, on the basis of established mathematical models, the coefficient distribution difference between compressed and uncompressed images is compensated more accurately, either in the DCT domain of the JPEG image alone or in the DCT domain and the spatial domain simultaneously.
Table 2 shows the performance of the same three methods in detecting double JPEG compression with embedded comprehensive anti-forensics techniques, where AStamm, AFan, and ASingh are the anti-forensics methods. As can be seen from Table 2, in most cases the methods based on conventional handcrafted features perform poorly against anti-forensics techniques, because they rely on features of the DCT coefficients of the JPEG image, and these features are erased to some extent by the anti-forensics methods. The dual-path convolutional neural network SB-CNN stably maintains a high detection rate (all results above 90%); in some cases its detection accuracy is as much as 45% higher than that of the handcrafted-feature methods. This shows that SB-CNN, designed based on the spatial-domain characteristics of JPEG images, has good detection performance for double JPEG compression and can well resist attacks by anti-forensics techniques.
Table 2: double JPEG compression detection accuracy (%) comparison table embedded with anti-evidence obtaining technology
[Table 2 is provided as an image in the original; its per-method accuracy values are not reproduced here.]
Compared with the prior art, the image detection method and the associated dual-path convolutional neural network achieve higher detection accuracy and better performance, whether detecting normal images or images subjected to anti-forensics operations.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Although the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application, and any person skilled in the art can make variations and modifications without departing from the spirit and scope of the present application, therefore, the scope of the present application should be limited by the scope of the claims.

Claims (28)

1. An image detection method, comprising:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
2. The image detection method according to claim 1, wherein obtaining at least two types of feature images based on the processing of the image to be detected comprises:
and segmenting the image to be detected to obtain at least two types of characteristic images.
3. The image detection method according to claim 2, wherein the segmenting the image to be detected to obtain at least two types of feature images comprises:
dividing the obtained pixel matrix of the image to be detected to obtain image blocks;
selecting pixels of a region adjacent to the central position of the image block and pixels of a region adjacent to the segmentation intersection position of the image block;
arranging and combining pixels in an area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
arranging and combining the pixels of the areas adjacent to the block-division intersection positions of the image blocks according to the block-division sequence of the image to be detected to obtain an inter-block feature image;
and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
4. The image detection method according to claim 3, wherein the dividing the acquired image to be detected to obtain image blocks comprises:
and dividing the image to be detected from left to right and from top to bottom to obtain image blocks.
5. The image detection method according to claim 3, wherein inputting the information of the at least two types of feature images into a neural network having at least two paths for recognition to obtain recognition classification results comprises:
inputting the intra-block feature image into a convolution group with an intra-block path in the neural network, determining an image feature of the intra-block feature image;
inputting the inter-block feature images into convolution groups with inter-block paths in the neural network, and determining image features of the inter-block feature images;
inputting the image features of the intra-block feature images and the image features of the inter-block feature images into a dimension reduction layer at the tail end of an intra-block access and a dimension reduction layer at the tail end of an inter-block access respectively, and determining the main image features of the intra-block feature images and the main image features of the inter-block feature images;
combining the main image features of the intra-block feature images and the main image features of the inter-block feature images to obtain the feature vectors of the images to be detected;
and transmitting the characteristic vectors to a full connection layer of the neural network, and determining the identification and classification result of the image to be detected.
6. The image detection method according to claim 5, further comprising:
before determining the image features of the intra-block feature images and the image features of the inter-block feature images, respectively performing filtering processing on the intra-block feature images and the inter-block feature images to obtain filtered intra-block feature images and inter-block feature images.
7. The image detection method according to claim 5, wherein the inputting the intra-block feature image into a convolution group with an intra-block path in the neural network to determine an image feature of the intra-block feature image includes:
inputting the intra-block feature image into a convolution group of an intra-block passage in the neural network for convolution processing to obtain a processed intra-block feature image;
inputting the processed intra-block characteristic image into a pooling layer for processing to obtain the image characteristics of the intra-block characteristic image;
the inputting the inter-block feature image into a convolution group with an inter-block path in the neural network, and extracting the image feature of the inter-block feature image includes:
inputting the inter-block feature images into a convolution group of an inter-block path in the neural network for convolution processing to obtain processed inter-block feature images;
inputting the processed inter-block feature images into a pooling layer for processing to obtain image features of the inter-block feature images;
wherein the number of convolution groups of the intra-block path is at least two, and the pooling layer of the intra-block path is located between the convolution groups of the intra-block path; the number of the convolution groups of the inter-block paths is at least two, and the pooling layer of the inter-block paths is positioned between the convolution groups of the inter-block paths.
8. The image detection method according to claim 5, wherein the step of inputting the image features of the intra-block feature image and the image features of the inter-block feature image to a dimension reduction layer at an intra-block path end and a dimension reduction layer at an inter-block path end, respectively, and determining the main image features of the intra-block feature image and the main image features of the inter-block feature image comprises:
pooling image features of the intra-block feature images in the intra-block passage to obtain pooled intra-block feature images;
performing convolution processing on the pooled intra-block characteristic images to obtain main image characteristics of the intra-block characteristic images;
performing convolution processing on the image characteristics of the inter-block characteristic images in the inter-block passage to obtain convolved inter-block characteristic images;
and performing pooling processing on the convolved inter-block feature images to obtain main image features of the inter-block feature images.
9. The image detection method according to claim 5, wherein merging the main image features of the intra-block feature image and the main image features of the inter-block feature image to obtain the feature vector of the image to be detected comprises:
respectively converting the main image features of the intra-block feature images and the main image features of the inter-block feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and combining the intra-block one-dimensional characteristic vector and the inter-block one-dimensional characteristic vector according to the structure sequence of the corresponding intra-block passage and inter-block passage respectively to obtain the characteristic vector of the image to be detected.
10. The image detection method according to claim 1, wherein said determining whether the acquired image to be detected belongs to a double-compressed image according to the recognition and classification result comprises:
and comparing the single-compression classification probability value of the identification classification result of the image to be detected with the double-compression classification probability value, and if the double-compression classification probability value is greater than the single-compression classification probability value, determining the acquired image to be detected as a double-compression image.
11. The image detection method according to claim 1, further comprising:
carrying out normalization pretreatment on the obtained image to be detected to obtain a normalized image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including:
and segmenting the image to be detected after the normalization processing to obtain at least two types of characteristic images.
12. The image detection method of claim 1, wherein the neural network having at least two paths is a convolutional neural network.
13. An image detection apparatus, characterized by comprising:
the acquisition unit is used for acquiring an image to be detected;
the processing unit is used for obtaining at least two types of characteristic images based on the processing of the image to be detected;
the classification and identification unit is used for inputting the information of the at least two types of characteristic images into a neural network with at least two channels for identification to obtain an identification and classification result;
and the determining unit is used for determining whether the image to be detected belongs to the double-compression image information or not according to the identification and classification result.
14. A training method of a neural network model is characterized by comprising the following steps:
acquiring a training sample image;
obtaining at least two types of training sample characteristic images based on the processing of the training sample images;
inputting the at least two types of training sample characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
determining the weight of the neural network according to the recognition and classification result;
and updating the weight of the neural network according to the determined weight of the neural network to obtain a trained neural network model.
15. The method for training a neural network model according to claim 14, wherein the obtaining at least two types of training sample feature image information based on the processing of the training sample images comprises:
and segmenting the training sample image to obtain at least two types of characteristic images.
16. The method for training a neural network model according to claim 15, wherein the segmenting the training sample image to obtain at least two types of feature images comprises:
dividing the pixel matrix of the obtained training sample image to obtain training sample image blocks;
selecting pixels of a region adjacent to the center position of the training sample image blocks and selecting pixels of a region adjacent to the segmentation intersection position of the training sample image blocks;
pixels of an area adjacent to the center of each block of the training sample image are arranged and combined according to the block dividing sequence of the training sample image to obtain a characteristic image of the training sample in the block;
arranging and combining the pixels of the areas adjacent to the block-division intersection positions of the training sample image blocks according to the block-division sequence of the training sample image to obtain an inter-block training sample feature image;
and determining the intra-block training sample feature image and the inter-block training sample feature image as the obtained at least two types of training sample feature images.
17. The method for training a neural network model according to claim 16, wherein the dividing the acquired training sample images to obtain training sample image blocks comprises:
and dividing the training sample image from left to right and from top to bottom to obtain training sample image blocks.
18. The method for training a neural network model according to claim 17, wherein the inputting the at least two types of training sample feature images into a neural network with at least two channels for recognition to obtain a recognition and classification result comprises:
inputting the intra-block training sample feature image into the convolution groups of the intra-block path in the neural network, and determining image features of the intra-block training sample feature image;
inputting the inter-block training sample feature image into the convolution groups of the inter-block path in the neural network, and determining image features of the inter-block training sample feature image;
respectively inputting the image features of the intra-block training sample feature image and the image features of the inter-block training sample feature image into a dimension-reduction layer at the end of the intra-block path and a dimension-reduction layer at the end of the inter-block path, and determining main image features of the intra-block training sample feature image and main image features of the inter-block training sample feature image;
combining the main image features of the intra-block training sample feature image and the main image features of the inter-block training sample feature image to obtain a feature vector of the training sample image;
and transmitting the feature vector to a fully connected layer of the neural network to determine the recognition and classification result of the training sample image.
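A minimal PyTorch sketch of such a two-channel network follows. It also anticipates the structural details recited in claims 20 to 22 below: at least two convolution groups per path with a pooling layer between them, oppositely ordered dimension-reduction layers at the path ends, and concatenation of the flattened path outputs before the fully connected layer. All channel widths, kernel sizes, and the adaptive pooling are illustrative assumptions:

    import torch
    import torch.nn as nn

    def conv_group(c_in, c_out):
        # Assumed convolution-group layout: two 3x3 convolutions with ReLU.
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

    class TwoPathNet(nn.Module):
        def __init__(self, num_classes=2):
            super().__init__()
            # Two convolution groups per path, pooling between them (claim 20).
            self.intra_path = nn.Sequential(
                conv_group(1, 16), nn.MaxPool2d(2), conv_group(16, 32))
            self.inter_path = nn.Sequential(
                conv_group(1, 16), nn.MaxPool2d(2), conv_group(16, 32))
            # Dimension reduction at each path end (claim 21): the intra-block
            # path pools then convolves, the inter-block path convolves then pools.
            self.intra_reduce = nn.Sequential(
                nn.AdaptiveAvgPool2d(4), nn.Conv2d(32, 8, 1))
            self.inter_reduce = nn.Sequential(
                nn.Conv2d(32, 8, 1), nn.AdaptiveAvgPool2d(4))
            # Fully connected classifier over the combined feature vector.
            self.fc = nn.Linear(2 * 8 * 4 * 4, num_classes)

        def forward(self, intra_img, inter_img):
            a = self.intra_reduce(self.intra_path(intra_img))
            b = self.inter_reduce(self.inter_path(inter_img))
            # Flatten each path, then concatenate in path order (claim 22).
            v = torch.cat([a.flatten(1), b.flatten(1)], dim=1)
            return self.fc(v)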
19. The method for training a neural network model according to claim 18, further comprising:
before determining the image features of the intra-block training sample feature image and the image features of the inter-block training sample feature image, respectively performing filtering processing on the intra-block training sample feature image and the inter-block training sample feature image to obtain a filtered intra-block training sample feature image and a filtered inter-block training sample feature image.
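The claim leaves the filter unspecified. A common choice in compression forensics is a fixed high-pass (residual) kernel, so the following sketch is one possible instantiation rather than the claimed filter:

    import torch
    import torch.nn.functional as F

    # Assumed second-order high-pass kernel, shaped (out=1, in=1, 3, 3).
    HP_KERNEL = torch.tensor([[-1.,  2., -1.],
                              [ 2., -4.,  2.],
                              [-1.,  2., -1.]]).reshape(1, 1, 3, 3)

    def high_pass(x):
        """x: (N, 1, H, W) feature image; returns the filtered residual."""
        return F.conv2d(x, HP_KERNEL, padding=1)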
20. The method of claim 18, wherein the inputting the intra-block training sample feature image into the convolution groups of the intra-block path in the neural network to determine the image features of the intra-block training sample feature image comprises:
inputting the intra-block training sample feature image into the convolution groups of the intra-block path in the neural network for convolution processing to obtain a processed intra-block training sample feature image;
inputting the processed intra-block training sample feature image into a pooling layer for processing to obtain the image features of the intra-block training sample feature image;
and the inputting the inter-block training sample feature image into the convolution groups of the inter-block path in the neural network to determine the image features of the inter-block training sample feature image comprises:
inputting the inter-block training sample feature image into the convolution groups of the inter-block path in the neural network for convolution processing to obtain a processed inter-block training sample feature image;
inputting the processed inter-block training sample feature image into a pooling layer for processing to obtain the image features of the inter-block training sample feature image;
wherein the number of convolution groups of the intra-block path is at least two, and the pooling layer of the intra-block path is located between the convolution groups of the intra-block path; the number of convolution groups of the inter-block path is at least two, and the pooling layer of the inter-block path is located between the convolution groups of the inter-block path.
21. The method for training a neural network model according to claim 18, wherein the respectively inputting the image features of the intra-block training sample feature image and the image features of the inter-block training sample feature image into the dimension-reduction layer at the end of the intra-block path and the dimension-reduction layer at the end of the inter-block path to determine the main image features of the intra-block training sample feature image and the main image features of the inter-block training sample feature image comprises:
performing pooling processing on the image features of the intra-block training sample feature image in the intra-block path to obtain a pooled intra-block training sample feature image;
performing convolution processing on the pooled intra-block training sample feature image to obtain the main image features of the intra-block training sample feature image;
performing convolution processing on the image features of the inter-block training sample feature image in the inter-block path to obtain a convolved inter-block training sample feature image;
and performing pooling processing on the convolved inter-block training sample feature image to obtain the main image features of the inter-block training sample feature image.
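The two path ends thus apply the same two operations in opposite order. A small shape check, with assumed layer choices, makes the distinction concrete; the output shapes coincide here, but the computed features differ because pooling and convolution do not commute:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 32, 16, 16)                     # features from one path
    pool, conv = nn.AvgPool2d(2), nn.Conv2d(32, 8, 1)  # assumed layers

    intra_main = conv(pool(x))   # intra-block order: pool -> 1x1 conv
    inter_main = pool(conv(x))   # inter-block order: 1x1 conv -> pool
    print(intra_main.shape, inter_main.shape)          # both (1, 8, 8, 8)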
22. The method for training a neural network model according to claim 18, wherein the combining the main image features of the intra-block training sample feature image and the main image features of the inter-block training sample feature image to obtain the feature vector of the training sample image comprises:
respectively converting the main image features of the intra-block training sample feature image and the main image features of the inter-block training sample feature image into one-dimensional feature vectors to obtain an intra-block one-dimensional feature vector and an inter-block one-dimensional feature vector;
and combining the intra-block one-dimensional feature vector and the inter-block one-dimensional feature vector according to the structural order of the corresponding intra-block path and inter-block path to obtain the feature vector of the training sample image.
23. The method for training a neural network model according to claim 14, further comprising:
performing normalization preprocessing on the acquired training sample image to obtain a normalized training sample image;
wherein the obtaining at least two types of training sample feature images based on the processing of the training sample image comprises:
and segmenting the normalized training sample image to obtain the at least two types of training sample feature images.
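The claim does not fix a normalization scheme; scaling pixel values into [0, 1] is one common preprocessing choice, and it is what this short sketch assumes:

    import numpy as np

    def normalize(img):
        """img: uint8 pixel matrix; returns float32 values in [0, 1]."""
        return img.astype(np.float32) / 255.0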
24. The method for training a neural network model according to claim 14, wherein the determining the weight of the neural network according to the recognition and classification result comprises:
calculating a loss value from the classification label value given by the recognition and classification result and the real label value of the training sample image;
and determining the weight of the neural network according to the loss value.
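Assuming a two-class cross-entropy between the classification label value and the real label value (the claim does not name the loss function), the loss value can be computed as follows:

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, 0.5]])   # network output for one sample
    label = torch.tensor([0])             # real label: 0 = single-compressed
    loss = F.cross_entropy(logits, label)
    # The same value by hand: -log(softmax(logits))[0, label].
    manual = -torch.log_softmax(logits, dim=1)[0, 0]
    assert torch.isclose(loss, manual)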
25. The method for training a neural network model according to claim 14 or 24, wherein the updating the weight of the neural network according to the determined weight of the neural network to obtain the trained neural network model comprises:
and updating the old weight of the neural network with the determined weight of the neural network as a new weight by means of back propagation, to obtain the trained neural network model with two channels.
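A minimal sketch of the weight update itself, assuming plain SGD (the patent does not name an optimizer): back propagation produces a gradient for every weight, and each old weight is overwritten by a new one:

    import torch

    def sgd_update(model, loss, lr=1e-3):
        loss.backward()                    # back propagation through both paths
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is not None:
                    p -= lr * p.grad       # new weight replaces old weight
                    p.grad = None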
26. An apparatus for training a neural network model, comprising:
an acquisition unit for acquiring a training sample image;
the processing unit is used for obtaining at least two types of training sample feature images based on the processing of the training sample image;
the recognition unit is used for inputting the at least two types of training sample feature images into a neural network with at least two channels for recognition to obtain a recognition and classification result;
the determining unit is used for determining the weight of the neural network according to the recognition and classification result;
and the updating unit is used for updating the weight of the neural network according to the determined weight of the neural network to obtain the trained neural network model.
27. A computer storage medium for storing network platform generated data and a program for processing the network platform generated data;
wherein the program, when read and executed by a processor, performs the following operations:
acquiring an image to be detected;
obtaining at least two types of feature images based on the processing of the image to be detected;
inputting the information of the at least two types of feature images into a neural network with at least two channels for recognition to obtain a recognition and classification result;
and determining whether the image to be detected belongs to a double-compression image according to the recognition and classification result.
28. An electronic device, comprising:
a processor;
a memory for storing a program for processing network platform generated data, wherein the program, when read and executed by the processor, performs the following operations:
acquiring an image to be detected;
obtaining at least two types of feature images based on the processing of the image to be detected;
inputting the information of the at least two types of feature images into a neural network with at least two channels for recognition to obtain a recognition and classification result;
and determining whether the image to be detected belongs to a double-compression image according to the recognition and classification result.
CN201910007257.2A 2019-01-04 2019-01-04 Image detection method and device and neural network training method and device Active CN111415323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910007257.2A CN111415323B (en) 2019-01-04 2019-01-04 Image detection method and device and neural network training method and device

Publications (2)

Publication Number Publication Date
CN111415323A true CN111415323A (en) 2020-07-14
CN111415323B CN111415323B (en) 2022-05-27

Family

ID=71493952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910007257.2A Active CN111415323B (en) 2019-01-04 2019-01-04 Image detection method and device and neural network training method and device

Country Status (1)

Country Link
CN (1) CN111415323B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090263011A1 (en) * 2008-04-18 2009-10-22 Yun-Qing Shi Detection Technique for Digitally Altered Images
CN102413328A (en) * 2011-11-11 2012-04-11 中国科学院深圳先进技术研究院 Double compression detection method and system of joint photographic experts group (JPEG) image
CN103413336A (en) * 2013-07-31 2013-11-27 中国科学院深圳先进技术研究院 Grid non-aligned double-JPEG-compression detecting method and device
CN107679572A (en) * 2017-09-29 2018-02-09 深圳大学 A kind of image discriminating method, storage device and mobile terminal

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139458A (en) * 2021-04-21 2021-07-20 新疆爱华盈通信息技术有限公司 Method and system for identifying opening and closing states of parking garage roller shutter
CN113034628A (en) * 2021-04-29 2021-06-25 南京信息工程大学 Color image JPEG2000 recompression detection method
CN113034628B (en) * 2021-04-29 2023-09-26 南京信息工程大学 Color image JPEG2000 recompression detection method
CN113538387A (en) * 2021-07-23 2021-10-22 广东电网有限责任公司 Multi-scale inspection image identification method and device based on deep convolutional neural network
CN113538387B (en) * 2021-07-23 2024-04-05 广东电网有限责任公司 Multi-scale inspection image identification method and device based on deep convolutional neural network
CN116347080A (en) * 2023-03-27 2023-06-27 任红梅 Intelligent algorithm application system and method based on downsampling processing
CN116347080B (en) * 2023-03-27 2023-10-31 苏州利博特信息科技有限公司 Intelligent algorithm application system and method based on downsampling processing
CN116777375A (en) * 2023-06-20 2023-09-19 苏州智本信息科技有限公司 Industrial Internet system based on machine vision
CN116777375B (en) * 2023-06-20 2024-02-23 苏州智本信息科技有限公司 Industrial Internet system based on machine vision

Also Published As

Publication number Publication date
CN111415323B (en) 2022-05-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant