CN111932544A - Tampered image detection method and device and computer readable storage medium - Google Patents


Info

Publication number
CN111932544A
CN111932544A (application CN202011114858.2A)
Authority
CN
China
Prior art keywords
image
detected
loss function
feature
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011114858.2A
Other languages
Chinese (zh)
Inventor
倪江群 (Ni Jiangqun)
饶远 (Rao Yuan)
张伟哲 (Zhang Weizhe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory filed Critical Peng Cheng Laboratory
Priority to CN202011114858.2A
Publication of CN111932544A
Legal status: Pending

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tampered image detection method comprising the following steps: acquiring an image to be detected and partitioning it into a plurality of image blocks to be detected according to a first preset blocking rule; extracting features from each image block to be detected with a pre-trained local descriptor to obtain the local feature corresponding to each block; fusing the local features into a global feature corresponding to the image to be detected; and inputting the global feature into a support vector machine to determine whether the image to be detected is a tampered image. The invention also discloses a tampered image detection device and a computer-readable storage medium. By partitioning the image to be detected and performing detection on the resulting image blocks, the method improves the accuracy of feature extraction and thereby the efficiency and accuracy of tampered image detection.

Description

Tampered image detection method and device and computer readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for detecting a tampered image, and a computer-readable storage medium.
Background
With the development of digital image editing software, image processing tools have become increasingly powerful and convenient. Techniques such as adaptive exposure, deblurring, matting, and image splicing and fusion are widely used in the media industry; they add interest and have, to a certain extent, promoted the development of the entertainment and image-sharing industries. However, they have also bred speculators who deliberately produce forged or tampered images for illegal ends, such as obtaining illicit gains in the insurance and financial industries, seriously threatening the security of multimedia content.
At present, image splicing is the most common means of image tampering, and it is mainly detected with deep learning. However, because the signal strength of tampering traces is very small compared with that of the image content, extracting features end to end from the full input image with a neural network yields poor detection results, leading to low accuracy and efficiency in tampered image detection.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method and a device for detecting a tampered image and a computer readable storage medium, and aims to solve the technical problems of low accuracy and low efficiency of the conventional method for detecting the tampered image through deep learning.
In order to achieve the above object, the present invention provides a tamper image detection method, including the steps of:
acquiring an image to be detected, and performing image blocking on the image to be detected based on a first preset blocking rule to obtain a plurality of image blocks to be detected;
extracting the features of each image block to be detected through a pre-trained local descriptor to obtain the local features corresponding to each image block to be detected;
performing feature fusion on each local feature to obtain a global feature corresponding to the image to be detected;
and inputting the global features into a support vector machine to determine whether the image to be detected is a tampered image.
Further, the step of extracting features of each image block to be detected through a pre-trained local descriptor to obtain local features corresponding to each image block to be detected includes:
and respectively inputting each image block to be detected into a pre-trained local descriptor, and taking an output result of the last convolutional layer in the pre-trained local descriptor as a local feature corresponding to each image block to be detected.
Further, the step of performing feature fusion on each local feature to obtain a global feature corresponding to the image to be detected includes:
performing aggregation operation on each local feature based on each image block to be detected to obtain a feature image, wherein the local feature comprises features of a plurality of channels;
performing image segmentation on the feature image based on a second preset segmentation rule to obtain a plurality of feature blocks;
respectively carrying out block pooling on the feature blocks of each channel to obtain pooled feature vectors corresponding to each channel;
and determining a target feature vector based on each pooled feature vector, and taking the target feature vector as the global feature.
Further, before the step of extracting features of each image block to be detected through the pre-trained local descriptor, the tampered image detection method further includes:
acquiring a training data set, wherein the training data set comprises spliced image blocks and real image blocks;
inputting the training data set into two sub-networks in a convolutional neural network to obtain a first loss function, a second loss function, and a contrast loss function between the two sub-networks, wherein the first loss function and the second loss function correspond to the two sub-networks;
determining a pre-trained local descriptor based on the first loss function, the second loss function, and the contrast loss function.
Further, the step of determining a pre-trained local descriptor based on the first loss function, the second loss function, and the contrast loss function comprises:
determining a target loss function based on the first loss function, the second loss function, and the contrast loss function;
when the target loss function meets a preset condition, updating the convolutional neural network based on the target loss function, and determining a pre-trained local descriptor based on the updated convolutional neural network;
and when the target loss function does not meet the preset condition, updating the convolutional neural network based on the target loss function, taking the updated convolutional neural network as the convolutional neural network, and returning to execute the step of inputting the training data set into two sub-networks in the convolutional neural network.
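The loop in the three steps above (combine the losses into a target loss, update the network, repeat until the preset condition is met) can be sketched as follows. The combination L = L1 + L2 + λ·Lc, the convergence test used as the "preset condition", and the `losses`/`update` methods are all assumptions for illustration; the patent only states that the three loss functions determine a target loss function and that the network is updated until the condition holds.

```python
# Sketch of the descriptor training loop described above. The target-loss
# combination (l1 + l2 + lam * lc) and the convergence-based "preset
# condition" are assumptions; `losses` and `update` are hypothetical
# methods standing in for the real forward/backward passes. The default
# lam = 0.01 follows the lambda value given in the embodiment.
def train_descriptor(network, batches, lam=0.01, tol=1e-3, max_iter=100):
    prev = float("inf")
    for _ in range(max_iter):
        l1, l2, lc = network.losses(next(batches))  # two sub-net losses + contrast loss
        target = l1 + l2 + lam * lc                 # assumed target loss function
        network.update(target)                      # network is updated in both branches
        if abs(prev - target) < tol:                # preset condition: loss converged
            break
        prev = target
    return network


class ToyNetwork:
    """Stand-in network whose loss simply decays; for demonstration only."""
    def __init__(self):
        self.scale = 1.0
    def losses(self, batch):
        return self.scale, self.scale, 0.5 * self.scale
    def update(self, target):
        self.scale *= 0.5


import itertools
net = train_descriptor(ToyNetwork(), itertools.repeat(None))
```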
Further, the step of inputting the training data set into two sub-networks in a convolutional neural network to obtain a contrast loss function between the two sub-networks comprises:
determining the feature vectors of the training samples corresponding to the training data set, and determining Euclidean distances corresponding to the training samples based on the feature vectors, the spliced image blocks and the real image blocks;
determining the contrast loss function based on the Euclidean distance.
Further, after the step of inputting the global features into a support vector machine to determine whether the image to be detected is a tampered image, the method for detecting a tampered image further includes:
if the image to be detected is a tampered image, acquiring a target sub-network in the pre-trained convolutional neural network based on a pre-trained local descriptor;
inputting each image block to be detected into the target subnetwork respectively to obtain the corresponding tampering probability of each image block to be detected;
determining a prediction label map corresponding to the image to be detected based on the tampering probabilities, and performing mean filtering on the prediction label map to obtain a continuous splicing probability map;
and performing spliced-image localization on the image to be detected based on the continuous splicing probability map.
Further, the step of performing spliced-image localization on the image to be detected based on the continuous splicing probability map comprises:
performing inference on the continuous splicing probability map with a conditional random field model to obtain the tampering-splicing probability of each pixel in the continuous splicing probability map;
and determining the target tampered pixels among all the pixels based on the tampering-splicing probabilities.
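The mean-filtering step above can be sketched in plain numpy. This is a sketch only: the window size k = 3 and the zero padding at the borders are assumptions not stated in the patent, and the subsequent conditional random field inference is not reproduced here.

```python
import numpy as np

def mean_filter(label_map, k=3):
    """Mean-filter a binary prediction label map into a continuous
    splicing probability map, as in the localization step above.
    The window size k and zero padding are illustrative assumptions."""
    H, W = label_map.shape
    pad = k // 2
    padded = np.pad(label_map.astype(np.float32), pad)  # zero-pad borders
    out = np.empty((H, W), dtype=np.float32)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()  # local average
    return out

# A tiny 3 x 3 label map: the hard 0/1 labels become smooth probabilities.
prob = mean_filter(np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]]))
```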
Further, in order to achieve the above object, the present invention provides a tampered image detection device including: a memory, a processor, and a tampered image detection program stored on the memory and executable on the processor, wherein the tampered image detection program, when executed by the processor, implements the steps of the tampered image detection method described above.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium having a tamper image detection program stored thereon, the tamper image detection program, when executed by a processor, implementing the steps of the tamper image detection method described above.
In the present invention, an image to be detected is acquired and partitioned into a plurality of image blocks to be detected according to a first preset blocking rule; features are then extracted from each image block with a pre-trained local descriptor to obtain each block's local feature; the local features are fused into a global feature corresponding to the image to be detected; and the global feature is input into a support vector machine to determine whether the image is a tampered image. By partitioning the image to be detected and performing detection on the resulting image blocks, the method improves the accuracy of feature extraction and thereby the efficiency and accuracy of tampered image detection.
Drawings
Fig. 1 is a schematic structural diagram of a tamper image detection apparatus of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a tamper image detection method according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of the convolutional neural network of the present invention;
FIG. 4 is a schematic diagram illustrating image stitching positioning results according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a filter bank according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating the accuracy of a convolutional network structure in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of the splice detection performance of the convolutional network structure in an embodiment of the present invention;
fig. 8 is a schematic diagram of the splice location performance of the convolutional network structure in an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a tamper image detection apparatus of a hardware operating environment according to an embodiment of the present invention.
The tampered image detection device in the embodiment of the invention may be a PC, or a mobile terminal device with a display function, such as a smartphone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a portable computer, and the like.
As shown in fig. 1, the tampered image detection apparatus may include: a processor 1001 (e.g., a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard), and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory); it may alternatively be a storage device separate from the processor 1001.
Optionally, the tamper image detection device may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Of course, the tampered image detecting device may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
It will be understood by those skilled in the art that the configuration of the tamper image detection device shown in fig. 1 does not constitute a limitation of the tamper image detection device, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a tamper image detection program.
In the tampered image detection device shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call a tamper image detection program stored in the memory 1005.
In the present embodiment, the falsified image detection apparatus includes: a memory 1005, a processor 1001 and a tamper image detection program stored on the memory 1005 and executable on the processor 1001, wherein the processor 1001, when calling the tamper image detection program stored in the memory 1005, performs the following operations:
acquiring an image to be detected, and performing image blocking on the image to be detected based on a first preset blocking rule to obtain a plurality of image blocks to be detected;
extracting the features of each image block to be detected through a pre-trained local descriptor to obtain the local features corresponding to each image block to be detected;
performing feature fusion on each local feature to obtain a global feature corresponding to the image to be detected;
and inputting the global features into a support vector machine to determine whether the image to be detected is a tampered image.
Further, the processor 1001 may call a falsified image detection program stored in the memory 1005, and also perform the following operations:
and respectively inputting each image block to be detected into a pre-trained local descriptor, and taking an output result of the last convolutional layer in the pre-trained local descriptor as a local feature corresponding to each image block to be detected.
Further, the processor 1001 may call a falsified image detection program stored in the memory 1005, and also perform the following operations:
performing aggregation operation on each local feature based on each image block to be detected to obtain a feature image, wherein the local feature comprises features of a plurality of channels;
performing image segmentation on the feature image based on a second preset segmentation rule to obtain a plurality of feature blocks;
respectively carrying out block pooling on the feature blocks of each channel to obtain pooled feature vectors corresponding to each channel;
and determining a target feature vector based on each pooled feature vector, and taking the target feature vector as the global feature.
Further, the processor 1001 may call a falsified image detection program stored in the memory 1005, and also perform the following operations:
acquiring a training data set, wherein the training data set comprises spliced image blocks and real image blocks;
inputting the training data set into two sub-networks in a convolutional neural network to obtain a first loss function, a second loss function, and a contrast loss function between the two sub-networks, wherein the first loss function and the second loss function correspond to the two sub-networks;
determining a pre-trained local descriptor based on the first loss function, the second loss function, and the contrast loss function.
Further, the processor 1001 may call a falsified image detection program stored in the memory 1005, and also perform the following operations:
determining a target loss function based on the first loss function, the second loss function, and the contrast loss function;
when the target loss function meets a preset condition, updating the convolutional neural network based on the target loss function, and determining a pre-trained local descriptor based on the updated convolutional neural network;
and when the target loss function does not meet the preset condition, updating the convolutional neural network based on the target loss function, taking the updated convolutional neural network as the convolutional neural network, and returning to execute the step of inputting the training data set into two sub-networks in the convolutional neural network.
Further, the processor 1001 may call a falsified image detection program stored in the memory 1005, and also perform the following operations:
determining the feature vectors of the training samples corresponding to the training data set, and determining Euclidean distances corresponding to the training samples based on the feature vectors, the spliced image blocks and the real image blocks;
determining the contrast loss function based on the Euclidean distance.
Further, the processor 1001 may call a falsified image detection program stored in the memory 1005, and also perform the following operations:
if the image to be detected is a tampered image, acquiring a target sub-network in the pre-trained convolutional neural network based on a pre-trained local descriptor;
inputting each image block to be detected into the target subnetwork respectively to obtain the corresponding tampering probability of each image block to be detected;
determining a prediction label map corresponding to the image to be detected based on the tampering probabilities, and performing mean filtering on the prediction label map to obtain a continuous splicing probability map;
and performing spliced-image localization on the image to be detected based on the continuous splicing probability map.
Further, the processor 1001 may call a falsified image detection program stored in the memory 1005, and also perform the following operations:
performing inference on the continuous splicing probability map with a conditional random field model to obtain the tampering-splicing probability of each pixel in the continuous splicing probability map;
and determining the target tampered pixels among all the pixels based on the tampering-splicing probabilities.
The invention further provides a tampered image detection method, and referring to fig. 2, fig. 2 is a schematic flow diagram of a tampered image detection method according to a first embodiment of the invention.
In this embodiment, the tamper image detection method includes:
step S110, acquiring an image to be detected, and performing image blocking on the image to be detected based on a first preset blocking rule to obtain a plurality of image blocks to be detected;
in this embodiment, an image to be detected is obtained first, and image partitioning is performed on the image to be detected according to a first preset partitioning rule to obtain a plurality of image blocks to be detected, where each image block to be detected is not overlapped and includes all contents of the image to be detected, for example, non-overlapped image partitioning is performed on an input image, a block size is 128 × 128, the block size is consistent with an input size of a convolutional neural network of a pre-trained local descriptor, R × C image blocks are total, and each image block is denoted as b (x, y), and (x, y) is a spatial index.
Step S120, extracting the features of each image block to be detected through a pre-trained local descriptor to obtain the local features corresponding to each image block to be detected;
in this embodiment, after each image block to be detected is determined, feature extraction is performed on each image block to be detected through a pre-trained local descriptor, so as to obtain a local feature corresponding to each image block to be detected.
Specifically, a neural network is first trained on a training set comprising real image blocks and tampered image blocks, and the pre-trained local descriptor is determined from the trained network. Each image block to be detected is then fed into the pre-trained local descriptor, and the output of its last convolutional layer is taken as the block's local feature: for each image block b(x, y), a feature f(x, y) of size 5 × 5 × 16 (a 5 × 5 matrix with 16 channels) is extracted. All f(x, y) are then arranged according to the same spatial index as b(x, y) and aggregated into a single (5·R) × (5·C) × 16 feature image.
Step S130, performing feature fusion on each local feature to obtain a global feature corresponding to the image to be detected;
in this embodiment, after the local features of each image block to be detected are obtained, feature fusion is performed on the local features to obtain a global feature. Specifically, the feature image is partitioned into h × w feature blocks, each of size ⌊5R/h⌋ × ⌊5C/w⌋, where ⌊·⌋ denotes the rounding-down (floor) function.
Pooling the feature blocks corresponding to each channel of the feature image to obtain a feature vector corresponding to each channel, and fusing a plurality of feature vectors to obtain the global feature.
Step S140, inputting the global features into a support vector machine to determine whether the image to be detected is a tampered image.
In this embodiment, after the global feature is obtained, it is input into an SVM (Support Vector Machine), which performs binary classification on the image to be detected to determine whether it is a tampered image.
Further, in an embodiment, step S120 includes:
and respectively inputting each image block to be detected into a pre-trained local descriptor, and taking an output result of the last convolutional layer in the pre-trained local descriptor as a local feature corresponding to each image block to be detected.
In this embodiment, each image block to be detected is input into the pre-trained local descriptor, and the output of the last convolutional layer in the pre-trained local descriptor is taken as the block's local feature: for each image block b(x, y), a feature f(x, y) of size 5 × 5 × 16 (a 5 × 5 matrix with 16 channels) is obtained.
The tampered image detection method provided by the embodiment obtains an image to be detected, and performs image blocking on the image to be detected based on a first preset blocking rule to obtain a plurality of image blocks to be detected; then, extracting the features of each image block to be detected through a pre-trained local descriptor to obtain the local features corresponding to each image block to be detected; then, performing feature fusion on each local feature to obtain a global feature corresponding to the image to be detected; and then inputting the global features into a support vector machine to determine whether the image to be detected is a tampered image, blocking the image to be detected, and detecting whether the image to be detected is the tampered image through the image block to be detected, so that the accuracy of feature extraction is improved, and the detection efficiency and accuracy of the tampered image are improved.
A second embodiment of the tampered image detection method of the present invention is proposed based on the first embodiment, and in this embodiment, the step S130 includes:
step S131, based on each image block to be detected, performing aggregation operation on each local feature to obtain a feature image, wherein the local feature comprises features of a plurality of channels;
step S132, image blocking is carried out on the characteristic image based on a second preset blocking rule so as to obtain a plurality of characteristic blocks;
step S133, respectively performing block pooling on the feature blocks of each channel to obtain pooled feature vectors corresponding to each channel;
step S134, determining a target feature vector based on each pooled feature vector, and taking the target feature vector as the global feature.
In this embodiment, after the local features of the image blocks to be detected are obtained, the local features are aggregated, based on the image blocks, into a feature image, where each local feature comprises features of multiple channels. Specifically, the local feature of each image block b(x, y) is a feature f(x, y) of size 5 × 5 × 16 (a 5 × 5 matrix with 16 channels). All f(x, y) are then arranged according to the same spatial index as b(x, y) and aggregated into a single (5·R) × (5·C) × 16 feature image.
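The aggregation step above can be sketched as follows. This is a sketch only: the dict-of-arrays layout for the per-block features f(x, y) is an assumed representation.

```python
import numpy as np

def aggregate_features(feats: dict, R: int, C: int) -> np.ndarray:
    """Tile per-block features f(x, y) of shape (5, 5, 16) into one
    (5*R, 5*C, 16) feature image, preserving the spatial index (x, y)
    of the blocks b(x, y), as in step S131. The dict keyed by (x, y)
    is an assumed input format for illustration."""
    out = np.zeros((5 * R, 5 * C, 16), dtype=np.float32)
    for (x, y), f in feats.items():
        out[5 * x:5 * (x + 1), 5 * y:5 * (y + 1), :] = f
    return out

# Example with random stand-in features for R = 2, C = 3 blocks.
rng = np.random.default_rng(0)
feats = {(x, y): rng.standard_normal((5, 5, 16)).astype(np.float32)
         for x in range(2) for y in range(3)}
feature_image = aggregate_features(feats, R=2, C=3)
```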
Then, the feature image is partitioned according to a second preset blocking rule to obtain a plurality of feature blocks. Specifically, the feature image is partitioned into h × w feature blocks, each of size ⌊5R/h⌋ × ⌊5C/w⌋, where ⌊·⌋ denotes the rounding-down (floor) function.
The feature blocks of each channel are then block-pooled separately to obtain a pooled feature vector for each channel. Specifically, the feature blocks corresponding to each channel of the feature image are pooled according to the formula:

Z_k = [pool(g_k(1,1)), …, pool(g_k(h,w))]

where k ∈ [1,16], g_k(i,j) denotes feature block (i,j) of channel k, and pool(·) denotes the pooling function, i.e. the maximum or the mean.
Finally, a target feature vector is determined from the pooled feature vectors and taken as the global feature. Specifically, the 16 pooled feature vectors are concatenated as

Z = [Z_1, Z_2, …, Z_16]

where Z is the target feature vector, i.e. the global feature.
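Putting the h × w blocking, per-channel pooling, and concatenation together, a numpy sketch of steps S132 through S134 might look like this. The floor-based block size and the dropping of leftover rows and columns are assumptions made for the sketch.

```python
import numpy as np

def block_pool(feature_image, h=5, w=5, mode="max"):
    """h x w block pooling per channel: each channel k of the (5R, 5C, 16)
    feature image is cut into h x w blocks g_k(i, j), each block is pooled
    to one scalar, and the 16 per-channel vectors Z_k are concatenated into
    the global feature Z. With h = w = 5 and 16 channels this yields the
    400-dimensional feature mentioned in the embodiment. Rows/columns
    beyond the floor-based block grid are dropped (an assumption)."""
    H, W, K = feature_image.shape
    bh, bw = H // h, W // w              # floor-based block size
    reduce = np.max if mode == "max" else np.mean
    zs = []
    for k in range(K):
        zk = [reduce(feature_image[i * bh:(i + 1) * bh,
                                   j * bw:(j + 1) * bw, k])
              for i in range(h) for j in range(w)]
        zs.append(zk)                    # Z_k = [pool(g_k(1,1)), ..., pool(g_k(h,w))]
    return np.concatenate(zs)            # Z = [Z_1, ..., Z_16]

Z = block_pool(np.ones((10, 15, 16), dtype=np.float32))
```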
In the tampered image detection method provided by this embodiment, the local features are aggregated into a feature image based on the image blocks to be detected, where each local feature comprises features of multiple channels; the feature image is then partitioned into a plurality of feature blocks according to a second preset blocking rule; the feature blocks of each channel are pooled separately to obtain one pooled feature vector per channel; and a target feature vector determined from the pooled feature vectors is taken as the global feature. The global feature is thus obtained accurately from the local features, further improving the efficiency and accuracy of tampered image detection. Meanwhile, a fixed h × w partition is applied to feature images of different sizes, so the dimensionality of the final output feature vector stays constant and the local geometric information of the feature image is better exploited. The block-pooling strategy, namely partitioning each channel of the network's output feature image, spatially pooling each block (taking the maximum or the mean), and then fusing the per-channel results into one feature vector, ensures that local noise introduced by JPEG compression affects only the pooling results of the local feature blocks, i.e. certain dimensions of the output feature, which improves the robustness of detection on JPEG-compressed images.
Based on the first embodiment, a third embodiment of the tampered image detecting method of the present invention is proposed, and in this embodiment, before step S120, the tampered image detecting method further includes:
step S150, a training data set is obtained, wherein the training data set comprises spliced image blocks and real image blocks;
step S160, inputting the training data set into two sub-networks in a convolutional neural network to obtain a first loss function and a second loss function corresponding to the two sub-networks and a contrast loss function between the two sub-networks;
step S170, determining a pre-trained local descriptor based on the first loss function, the second loss function and the contrast loss function.
In this embodiment, the training data set includes spliced image blocks and real image blocks, that is, it is composed of two types of image blocks: spliced image blocks (positive samples) and real image blocks (negative samples). After the training data set is obtained, it is input into the two sub-networks of a convolutional neural network to obtain a first loss function and a second loss function corresponding to the two sub-networks, as well as a contrast loss function between the two sub-networks.
It should be noted that, in order to better verify the performance of the present invention, five-fold cross validation is adopted: one fifth of the data set is taken as the test set each time, the rest as the training set, and after five repetitions the average of the test results is taken as the final result. In the training set, 128 × 128 tampered image blocks are randomly extracted from the edges of the spliced regions of the tampered images as positive samples, and an equal number of real image blocks are extracted from arbitrary positions of the real images as negative samples. The two types of samples form the training and validation sets of the convolutional neural network; during training, data augmentation, namely flipping and rotation, is applied to the extracted image blocks to suppress overfitting. The batch size of one training iteration is 128, λ is set to 0.01 in formula (7), and m is set to 10 in formula (6). After the image features are extracted with the trained network, feature fusion with a 5 × 5 block-pooling strategy produces a final 400-dimensional feature. Finally, the obtained features are used to train a support vector machine with a radial basis function (RBF) kernel, whose optimal (C, g) parameters are determined by grid search.
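The evaluation protocol above (one fifth held out per round, results averaged over five rounds) can be sketched as follows; the random seed and the interleaved fold assignment are illustrative choices, not part of the patent:

```python
import random

def five_fold_splits(samples, seed=0):
    """Yield (train, test) splits for five-fold cross validation:
    each round holds out one fifth of the data as the test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for i in range(5):
        held_out = set(folds[i])
        train = [samples[j] for j in idx if j not in held_out]
        test = [samples[j] for j in folds[i]]
        yield train, test
```

Each sample appears in exactly one test fold, so averaging the five test results covers the whole data set once.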
Referring to fig. 3, fig. 3 is a schematic structural diagram of the convolutional neural network of the present invention. The convolutional neural network is a symmetric dual-sub-network structure comprising two sub-networks; each sub-network consists of 8 convolutional layers, 2 max-pooling layers, a fully connected layer and a classifier, connected in sequence with two convolutional layers followed by one max-pooling layer, and ending with the fully connected layer and the classifier. Each convolutional layer is followed by batch normalization (BN) and a linear rectification (ReLU) activation function, which introduces non-linearity while preventing gradient explosion and vanishing. To prevent overfitting, the dropout technique is applied at the fully connected layer, i.e., neurons are randomly deactivated with 50% probability. To increase feature diversity, the three channels of the convolution kernels in the first convolutional layer are initialized with high-pass filters, namely a specific combination of the 30 high-pass filters of the spatial rich model (SRM). Moreover, the filters of one kernel should come from the same group of filters, so that the residual features of the three channels after convolution have similar statistical characteristics, which facilitates the network's modeling of high-order statistics.
Referring to FIG. 5, the convolution kernels W_j (j = 1…28) are initialized using the first five groups of filters (F_j ∈ C_i, i = 1…5); specifically,

W_j = [F_j  F_j  F_j],  j = 1, …, 28;

since the sixth and seventh groups of filters have the same square symmetry, they are used to initialize the convolution kernels W_29 and W_30, specifically:

W_29 = [F_29  F_30  F_29];
W_30 = [F_30  F_29  F_30];
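A sketch of this first-layer initialization; the per-channel repetition used for W_1…W_28 is an illustrative assumption (the text only requires a kernel's three channels to come from the same filter group), while W_29 and W_30 follow the interleaving given above:

```python
def init_first_layer(filters):
    """Assemble 30 three-channel kernels from 30 SRM high-pass filters:
    W_1..W_28 repeat one filter over the three channels (illustrative),
    while W_29 and W_30 interleave the square-symmetric pair F_29, F_30."""
    assert len(filters) == 30
    kernels = [[f, f, f] for f in filters[:28]]   # W_j = [F_j F_j F_j]
    f29, f30 = filters[28], filters[29]
    kernels.append([f29, f30, f29])               # W_29 = [F29 F30 F29]
    kernels.append([f30, f29, f30])               # W_30 = [F30 F29 F30]
    return kernels
```

Here the filter objects can be any array type; integers stand in for the 5 × 5 SRM filter matrices in the test below.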
Since the optimization strategies commonly used in neural networks (such as gradient descent) disturb the high-pass filtering characteristic of the filters in the first convolutional layer when the convolution kernel weights are updated, in order to maintain the high-pass filtering characteristic of the first convolutional layer during network training and facilitate the extraction of image detail information, the weights of the first convolutional layer are constrained as follows:

E(w_jk^(n)) = 0

where w_jk^(n) denotes the weight of the k-th (k = 1, 2, 3) channel of convolution kernel W_j at the n-th iteration. Since every high-pass filter of the spatial rich model satisfies the above formula in the initial state (n = 0), the weights of the first convolutional layer can be guaranteed to satisfy it at every iteration by applying the following update after each weight adjustment:

w_jk^(n) ← w_jk^(n) − E(w_jk^(n))

where E(w_jk^(n)) is the mean of w_jk^(n), and E(·) is the expectation function.
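A sketch of this constrained learning strategy: after every ordinary weight update, each channel of a first-layer kernel is mean-subtracted so that the zero-mean (high-pass) condition keeps holding:

```python
def enforce_high_pass(channel):
    """Re-center one kernel channel after a gradient step so that its
    weights sum to zero, preserving the high-pass property E(w) = 0."""
    n = len(channel) * len(channel[0])
    mean = sum(sum(row) for row in channel) / n
    return [[w - mean for w in row] for row in channel]
```

A channel that is already zero-mean (as the SRM filters are at initialization) passes through unchanged, so the constraint never undoes the initialization.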
And finally, after the first loss function, the second loss function and the contrast loss function are obtained, determining a pre-trained local descriptor according to the first loss function, the second loss function, the contrast loss function and the convolutional neural network.
Further, in an embodiment, step S160 includes:
step S161, determining the feature vectors of the training samples corresponding to the training data set, and determining Euclidean distances corresponding to the training samples based on the feature vectors, the spliced image blocks and the real image blocks;
step S162, determining the contrast loss function based on the euclidean distance.
In this embodiment, the feature vector of each training sample corresponding to the training data set, that is, the output of the last convolutional layer, is determined first; then, based on the feature vectors of the spliced image blocks and real image blocks, the Euclidean distance corresponding to each training sample pair is determined, and the contrast loss function is determined based on the Euclidean distances. Specifically,

J_c = (1 / 2N) · Σ_{n=1..N} [ 1(y_n = 1) · d_n² + 1(y_n = 0) · max(0, m − d_n)² ];

d_n = ‖ f_n^(1) − f_n^(2) ‖_2;

where J_c is the contrast loss function, N is the training batch size, d_n is the Euclidean distance between the two feature vectors f_n^(1) and f_n^(2) extracted from the two samples of the n-th input pair (with pair label y_n), and m is a constant that controls d_n. The indicator function 1(x = i) equals 1 if x = i, and 0 otherwise. In the training process, the contrast loss function tends to minimize the Euclidean distance between the features of two similar samples and to make the Euclidean distance between the features of two heterogeneous samples larger than the preset constant m.
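A minimal plain-Python sketch of this contrast loss, taking precomputed pair distances as input (m = 10 as in the training setup; the 1/(2N) batch-averaging factor is the common convention assumed here):

```python
def contrastive_loss(pairs, m=10.0):
    """Contrast loss over a batch of (d, y) pairs, where d is the Euclidean
    distance between the pair's two feature vectors and y = 1 marks a
    similar pair. Similar pairs are penalized by d^2; dissimilar pairs by
    max(0, m - d)^2, pushing their distance beyond the margin m."""
    total = 0.0
    for d, y in pairs:
        total += y * d * d + (1 - y) * max(0.0, m - d) ** 2
    return total / (2.0 * len(pairs))
```

A similar pair contributes nothing only at distance zero, and a dissimilar pair contributes nothing once its distance exceeds m, which is exactly the pull/push behavior described above.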
In the tampered image detection method provided by this embodiment, a training data set is obtained, wherein the training data set comprises spliced image blocks and real image blocks; the training data set is then input into the two sub-networks of a convolutional neural network to obtain a first loss function and a second loss function corresponding to the two sub-networks and a contrast loss function between the two sub-networks; a pre-trained local descriptor is then determined based on the first loss function, the second loss function and the contrast loss function. The pre-trained local descriptor is thus determined accurately from the training data set, and adding the contrast loss function improves its feature expression capability, so that the features it extracts from image blocks of the same type are similar, while the features extracted from image blocks of different types are clearly distinguishable.
The convolutional neural network is a symmetric dual-sub-network structure: not only is the cross-entropy loss function of each sub-network used for two-class training, but the contrast loss function between the sub-networks is also used to improve the feature expression capability. The first convolutional layer of the convolutional neural network is initialized with a specific combination of the 30 high-pass filters of the spatial rich model (SRM), and its weights are updated through a constrained learning strategy, so that the influence of image content is suppressed and extraction concentrates on residual features.
A fourth embodiment of the tampered image detection method of the present invention is proposed based on the third embodiment, and in this embodiment, the step S170 includes:
step S171 of determining a target loss function based on the first loss function, the second loss function, and the contrast loss function;
step S172, when the target loss function meets a preset condition, updating the convolutional neural network based on the target loss function, and determining a pre-trained local descriptor based on the updated convolutional neural network;
and step S173, when the target loss function does not meet the preset condition, updating the convolutional neural network based on the target loss function, taking the updated convolutional neural network as the convolutional neural network, and returning to execute the step of inputting the training data set into two sub-networks in the convolutional neural network.
In this embodiment, a target loss function is determined according to the first loss function, the second loss function, and the contrast loss function, and a calculation formula of the target loss function is as follows:
J = 0.5 · J_s1 + 0.5 · J_s2 + λ · J_c

where J_s1 is the first loss function, J_s2 is the second loss function, J_c is the contrast loss function, J is the target loss function, and λ is the weighting coefficient of the contrast loss function.
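The combination is straightforward to state in code (λ = 0.01 as in the training setup described earlier):

```python
def target_loss(j_s1, j_s2, j_c, lam=0.01):
    """Combine the two sub-network cross-entropy losses (equal 0.5 weights)
    and the contrast loss, weighted by the coefficient lam."""
    return 0.5 * j_s1 + 0.5 * j_s2 + lam * j_c
```
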
Then, whether the target loss function meets a preset condition, for example whether it has converged, is judged. When the target loss function meets the preset condition, the convolutional neural network is updated based on the target loss function, and the pre-trained local descriptor is determined based on the updated convolutional neural network; when the target loss function does not meet the preset condition, the convolutional neural network is updated based on the target loss function, the updated convolutional neural network is taken as the convolutional neural network, and execution returns to step S120.
The target gradient can be determined according to the target loss function, and the convolutional neural network can be updated according to the target gradient.
In the method for detecting a tampered image, a target loss function is determined based on the first loss function, the second loss function and the contrast loss function; then when the target loss function meets a preset condition, updating the convolutional neural network based on the target loss function, and determining a pre-trained local descriptor based on the updated convolutional neural network; and then when the target loss function does not meet the preset condition, updating the convolutional neural network based on the target loss function, taking the updated convolutional neural network as the convolutional neural network, and returning to the step of executing two sub-networks for inputting the training data set into the convolutional neural network, so that a pre-trained local descriptor meeting the preset condition can be obtained, and the efficiency and accuracy of image tampering detection are further improved.
On the basis of the above-described respective embodiments, a fifth embodiment of the tampered image detection method of the present invention is proposed, in which after step S140, the tampered image detection method further includes:
step S180, if the image to be detected is a tampered image, acquiring a target sub-network in the pre-trained convolutional neural network based on a pre-trained local descriptor;
step S190, inputting each image block to be detected into the target subnetwork respectively to obtain the tampering probability corresponding to each image block to be detected;
s200, determining a prediction label graph corresponding to the image to be detected based on the tampering probability, and performing mean filtering on the prediction label graph to obtain a continuous splicing probability graph;
and S210, carrying out splicing image positioning on the image to be detected based on the continuous splicing probability map.
In this embodiment, when an image to be detected is a tampered image, a pre-trained convolutional neural network corresponding to a pre-trained local descriptor is obtained first, and a target sub-network is determined according to the pre-trained convolutional neural network, specifically, the target sub-network is any sub-network in the pre-trained convolutional neural network, and the softmax classifier in the target sub-network outputs a probability value.
And then, respectively inputting each image block to be detected into the target sub-network to obtain the corresponding tampering probability of each image block to be detected, namely outputting the tampering probability corresponding to each image block to be detected by the softmax classifier in the target sub-network.
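The softmax output described above can be sketched as follows (two-class case; which entry is read as the "tampered" probability depends on the label convention, an assumption here):

```python
import math

def softmax(logits):
    """Numerically stable softmax over the sub-network's two class logits;
    one entry is read as the block's tampering probability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```
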
Then, a prediction label map corresponding to the image to be detected is determined based on the tampering probabilities, where the prediction label value L(x, y) at pixel coordinate (x, y) is calculated as:

L(x, y) = [ (1/K) · Σ_{k=1..K} l_k > τ ]

where [P] is the Iverson bracket ([P] = 1 if condition P is true, otherwise [P] = 0), l_k is the tampering probability of the k-th image block containing pixel (x, y), K is the number of such image blocks, and the optimal value of the threshold τ is derived from the training data.
After obtaining a prediction label graph, carrying out mean value filtering on the prediction label graph to obtain a continuous splicing probability graph; for example, a mean filtering with a window size of 64 × 64 is applied to the prediction label map, resulting in a continuous stitching probability map with pixel values in the [0,1] interval.
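These two steps, a per-pixel thresholded vote followed by mean filtering of the label map, can be sketched as below; reading L(x, y) as an average-then-threshold vote is one consistent interpretation of the formula, and a small 3 × 3 window replaces the 64 × 64 one for brevity:

```python
def pixel_label(block_probs, tau=0.5):
    """Iverson-bracket label: 1 when the mean tampering probability of the
    K blocks covering the pixel exceeds the threshold tau."""
    return 1 if sum(block_probs) / len(block_probs) > tau else 0

def mean_filter(label_map, k=3):
    """Box filter turning the binary label map into a continuous [0, 1]
    probability map (the text above uses a 64 x 64 window)."""
    h, w = len(label_map), len(label_map[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [label_map[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out
```
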
And then, carrying out splicing image positioning on the images to be detected based on the continuous splicing probability map.
In the method for detecting a tampered image, if the image to be detected is a tampered image, a target sub-network in a pre-trained convolutional neural network is obtained based on a pre-trained local descriptor; then, inputting each image block to be detected into the target sub-network respectively to obtain the corresponding tampering probability of each image block to be detected; then, determining a prediction label graph corresponding to the image to be detected based on the tampering probability, and carrying out mean value filtering on the prediction label graph to obtain a continuous splicing probability graph; and then carrying out splicing image positioning on the image to be detected based on the continuous splicing probability map, and further carrying out splicing image positioning on the image to be detected through a convolutional neural network so as to position a tampered area in the image to be detected.
A sixth embodiment of the tampered image detection method of the present invention is proposed based on the fifth embodiment, and in this embodiment, step S210 includes:
step S211, deducing a continuous splicing probability graph based on a conditional random field model to obtain a tampering splicing probability corresponding to each pixel point in the continuous splicing probability graph;
and step S212, determining target tampered pixel points in all the pixel points based on the tampered splicing probability.
In this embodiment, the continuous splicing probability map is inferred by the conditional random field model, so that the continuous splicing probability map is subjected to pixel-level classification by the conditional random field model to obtain label allocation of each pixel, and a tampering splicing probability corresponding to each pixel point in the continuous splicing probability map is determined according to the label allocation.
Specifically, the energy function is minimized based on a mean-field approximation method, and the label assignment of each pixel is obtained from the energy function, whose formula is:

E(l) = Σ_i ψ_u(l_i) + Σ_{i<j} ψ_p(l_i, l_j)

where l_i is the label assignment of pixel i, ψ_u(l_i) is the unary potential data term based on the category score, and ψ_p(l_i, l_j) is the binary potential smoothing term based on the connection relations of local pixels. The unary term is

ψ_u(l_i) = −log s_i(l_i)

where s_i is the two-dimensional class-score vector of pixel i, obtained from the continuous splicing probability map. The binary potential ψ_p(l_i, l_j) is formed by a label compatibility function μ(l_i, l_j) and a Gaussian kernel function k(f_i, f_j):

ψ_p(l_i, l_j) = μ(l_i, l_j) · k(f_i, f_j)

μ(l_i, l_j) introduces a penalty when neighboring pixels are assigned different labels. k(f_i, f_j) depends on the features f (pixel value and position) of pixels i and j, and is a weighted sum of a bilateral filter term (weight w_1) and a spatial term (weight w_2), of the form:

k(f_i, f_j) = w_1 · exp( −|p_i − p_j|² / (2θ_α²) − |I_i − I_j|² / (2θ_β²) ) + w_2 · exp( −|p_i − p_j|² / (2θ_γ²) )

where I and p represent pixel values and pixel positions, respectively. The hyperparameters θ_α, θ_β and θ_γ control the required spatial proximity of a pair of pixels and the degree of appearance similarity.
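The pairwise Gaussian kernel can be sketched for a pair of grayscale pixels as follows; the weights w_1, w_2 and the θ values here are illustrative placeholders, not the tuned hyperparameters:

```python
import math

def pairwise_kernel(ii, ij, pi, pj, w1=1.0, w2=1.0,
                    theta_a=10.0, theta_b=5.0, theta_g=3.0):
    """Gaussian kernel k(f_i, f_j) of the fully connected CRF: a bilateral
    (appearance) term weighted by w1 plus a spatial smoothness term
    weighted by w2. ii/ij are pixel values, pi/pj are (x, y) positions."""
    dp2 = (pi[0] - pj[0]) ** 2 + (pi[1] - pj[1]) ** 2
    di2 = (ii - ij) ** 2
    bilateral = math.exp(-dp2 / (2 * theta_a ** 2) - di2 / (2 * theta_b ** 2))
    spatial = math.exp(-dp2 / (2 * theta_g ** 2))
    return w1 * bilateral + w2 * spatial
```

The kernel is largest for nearby, similar-looking pixels, which is what lets the CRF smooth the coarse network output while respecting image edges.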
According to the tampered image detection method provided by the embodiment, a continuous splicing probability map is deduced based on a conditional random field model, so that the tampered splicing probability corresponding to each pixel point in the continuous splicing probability map is obtained; and then determining target tampering pixel points in all the pixel points based on the tampering and splicing probability, refining the positioning result by adopting a fully-connected conditional random field model, and secondarily refining the rough positioning result generated by the convolutional network by utilizing the relevance among the pixels, thereby improving the accuracy and precision of image tampering positioning.
The present application adopts several public data sets for training and testing, including three common data sets: CASIA v2.0, Columbia gray DVMM and DSO-1. Since the images in the DVMM data set are 128 × 128, the convolutional neural network provided by the invention can be used directly for end-to-end detection, without feature extraction and SVM classification. For the CASIA v2.0 and DSO-1 data sets, whose images are larger than 128 × 128, the image splicing detection scheme provided by the invention is adopted. To verify the effectiveness of the key techniques in the invention, the splicing detection performance with different convolutional networks as local descriptors is compared first; the networks are described as follows:
SRM-CNN: the first convolutional layer is initialized directly with a spatial enrichment model (SRM), and the loss function is a cross-entropy loss function.
ISRM-CNN: the first convolutional layer is initialized by the initialization strategy provided by the application, and the loss function is a cross entropy loss function.
C _ ISRM-CNN: on the basis of ISRM-CNN, the learning strategy provided by the invention is added to the first convolution layer, and the loss function is a cross entropy loss function.
C _ ISRM _ C-CNN: the contrast loss function in the application is added on the basis of C _ ISRM-CNN.
FIG. 6 shows the splicing detection accuracy (the ratio of correctly classified samples to the total number of samples, in %) of the four convolutional networks on the CASIA v2.0 and DSO-1 data sets when the pooling function pool(·) in feature fusion is the maximum function. As can be seen from fig. 6, each key technique in the convolutional network proposed in the present application brings a performance gain.
In order to evaluate the robustness of the splicing detection scheme under the JPEG compression attack, the images in the data set are subjected to JPEG compression, and the Quality Factors (QF) are 95, 85, 75 and 65. To avoid the second JPEG compression, only TIFF format images were selected for the CASIA v2.0 dataset to be compressed. To fully evaluate performance, tests were performed based on the following two settings:
setting # 1: the training set is a non-compressed image and the testing set is a compressed image.
Setting # 2: both the training set and the test set are compressed images.
Under these two settings, the detection performance of the present invention is shown in fig. 7, where Max and Mean indicate that the pooling function pool(·) of the feature fusion step is the maximum function and the mean function, respectively.
The image stitching localization performance is measured by the F1 score, which lies between 0 and 1; the larger the score, the better the localization performance. It is calculated as:

F1 = 2TP / (2TP + FN + FP)

where TP, FN and FP represent the numbers of true positives, false negatives and false positives, respectively. The splice localization performance using different network structures based on setting #1 on DSO-1 test sets of different JPEG quality factors is shown in fig. 8. It can be seen that the convolutional neural network (C_ISRM_C-CNN) proposed by the present invention has advantages in both localization performance and robustness.
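The F1 score above reduces to a one-line computation:

```python
def f1_score(tp, fn, fp):
    """F1 = 2TP / (2TP + FN + FP): the harmonic mean of precision and
    recall, ranging over [0, 1]; larger is better."""
    return 2 * tp / (2 * tp + fn + fp)
```
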
In addition, the stitching positioning results of some test images are shown in fig. 4. The black pixels represent splicing tampered areas, and the white pixels represent non-tampered areas, so that the splicing positioning scheme can effectively position the image splicing tampered areas and generate a prediction result close to a real label.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a tampered image detection program is stored on the computer-readable storage medium, and when executed by a processor, the tampered image detection program implements the following operations:
acquiring an image to be detected, and performing image blocking on the image to be detected based on a first preset blocking rule to obtain a plurality of image blocks to be detected;
extracting the features of each image block to be detected through a pre-trained local descriptor to obtain the local features corresponding to each image block to be detected;
performing feature fusion on each local feature to obtain a global feature corresponding to the image to be detected;
and inputting the global features into a support vector machine to determine whether the image to be detected is a tampered image.
Further, the tamper image detection program when executed by the processor further performs the following operations:
and respectively inputting each image block to be detected into a pre-trained local descriptor, and taking an output result of the last convolutional layer in the pre-trained local descriptor as a local feature corresponding to each image block to be detected.
Further, the tamper image detection program when executed by the processor further performs the following operations:
performing aggregation operation on each local feature based on each image block to be detected to obtain a feature image, wherein the local feature comprises features of a plurality of channels;
performing image segmentation on the feature image based on a second preset segmentation rule to obtain a plurality of feature blocks;
respectively carrying out block pooling on the feature blocks of each channel to obtain pooled feature vectors corresponding to each channel;
and determining a target feature vector based on each pooled feature vector, and taking the target feature vector as the global feature.
Further, the tamper image detection program when executed by the processor further performs the following operations:
acquiring a training data set, wherein the training data set comprises spliced image blocks and real image blocks;
inputting the training data set into two sub-networks in a convolutional neural network to obtain a first loss function, a second loss function and a comparison loss function between the two sub-networks, wherein the first loss function and the second loss function correspond to the two sub-networks;
determining a pre-trained local descriptor based on the first loss function, the second loss function, and the contrast loss function.
Further, the tamper image detection program when executed by the processor further performs the following operations:
determining a target loss function based on the first loss function, the second loss function, and the contrast loss function;
when the target loss function meets a preset condition, updating the convolutional neural network based on the target loss function, and determining a pre-trained local descriptor based on the updated convolutional neural network;
and when the target loss function does not meet the preset condition, updating the convolutional neural network based on the target loss function, taking the updated convolutional neural network as the convolutional neural network, and returning to execute the step of inputting the training data set into two sub-networks in the convolutional neural network.
Further, the tamper image detection program when executed by the processor further performs the following operations:
determining the feature vectors of the training samples corresponding to the training data set, and determining Euclidean distances corresponding to the training samples based on the feature vectors, the spliced image blocks and the real image blocks;
determining the contrast loss function based on the Euclidean distance.
Further, the tamper image detection program when executed by the processor further performs the following operations:
if the image to be detected is a tampered image, acquiring a target sub-network in the pre-trained convolutional neural network based on a pre-trained local descriptor;
inputting each image block to be detected into the target subnetwork respectively to obtain the corresponding tampering probability of each image block to be detected;
determining a prediction label graph corresponding to the image to be detected based on the tampering probability, and performing mean value filtering on the prediction label graph to obtain a continuous splicing probability graph;
and carrying out splicing image positioning on the images to be detected based on the continuous splicing probability map.
Further, the tamper image detection program when executed by the processor further performs the following operations:
deducing a continuous splicing probability graph based on a conditional random field model to obtain the corresponding tampering splicing probability of each pixel point in the continuous splicing probability graph;
and determining target tampered pixel points in all the pixel points based on the tampered splicing probability.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for detecting a tampered image, the method comprising the steps of:
acquiring an image to be detected, and performing image blocking on the image to be detected based on a first preset blocking rule to obtain a plurality of image blocks to be detected;
extracting the features of each image block to be detected through a pre-trained local descriptor to obtain the local features corresponding to each image block to be detected;
performing feature fusion on each local feature to obtain a global feature corresponding to the image to be detected;
and inputting the global features into a support vector machine to determine whether the image to be detected is a tampered image.
2. The tamper image detection method according to claim 1, wherein the step of performing feature extraction on each image block to be detected through a pre-trained local descriptor to obtain a local feature corresponding to each image block to be detected includes:
and respectively inputting each image block to be detected into a pre-trained local descriptor, and taking an output result of the last convolutional layer in the pre-trained local descriptor as a local feature corresponding to each image block to be detected.
3. The tamper image detection method according to claim 1, wherein the step of performing feature fusion on each local feature to obtain a global feature corresponding to the image to be detected includes:
performing aggregation operation on each local feature based on each image block to be detected to obtain a feature image, wherein the local feature comprises features of a plurality of channels;
performing image blocking on the feature image based on a second preset blocking rule to obtain a plurality of feature blocks;
respectively carrying out block pooling on the feature blocks of each channel to obtain pooled feature vectors corresponding to each channel;
and determining a target feature vector based on each pooled feature vector, and taking the target feature vector as the global feature.
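The block-pooling step of claim 3 can be sketched as follows: the aggregated C x H x W feature image is cut into spatial cells, each cell is pooled per channel, and the pooled vectors are concatenated into the target feature vector. Mean pooling and the 2x2 cell size are assumptions; the claim does not fix either.

```python
import numpy as np

def block_pool(feature_image, block=2):
    """Split a C x H x W feature image into block x block spatial cells
    and mean-pool each cell per channel; returns one pooled vector
    per channel."""
    _, h, w = feature_image.shape
    pooled = []
    for ch in feature_image:                              # one channel at a time
        cells = [ch[r:r + block, c:c + block].mean()      # pool each cell
                 for r in range(0, h, block)
                 for c in range(0, w, block)]
        pooled.append(np.array(cells))
    return pooled

fmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
vecs = block_pool(fmap)
target = np.concatenate(vecs)   # target feature vector = the global feature
```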
4. The tampered image detection method according to claim 1, wherein before the step of performing feature extraction on each image block to be detected through a pre-trained local descriptor, the tampered image detection method further comprises:
acquiring a training data set, wherein the training data set comprises spliced image blocks and real image blocks;
inputting the training data set into two sub-networks in a convolutional neural network to obtain a first loss function, a second loss function and a contrast loss function between the two sub-networks, wherein the first loss function and the second loss function correspond to the two sub-networks respectively;
determining a pre-trained local descriptor based on the first loss function, the second loss function, and the contrast loss function.
5. The tampered image detection method according to claim 4, wherein the step of determining a pre-trained local descriptor based on the first loss function, the second loss function, and the contrast loss function comprises:
determining a target loss function based on the first loss function, the second loss function, and the contrast loss function;
when the target loss function meets a preset condition, updating the convolutional neural network based on the target loss function, and determining a pre-trained local descriptor based on the updated convolutional neural network;
and when the target loss function does not meet the preset condition, updating the convolutional neural network based on the target loss function, taking the updated convolutional neural network as the current convolutional neural network, and returning to the step of inputting the training data set into the two sub-networks of the convolutional neural network.
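The stop/continue logic of claim 5 amounts to iterating network updates until the target loss satisfies the preset condition. The sketch below replaces the real gradient update with a dummy `step_fn`, and the loss threshold and iteration cap are assumed values.

```python
def train_local_descriptor(step_fn, initial_loss, threshold=0.1, max_iters=1000):
    """Iterate the claim-5 loop: keep updating the network (here a dummy
    step_fn that returns the new target loss) until the target loss
    meets the preset condition (falls below `threshold`)."""
    loss = initial_loss
    for _ in range(max_iters):
        if loss < threshold:      # condition met: stop; the current network
            return loss           # yields the pre-trained local descriptor
        loss = step_fn(loss)      # condition not met: update and loop again
    return loss

# Dummy update that halves the loss each step, standing in for one
# epoch of gradient descent on the target loss.
final = train_local_descriptor(lambda l: 0.5 * l, initial_loss=8.0)
print(final)  # 0.0625
```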
6. The tampered image detection method according to claim 4, wherein the step of inputting the training data set into two sub-networks in a convolutional neural network to obtain a contrast loss function between the two sub-networks comprises:
determining the feature vectors of the training samples corresponding to the training data set, and determining Euclidean distances corresponding to the training samples based on the feature vectors, the spliced image blocks and the real image blocks;
determining the contrast loss function based on the Euclidean distance.
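A Euclidean-distance contrastive loss of the kind claim 6 describes is commonly written as d² for matching pairs and max(margin − d, 0)² for mismatched pairs. The margin value below is an assumption; the patent does not state one.

```python
import numpy as np

def contrastive_loss(f1, f2, same_label, margin=1.0):
    """Contrast loss from the Euclidean distance between the two
    sub-network feature vectors: pulls matching pairs together and pushes
    spliced/real pairs at least `margin` apart."""
    d = np.linalg.norm(np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float))
    if same_label:                        # both real or both spliced
        return d ** 2
    return max(margin - d, 0.0) ** 2      # mismatched pair

loss_same = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_label=True)   # d = 5
loss_diff = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_label=False)
```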
7. The tampered image detection method according to any one of claims 1 to 6, wherein after the step of inputting the global features into a support vector machine to determine whether the image to be detected is a tampered image, the tampered image detection method further comprises:
if the image to be detected is a tampered image, acquiring a target sub-network in the pre-trained convolutional neural network based on the pre-trained local descriptor;
respectively inputting each image block to be detected into the target sub-network to obtain a tampering probability corresponding to each image block to be detected;
determining a predicted label map corresponding to the image to be detected based on the tampering probabilities, and performing mean filtering on the predicted label map to obtain a continuous splicing probability map;
and performing spliced-image localization on the image to be detected based on the continuous splicing probability map.
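The mean-filtering step of claim 7 turns a hard 0/1 predicted label map into a continuous splicing probability map. A direct sketch (3x3 window with zero padding at the borders; both are assumptions, the claim fixes neither):

```python
import numpy as np

def mean_filter(label_map, k=3):
    """Average each pixel's k x k neighbourhood to smooth the binary
    label map into a continuous splicing probability map."""
    h, w = label_map.shape
    pad = k // 2
    padded = np.pad(label_map.astype(float), pad)
    out = np.zeros((h, w))
    for r in range(h):
        for c in range(w):
            out[r, c] = padded[r:r + k, c:c + k].mean()
    return out

labels = np.zeros((5, 5))
labels[2, 2] = 1.0               # a single block flagged as spliced
prob = mean_filter(labels)       # the 1 is spread over its neighbourhood
```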
8. The tampered image detection method according to claim 7, wherein the step of performing spliced-image localization on the image to be detected based on the continuous splicing probability map comprises:
performing inference on the continuous splicing probability map based on a conditional random field model to obtain a splicing tampering probability corresponding to each pixel in the continuous splicing probability map;
and determining target tampered pixels among the pixels based on the splicing tampering probabilities.
9. A tampered image detection apparatus, characterized by comprising: a memory, a processor, and a tampered image detection program stored in the memory and executable on the processor, wherein the tampered image detection program, when executed by the processor, implements the steps of the tampered image detection method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that a tampered image detection program is stored on the computer-readable storage medium, and the tampered image detection program, when executed by a processor, implements the steps of the tampered image detection method according to any one of claims 1 to 8.
CN202011114858.2A 2020-10-19 2020-10-19 Tampered image detection method and device and computer readable storage medium Pending CN111932544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011114858.2A CN111932544A (en) 2020-10-19 2020-10-19 Tampered image detection method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111932544A true CN111932544A (en) 2020-11-13

Family

ID=73334580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011114858.2A Pending CN111932544A (en) 2020-10-19 2020-10-19 Tampered image detection method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111932544A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899846A (en) * 2015-05-20 2015-09-09 上海交通大学 Digital image splicing passive detection method based on frequency domain local statistic model
US20180276813A1 (en) * 2017-03-23 2018-09-27 International Business Machines Corporation Weakly supervised probabilistic atlas generation through multi-atlas label fusion
CN107274915A (en) * 2017-07-31 2017-10-20 华中师范大学 A kind of DAB of feature based fusion distorts automatic testing method
CN107977964A (en) * 2017-12-01 2018-05-01 天津大学 Slit cropping evidence collecting method based on LBP and extension Markov feature
CN108230411A (en) * 2017-12-29 2018-06-29 成都工业学院 The detection method and device of a kind of tampered image
CN109063428A (en) * 2018-06-27 2018-12-21 武汉大学深圳研究院 A kind of altering detecting method and its system of digital cartoon
CN109086782A (en) * 2018-08-21 2018-12-25 广东工业大学 Feature Descriptor generation method, device, equipment and computer readable storage medium
CN111709883A (en) * 2019-03-01 2020-09-25 阿里巴巴集团控股有限公司 Image detection method, device and equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NI JIANGQUN ET AL: "Block-Based Convolutional Neural Network for Image Forgery Detection", 16th International Workshop on Digital Forensics and Watermarking (IWDW) *
YUAN RAO ET AL: "Deep Learning Local Descriptor for Image Splicing Detection and Localization", IEEE Access *
NI Jiangqun et al.: "Foreground extraction algorithm for low depth-of-field images based on multi-feature fusion", Acta Automatica Sinica *
CHEN Xueyan et al.: "A new multi-feature fusion video tampering detection method", Video Engineering *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508039A (en) * 2020-12-08 2021-03-16 中国银联股份有限公司 Image detection method and device
CN112508039B (en) * 2020-12-08 2024-04-02 中国银联股份有限公司 Image detection method and device
CN112784269A (en) * 2021-01-15 2021-05-11 鹏城实验室 Malicious software detection method and device and computer storage medium
CN112883983A (en) * 2021-02-09 2021-06-01 北京迈格威科技有限公司 Feature extraction method and device and electronic system
WO2022245504A1 (en) * 2021-05-19 2022-11-24 Microsoft Technology Licensing, Llc Synthetic media detection and management of trust notifications thereof
US11733823B2 (en) 2021-05-19 2023-08-22 Microsoft Technology Licensing, Llc Synthetic media detection and management of trust notifications thereof
CN113657408A (en) * 2021-08-13 2021-11-16 北京百度网讯科技有限公司 Method and device for determining image characteristics, electronic equipment and storage medium
WO2024104068A1 (en) * 2022-11-15 2024-05-23 腾讯科技(深圳)有限公司 Video detection method and apparatus, device, storage medium, and product
CN117237270A (en) * 2023-02-24 2023-12-15 靖江仁富机械制造有限公司 Forming control method and system for producing wear-resistant and corrosion-resistant pipeline
CN117237270B (en) * 2023-02-24 2024-03-19 靖江仁富机械制造有限公司 Forming control method and system for producing wear-resistant and corrosion-resistant pipeline
CN116740015A (en) * 2023-06-12 2023-09-12 北京长木谷医疗科技股份有限公司 Medical image intelligent detection method and device based on deep learning and electronic equipment

Similar Documents

Publication Publication Date Title
CN111932544A (en) Tampered image detection method and device and computer readable storage medium
CN112381775B (en) Image tampering detection method, terminal device and storage medium
KR101896357B1 (en) Method, device and program for detecting an object
US10503999B2 (en) System for detecting salient objects in images
CN105046254A (en) Character recognition method and apparatus
Kanwal et al. Digital image splicing detection technique using optimal threshold based local ternary pattern
CN110245714B (en) Image recognition method and device and electronic equipment
CN108960412B (en) Image recognition method, device and computer readable storage medium
CN109409288B (en) Image processing method, image processing device, electronic equipment and storage medium
Xu et al. Using convolutional neural networks incorporating hierarchical active learning for target-searching in large-scale remote sensing images
Thajeel et al. A Novel Approach for Detection of Copy Move Forgery using Completed Robust Local Binary Pattern.
CN111899251A (en) Copy-move type forged image detection method for distinguishing forged source and target area
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN116630630B (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
Singhal et al. Passive authentication image forgery detection using multilayer cnn
Mu et al. Finding autofocus region in low contrast surveillance images using CNN-based saliency algorithm
CN111178398A (en) Method, system, storage medium and device for detecting tampering of image information of identity card
CN113723215B (en) Training method of living body detection network, living body detection method and device
CN115424293A (en) Living body detection method, and training method and device of living body detection model
Chakraborty et al. Discovering tampered image in social media using ELA and deep learning
CN114155363A (en) Converter station vehicle identification method and device, computer equipment and storage medium
Hou et al. New framework for unsupervised universal steganalysis via SRISP-aided outlier detection
Gavilan Ruiz et al. Image categorization using color blobs in a mobile environment
Das et al. Image splicing detection using feature based machine learning methods and deep learning mechanisms
CN111680722B (en) Content identification method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201113