CN115578631A - Image tampering detection method based on multi-scale interaction and cross-feature contrast learning - Google Patents

Image tampering detection method based on multi-scale interaction and cross-feature contrast learning

Info

Publication number
CN115578631A
CN115578631A (application CN202211421544.6A)
Authority
CN
China
Prior art keywords
image
feature
features
scale
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211421544.6A
Other languages
Chinese (zh)
Other versions
CN115578631B (en)
Inventor
高赞
陈圣灏
李传森
张蕊
李华刚
郝敬全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Zhonglian Audio Visual Information Technology Co ltd
Qingdao Haier Smart Technology R&D Co Ltd
Taihua Wisdom Industry Group Co Ltd
Shandong Institute of Artificial Intelligence
Original Assignee
Shandong Zhonglian Audio Visual Information Technology Co ltd
Qingdao Haier Smart Technology R&D Co Ltd
Taihua Wisdom Industry Group Co Ltd
Shandong Institute of Artificial Intelligence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Zhonglian Audio Visual Information Technology Co ltd, Qingdao Haier Smart Technology R&D Co Ltd, Taihua Wisdom Industry Group Co Ltd, Shandong Institute of Artificial Intelligence filed Critical Shandong Zhonglian Audio Visual Information Technology Co ltd
Priority to CN202211421544.6A priority Critical patent/CN115578631B/en
Publication of CN115578631A publication Critical patent/CN115578631A/en
Application granted granted Critical
Publication of CN115578631B publication Critical patent/CN115578631B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/95Pattern authentication; Markers therefor; Forgery detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of image detection and provides an image tampering detection method based on multi-scale interaction and cross-feature contrast learning, capable of localization on both image forgery datasets and image inharmony datasets. The method comprises the following steps: constructing the input images and feeding the image to be localized into a backbone network to extract features; interacting the multi-scale features to obtain multi-stage features; setting pixels of the inharmonious region as positive examples and background pixels as negative examples, selecting positive and negative feature vectors at each feature size, randomly sampling negative examples according to the number of positive examples, applying a contrastive learning loss constraint to the sampled feature vectors, and at the same time applying a contrastive learning loss constraint across the multi-stage features with mixed positive and negative examples; fusing adjacent features pairwise, completing the fusion through shrink attention over the features; and jointly training with multiple loss functions.

Description

Image tampering detection method based on multi-scale interaction and cross-feature contrast learning
Technical Field
The invention relates to an image tampering detection method based on multi-scale interaction and cross-feature contrast learning, and belongs to the technical field of image detection.
Background
With the improvement of living standards, multimedia has penetrated many fields, and digital images have become an important carrier of media propagation. However, with the advent of more and more image editing tools, manipulating images has become easier. Compared with complex tampering methods, simple forgery techniques such as image splicing are far more numerous and the most widely used. Simple splicing often introduces an inconsistency in illumination statistics, caused by the capturing camera, between the forged region and the rest of the image; such regions are called inharmonious regions. With the development of the Internet, inharmonious images are produced in ever greater numbers and spread rapidly, making them the dominant case. In current detection methods for image forgery, because tampering types are diverse and tampering techniques keep iterating, it is difficult to effectively define common prior information shared by general tampering. The invention therefore simplifies the problem: it localizes only inharmonious images exhibiting obvious color-difference clues, and searches for the common information of inharmonious forged regions through color-forgery difference information, which is currently the most widespread and most easily produced cue. This in turn assists image tampering localization in achieving excellent performance; image inharmony localization can be regarded as one of the subtasks of image tampering localization. Existing methods use multi-scale information to mine the inharmonious region, but they are merely extensions of semantic segmentation and are not designed for the inharmony localization task. Other methods enlarge the difference between the image foreground and background according to the illumination inconsistency, but make no improvement to the feature-extraction and localization network itself. Moreover, the inharmonious region is still produced by a prior operation and is in essence a forged region, and certain common information exists among such regions; the invention aims to exploit this information for localization.
Disclosure of Invention
The invention aims to provide an image tampering detection method based on multi-scale interaction and cross-feature contrast learning, which can effectively localize tampered regions on both image forgery datasets and image inharmony datasets.
In order to achieve the purpose, the invention is realized by the following technical scheme:
an image tampering detection method based on multi-scale interaction and cross-feature contrast learning comprises the following steps:
s1, constructing an input image, inputting an image to be positioned into a backbone network, and extracting characteristics:
randomly adding image jitter to an original image at the same time, using the image jitter as the input of a backbone network, sharing images with different scales and network weights for feature extraction, putting the images into the backbone network, and extracting image features in four stages respectively, wherein each stage generates three large and small features under the condition of three inputs;
s2, multi-scale feature interaction to obtain multi-stage features:
performing multi-scale feature interaction on the three features of each stage: downsampling the large features and upsampling the small features to the same size, then adding them under a learned multi-scale weight constraint to obtain the final feature;
s3, cross-feature comparison learning:
setting pixels of an inharmonic region in the group Truth as a positive example, setting background pixels as a negative example, simultaneously down-sampling the group Truth to each feature size to select positive and negative feature vectors, randomly sampling negative samples according to the number of the positive samples, performing contrast learning loss constraint on the feature vectors obtained by sampling, and performing contrast learning loss constraint on the feature vectors obtained by sampling while mixing the positive and negative samples on multi-stage features;
S4, feature shrink fusion decoding:
fusing every two adjacent features and completing the fusion through shrink attention over the features;
S5, multi-loss function joint training:
finally, decoding to obtain the final predicted image, applying pixel-level loss supervision between the final predicted image and the Ground Truth, and jointly training and optimizing the network together with the contrastive learning loss.
On the basis of the image tampering detection method based on multi-scale interaction and cross-feature contrast learning, the image extraction feature construction is specifically as follows:
the method comprises the steps of adjusting the size of an image, randomly turning over the image, randomly rotating the image, adjusting the contrast ratio of the image, and then using the image as the input of a network, wherein the input sizes are respectively H multiplied by W, W is the width of a picture, H is the height of the picture, and the unit is a pixel.
On the basis of the image tampering detection method based on multi-scale interaction and cross-feature contrast learning, the multi-scale feature interaction specifically comprises the following steps:
and performing the same operation on the three characteristics of each stage, wherein the characteristics of the 1.5x image are respectively downsampled to the size of the characteristics of the input image in an average pooling mode and a maximum pooling mode, the characteristics of the 0.5x image are upsampled to the size of the characteristics of the input image in a bilinear interpolation mode, then the characteristics are spliced, two layers of rolling blocks are performed, a softmax function is performed to automatically learn the weight of each scale characteristic, and the weighted sum is performed to obtain the characteristics after the three scales are fused.
On the basis of the image tampering detection method based on multi-scale interaction and cross-feature contrast learning, the cross-feature contrast learning specifically comprises the following steps:
sampling the Ground Truth to the size of each feature by the nearest-neighbor method; finding the feature vectors of inharmonious forged pixels and of background pixels on the feature map by mapping the Ground Truth onto it, and then randomly sampling the two categories within each image of each batch; setting 5 as a threshold: when the number of feature vectors of a category is less than 5, that category of the image is discarded, and when it is greater than 5, 5 feature vectors are randomly sampled; merging all inharmonious-pixel feature vectors and all background-pixel feature vectors of the batch per category, finally obtaining four feature sets $A_1, A_2, A_3, A_4$ from the 4 stage features.

Cross-image contrast learning is implemented in each feature set, with the contrastive loss:

$$\mathcal{L}_{con} = -\frac{1}{|P|}\sum_{i \in P}\frac{1}{|P|-1}\sum_{j \in P,\ j \neq i}\log\frac{\exp(v_i \cdot v_j/\tau)}{\exp(v_i \cdot v_j/\tau)+\sum_{k \in N}\exp(v_i \cdot v_k/\tau)}$$

wherein $v_i$, $v_j$ and $v_k$ denote pixel-wise feature vectors of the feature map, $\tau$ is a fixed temperature coefficient, $P$ is the set of pixels belonging to the inharmonious region, and $N$ is the set of pixels belonging to the background region. Next, cross-scale contrast learning is performed on the pairs $A_1$/$A_4$ and $A_1$/$A_3$, respectively.
On the basis of the image tampering detection method based on multi-scale interaction and cross-feature contrast learning, the feature shrink fusion decoding is specifically as follows: among the four features F1, F2, F3 and F4, F1 and F2, F2 and F3, and F3 and F4 are shrink-fused pairwise, and the features are continuously fused until one feature remains, from which the final result is output through convolution and upsampling.
On the basis of the image tampering detection method based on multi-scale interaction and cross-feature contrast learning, the objective function of the multi-scale interaction and cross-feature contrast learning network is constructed as follows:
since the samples in inharmony localization are imbalanced, the inharmonious region being smaller than the harmonious region, the pixel supervision loss consists of a dice loss and a focal loss; the pixel segmentation loss is combined with the contrastive loss, so that the total loss function is:

$$\mathcal{L}_{dice}(P_i, G) = 1 - \frac{2\,|P_i \cap G|}{|P_i| + |G|}, \qquad \mathcal{L}_{focal}(P_i, G) = -(1 - p)^{\gamma}\log p$$

$$\mathcal{L}_{total} = \sum_{i}\big[\lambda_1\,\mathcal{L}_{dice}(P_i, G) + \lambda_2\,\mathcal{L}_{focal}(P_i, G)\big] + \mathcal{L}_{con}$$

wherein G represents the Ground Truth, $P_i$ represents the predicted image of each stage, $p$ represents the predicted probability of the true class of a pixel, $\lambda_1$ and $\lambda_2$ represent user-defined parameters, $\lambda_1$ is set to 1, $\lambda_2$ is set to 0.3, $\cap$ represents the intersection, $|\cdot|$ represents the number of pixels, $\gamma$ is the focusing parameter of the focal loss, and $\mathcal{L}_{con}$ is the contrastive loss.
The invention has the advantages that:
the multi-scale information is combined together, common characteristics of illumination forgery are explored through contrast learning, effective fusion is carried out on the characteristics through layer-by-layer scaling in decoding, and the potential relation between the inharmonious region and the inharmonious region is fully excavated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a block diagram of the present invention;
FIG. 2 is a model performance display of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
As shown in FIG. 1, the operation flowchart of the image tampering detection method based on multi-scale interaction and cross-feature contrast learning according to the present invention, the method is implemented by the following steps:
step one, constructing an input image, inputting the image to be positioned into a backbone network and extracting characteristics
The method comprises the following specific operations of constructing multiple scales, wherein an input image is input as a network after being subjected to size adjustment, random overturning, random rotation and contrast adjustment, the input size is respectively H multiplied by W, the W is the picture width, the H is the picture height, the unit is a pixel, in the input process, in order to find the difference between the scales, the input image is divided by 0.5 to obtain a low-resolution image, meanwhile, the input image is multiplied by 1.5 to obtain a high-resolution image, the three different pixels and the different scale images are jointly input into a backbone network swantransformer, feature extraction is carried out through an attention layer and an MLP layer, the method is different from the mode of extracting features by convolution of a predecessor, global modeling can be achieved, in the process of inharmonious localization, the fact that local inconsistent information is found through the global modeling is important, the features of different stages are extracted, parameters are shared at the same time, and finally, the features with three different scales are generated in four extraction stages.
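A minimal PyTorch sketch of this step is given below, assuming a Swin-style `backbone` callable that returns the four stage feature maps for one input; the backbone itself, the jitter parameters and the tensor layout are illustrative assumptions, not the patent's reference implementation.

```python
import torch.nn.functional as F

def extract_multiscale_features(backbone, image):
    """image: (B, 3, H, W) tensor after resize / flip / rotate / contrast jitter."""
    scales = [0.5, 1.0, 1.5]
    per_scale = []
    for s in scales:
        x = F.interpolate(image, scale_factor=s, mode="bilinear",
                          align_corners=False)
        # The same backbone module processes all three inputs,
        # so the network weights are shared across scales.
        per_scale.append(backbone(x))  # -> [f1, f2, f3, f4]
    # Regroup so that stages[i] holds the (0.5x, 1.0x, 1.5x) features of stage i.
    stages = list(zip(*per_scale))
    return stages
```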
Step two, obtaining multi-stage features through multi-scale feature interaction
The same operation is performed on the three features of each stage: the features of the 1.5× image are downsampled to the size of the input-image features by adding average pooling and max pooling, while the features of the 0.5× image are upsampled to that size by bilinear interpolation; the features are then concatenated and passed through two convolution blocks and a softmax function that automatically learns the weight of each scale, and a weighted sum of the per-scale features yields the feature after the three scales are fused. In training, this feature fuses the effective information across the three scales.
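The interaction can be sketched as the following module, assuming all three inputs share the stage channel count `channels`; the two-convolution weight head is one plausible reading of "two convolution blocks followed by softmax", not the exact claimed layer configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleInteraction(nn.Module):
    """Per-stage fusion of 0.5x / 1x / 1.5x features (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        # Two convolution blocks ending in one logit per scale.
        self.weight_net = nn.Sequential(
            nn.Conv2d(3 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, f_small, f_mid, f_large):
        size = f_mid.shape[-2:]
        # 1.5x features: average pooling plus max pooling down to the mid size.
        f_large = (F.adaptive_avg_pool2d(f_large, size)
                   + F.adaptive_max_pool2d(f_large, size))
        # 0.5x features: bilinear upsampling to the mid size.
        f_small = F.interpolate(f_small, size=size, mode="bilinear",
                                align_corners=False)
        # Concatenate, learn per-scale weights with softmax, weighted sum.
        w = torch.softmax(
            self.weight_net(torch.cat([f_small, f_mid, f_large], dim=1)), dim=1)
        return w[:, 0:1] * f_small + w[:, 1:2] * f_mid + w[:, 2:3] * f_large
```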
Step three, cross-feature contrast learning
The significance of contrast learning is that feature vectors of the same class are pulled closer together while feature vectors of different classes are pushed apart. In inharmony localization there are two categories, forged inharmonious pixels and background pixels, and the aim is to find the common key information of color-forged pixels. Specifically, the Ground Truth G is downsampled to the size of each feature by the nearest-neighbor method; then, by mapping G onto the feature map, the feature vectors of the inharmonious forged pixels and of the background pixels are located, and the two categories are randomly sampled within each image of each batch. A threshold of 5 is set: when the number of feature vectors of a category is less than 5, that category of the image is discarded; when it is greater than 5, 5 feature vectors are randomly sampled. Finally, all inharmonious-pixel feature vectors and all background-pixel feature vectors in the batch are merged per category, and four feature sets $A_1, A_2, A_3, A_4$ are obtained from the 4 stage features.
Cross-image contrast learning is implemented in each feature set, with the contrastive loss:

$$\mathcal{L}_{con} = -\frac{1}{|P|}\sum_{i \in P}\frac{1}{|P|-1}\sum_{j \in P,\ j \neq i}\log\frac{\exp(v_i \cdot v_j/\tau)}{\exp(v_i \cdot v_j/\tau)+\sum_{k \in N}\exp(v_i \cdot v_k/\tau)}$$

wherein $v_i$, $v_j$ and $v_k$ denote pixel-wise feature vectors of the feature map, $\tau$ is a fixed temperature coefficient, $P$ is the set of pixels belonging to the inharmonious region, and $N$ is the set of pixels belonging to the background region.
Next, cross-scale contrast learning is performed on the pairs A1/A4 and A1/A3. Taking A1 and A4 as an example, the positive-sample and negative-sample feature vectors of the two sets are merged per category and contrast learning is applied to the merged sets, which realizes the cross-scale contrast learning.
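A hedged sketch of the per-feature contrastive term follows. It samples 5 vectors per category from a single feature map, whereas the method merges vectors across the whole batch (and, for the cross-scale term, across feature sets) before contrasting; `tau` is an assumed temperature value.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(feat, mask, tau=0.1, n_sample=5):
    """feat: (B, C, h, w) stage feature; mask: (B, h, w) Ground Truth resized
    by nearest neighbor (1 = inharmonious pixel, 0 = background pixel)."""
    _, C, _, _ = feat.shape
    v = F.normalize(feat.permute(0, 2, 3, 1).reshape(-1, C), dim=1)
    m = mask.reshape(-1).bool()
    pos, neg = v[m], v[~m]
    # Discard the category when fewer than 5 vectors are available.
    if pos.shape[0] < n_sample or neg.shape[0] < n_sample:
        return feat.new_zeros(())
    pos = pos[torch.randperm(pos.shape[0])[:n_sample]]
    neg = neg[torch.randperm(neg.shape[0])[:n_sample]]
    logits_pos = pos @ pos.t() / tau  # anchor-positive similarities
    logits_neg = pos @ neg.t() / tau  # anchor-negative similarities
    loss = feat.new_zeros(())
    for i in range(n_sample):
        # Each anchor attracts the other positives and repels the negatives.
        others = torch.cat([logits_pos[i, :i], logits_pos[i, i + 1:]])
        denom = torch.logsumexp(torch.cat([others, logits_neg[i]]), dim=0)
        loss = loss - (others - denom).mean()
    return loss / n_sample
```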
Step four, feature shrink fusion decoding
Among the four features F1, F2, F3 and F4, F1 and F2, F2 and F3, and F3 and F4 are shrink-fused pairwise, and the features are continuously fused until one feature remains, from which the final result is output through convolution and upsampling. Taking F4 and F3 as an example: F4 is first upsampled to the same size as F3 and multiplied element-wise with it to obtain F34; F34 is added to F3 and to F4 respectively to obtain F3' and F4'; the two features are then concatenated, and channel information is mined through channel attention to obtain the fused feature. The fused feature of each stage also receives an auxiliary supervision loss, and the final predicted image is output through convolution and bilinear upsampling.
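The pairwise fusion of F4 and F3 described above can be sketched as follows; a squeeze-excitation block stands in for the channel attention, whose exact form the record does not specify, and the channel counts are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShrinkFusion(nn.Module):
    """Shrink-fuse a deep feature (e.g. F4) into its neighbour (e.g. F3)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.channel_attn = nn.Sequential(  # SE-style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f3, f4):
        # Upsample F4 to F3's size, multiply element-wise to get F34.
        f4 = F.interpolate(f4, size=f3.shape[-2:], mode="bilinear",
                           align_corners=False)
        f34 = f3 * f4
        f3p, f4p = f3 + f34, f4 + f34          # F3' and F4'
        cat = torch.cat([f3p, f4p], dim=1)
        fused = cat * self.channel_attn(cat)   # mine channel information
        return self.proj(fused)
```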
Step five, multi-loss function joint training
Since the samples in inharmony localization are imbalanced, the inharmonious region being smaller than the harmonious region, the pixel supervision loss consists of a dice loss and a focal loss; the pixel segmentation loss is combined with the contrastive loss, so that the total loss function is:

$$\mathcal{L}_{dice}(P_i, G) = 1 - \frac{2\,|P_i \cap G|}{|P_i| + |G|}, \qquad \mathcal{L}_{focal}(P_i, G) = -(1 - p)^{\gamma}\log p$$

$$\mathcal{L}_{total} = \sum_{i}\big[\lambda_1\,\mathcal{L}_{dice}(P_i, G) + \lambda_2\,\mathcal{L}_{focal}(P_i, G)\big] + \mathcal{L}_{con}$$

wherein G represents the Ground Truth, $P_i$ represents the predicted image of each stage, $p$ represents the predicted probability of the true class of a pixel, $\lambda_1$ and $\lambda_2$ are user-defined parameters set to 1 and 0.3 respectively, $\cap$ represents the intersection, $|\cdot|$ represents the number of pixels, $\gamma$ is the focusing parameter of the focal loss, and $\mathcal{L}_{con}$ is the contrastive loss; the final prediction map is constrained by this total loss function.
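A sketch of the pixel supervision term, with the 1 / 0.3 weighting stated above; `gamma` and the soft dice smoothing `eps` are assumed defaults, not values taken from the record.

```python
import torch
import torch.nn.functional as F

def dice_focal_loss(pred, target, lambda_focal=0.3, gamma=2.0, eps=1e-6):
    """pred: (B, 1, H, W) logits; target: (B, 1, H, W) binary Ground Truth."""
    p = torch.sigmoid(pred).flatten(1)
    g = target.flatten(1)
    # Soft dice: 1 - 2|P ∩ G| / (|P| + |G|), computed per image.
    dice = 1 - (2 * (p * g).sum(1) + eps) / (p.sum(1) + g.sum(1) + eps)
    # Focal: cross-entropy modulated to down-weight easy pixels.
    bce = F.binary_cross_entropy_with_logits(pred.flatten(1), g,
                                             reduction="none")
    pt = torch.where(g > 0.5, p, 1 - p)
    focal = ((1 - pt) ** gamma * bce).mean(1)
    return (dice + lambda_focal * focal).mean()
```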
In order to verify the effectiveness of the present invention, evaluation is performed on the iHarmony4 image inharmony dataset, which consists of four sub-datasets: HCOCO, HAdobe5k, HFlickr and HDay2Night. On the HCOCO and HFlickr datasets, inharmonious images are obtained by adjusting the color and illumination statistics of the foreground. For the HAdobe5k and HDay2Night datasets, inharmonious images are obtained by applying modifications of different styles to the foreground or by capturing the corresponding part of the same scene under different conditions. For the inharmonious images in all four sub-datasets, the foreground region appears incompatible with the background mainly because of color and illumination inconsistencies, which makes the dataset suitable for focusing on the localization of inharmonious regions. In this work only the inharmonious images are used, not the paired harmonious images. One problem is that the inharmonious region in an inharmonious image may be ambiguous, since the background could also be regarded as the inharmonious region. To avoid this ambiguity, images whose foreground area exceeds 50% are discarded directly; they account for only around 2% of the entire dataset. This strategy is similar to traditional image-manipulation localization methods, the task being to localize inharmonious regions that occupy less than 50% of the image area. The training set and test set are cut to 64255 and 7237 images respectively. For quantitative evaluation, following previous related methods, the average precision AP, the F1 score and the IoU are computed as evaluation indices from the predicted mask M and the ground-truth mask.
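A sketch of the F1 and IoU computation on one predicted mask follows (AP additionally sweeps thresholds and averages precision, which is omitted here); the binarization thresholds are illustrative assumptions.

```python
import numpy as np

def mask_metrics(pred, gt, thresh=0.5):
    """pred: predicted mask M with values in [0, 1]; gt: binary ground-truth mask."""
    p = (pred >= thresh).astype(np.float64)
    g = (gt >= 0.5).astype(np.float64)
    tp = (p * g).sum()
    precision = tp / max(p.sum(), 1.0)
    recall = tp / max(g.sum(), 1.0)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    iou = tp / max(p.sum() + g.sum() - tp, 1.0)
    return f1, iou
```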
The performance comparison between classical image inharmony localization algorithms and this method is shown in the table below. In the experiments, 30 epochs are set; the Adam optimization method is adopted with a default learning rate of 1e-4 and a poly learning-rate decay strategy; the loss-function hyperparameters are set to $\lambda_1 = 1$ and $\lambda_2 = 0.3$. In order to enhance the model's fitting ability on the target-domain data, random contrast enhancement, illumination enhancement, saturation enhancement and flipping operations are adopted; when testing the trained model, the image is restored to its original size.
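The stated optimization setup (Adam at 1e-4 with poly decay over 30 epochs) can be sketched as below; the poly `power` of 0.9 and the per-step decay granularity are assumptions, as the record only names the strategy.

```python
import torch

def make_optimizer(model, total_steps, power=0.9):
    # Adam with the default learning rate 1e-4 and a poly decay schedule.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda step: max(0.0, 1.0 - step / total_steps) ** power)
    return opt, sched
```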
The following table compares the performance of classical image inharmony localization algorithms with the present invention on the different datasets:
[Table omitted: AP, F1 and IoU comparison of the methods on the different datasets.]
from the table, it can be found that the model effect of the model obtains excellent performance in AP, F1 and IOU, and the optimal MadisNet is exceeded by improving the positioning network, so that a very high-efficiency positioning effect is obtained.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. An image tampering detection method based on multi-scale interaction and cross-feature contrast learning is characterized by comprising the following steps:
s1, constructing an input image, inputting an image to be positioned into a backbone network, and extracting characteristics:
randomly adding image jitter to an original image at the same time, using the image jitter as the input of a backbone network, sharing images with different scales and network weights for feature extraction, putting the images into the backbone network, and extracting image features in four stages respectively, wherein each stage generates three large and small features under the condition of three inputs;
s2, multi-scale feature interaction to obtain multi-stage features:
performing multi-scale feature interaction on the three features of each stage: downsampling the large features and upsampling the small features to the same size, then adding them under a learned multi-scale weight constraint to obtain the final feature;
s3, cross-feature comparison learning:
setting pixels of an inharmonic region in the group Truth as positive examples, setting background pixels as negative examples, simultaneously sampling the group Truth to each feature size for selecting positive and negative feature vectors, randomly sampling negative samples according to the number of the positive samples, performing comparative learning loss constraint on the feature vectors obtained by sampling, and performing comparative learning loss constraint on the mixture of the positive and negative samples on multi-stage features;
S4, feature shrink fusion decoding:
fusing every two adjacent features and completing the fusion through shrink attention over the features;
S5, multi-loss function joint training:
finally, decoding to obtain the final predicted image, applying pixel-level loss supervision between the predicted image and the Ground Truth, and jointly training and optimizing the network together with the contrastive learning loss.
2. The image tampering detection method based on multi-scale interaction and cross-feature contrast learning according to claim 1, characterized in that: the image extraction feature construction is specifically as follows:
the method comprises the steps of adjusting the size of an image, randomly turning over the image, randomly rotating the image, adjusting the contrast ratio of the image, using the image as the input of a network, wherein the input size is H multiplied by W, the W is the width of the image, the H is the height of the image, and the unit is a pixel.
3. The image tampering detection method based on multi-scale interaction and cross-feature contrast learning according to claim 1, wherein the multi-scale feature interaction specifically comprises the following steps:
and performing the same operation on the three characteristics of each stage, wherein the characteristics of the 1.5x image are respectively downsampled to the size of the characteristics of the input image in an average pooling and maximum pooling addition mode, the characteristics of the 0.5x image are upsampled to the size of the characteristics of the input image in a bilinear interpolation mode, then the characteristics are spliced, two layers of convolution blocks are performed, a softmax function is performed to automatically learn the weight of each scale characteristic, and the characteristics after three scales are fused are obtained through weighted summation.
4. The image tampering detection method based on multi-scale interaction and cross-feature contrast learning according to claim 1, wherein the cross-feature contrast learning specifically comprises the following steps:
sampling the Ground Truth to the size of each feature by the nearest-neighbor method; finding the feature vectors of inharmonious forged pixels and of background pixels on the feature map by mapping the Ground Truth onto it, and then randomly sampling the two categories within each image of each batch; setting 5 as a threshold: when the number of feature vectors of a category is less than 5, that category of the image is discarded, and when it is greater than 5, 5 feature vectors are randomly sampled; merging all inharmonious-pixel feature vectors and background-pixel feature vectors of the batch per category, finally obtaining four feature sets $A_1, A_2, A_3, A_4$ from the 4 features;
cross-image contrast learning is implemented in each feature set, with the contrastive loss:

$$\mathcal{L}_{con} = -\frac{1}{|P|}\sum_{i \in P}\frac{1}{|P|-1}\sum_{j \in P,\ j \neq i}\log\frac{\exp(v_i \cdot v_j/\tau)}{\exp(v_i \cdot v_j/\tau)+\sum_{k \in N}\exp(v_i \cdot v_k/\tau)}$$

wherein $v_i$, $v_j$ and $v_k$ denote pixel-wise feature vectors of the feature map, $\tau$ is a fixed temperature coefficient, $P$ is the set of pixels belonging to the inharmonious region, and $N$ is the set of pixels belonging to the background region; next, cross-scale contrast learning is performed on $A_1$ and $A_4$ and on $A_1$ and $A_3$, respectively.
5. The image tampering detection method based on multi-scale interaction and cross-feature contrast learning according to claim 1, wherein the feature shrink fusion decoding is specifically as follows: among the four features F1, F2, F3 and F4, F1 and F2, F2 and F3, and F3 and F4 are shrink-fused pairwise, and the features are continuously fused until one feature remains, from which the final result is output through convolution and upsampling.
6. The image tampering detection method based on multi-scale interaction and cross-feature contrast learning according to claim 1, wherein the objective function construction based on the multi-scale interaction and cross-feature contrast learning network is specifically as follows:
since the samples in inharmony localization are imbalanced, the inharmonious region being smaller than the harmonious region, the pixel supervision loss consists of a dice loss and a focal loss; the pixel segmentation loss is combined with the contrastive loss, so that the total loss function is:

$$\mathcal{L}_{dice}(P_i, G) = 1 - \frac{2\,|P_i \cap G|}{|P_i| + |G|}, \qquad \mathcal{L}_{focal}(P_i, G) = -(1 - p)^{\gamma}\log p$$

$$\mathcal{L}_{total} = \sum_{i}\big[\lambda_1\,\mathcal{L}_{dice}(P_i, G) + \lambda_2\,\mathcal{L}_{focal}(P_i, G)\big] + \mathcal{L}_{con}$$

wherein G represents the Ground Truth, $P_i$ represents the predicted image of each stage, $p$ represents the predicted probability of the true class of a pixel, $\lambda_1$ and $\lambda_2$ represent user-defined parameters, $\lambda_1$ is set to 1, $\lambda_2$ is set to 0.3, $\cap$ represents the intersection, $|\cdot|$ represents the number of pixels, $\gamma$ is the focusing parameter of the focal loss, and $\mathcal{L}_{con}$ is the contrastive loss.
CN202211421544.6A 2022-11-15 2022-11-15 Image tampering detection method based on multi-scale interaction and cross-feature contrast learning Active CN115578631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211421544.6A CN115578631B (en) 2022-11-15 2022-11-15 Image tampering detection method based on multi-scale interaction and cross-feature contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211421544.6A CN115578631B (en) 2022-11-15 2022-11-15 Image tampering detection method based on multi-scale interaction and cross-feature contrast learning

Publications (2)

Publication Number Publication Date
CN115578631A true CN115578631A (en) 2023-01-06
CN115578631B CN115578631B (en) 2023-08-18

Family

ID=84588667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211421544.6A Active CN115578631B (en) 2022-11-15 2022-11-15 Image tampering detection method based on multi-scale interaction and cross-feature contrast learning

Country Status (1)

Country Link
CN (1) CN115578631B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091907A (en) * 2023-04-12 2023-05-09 四川大学 Image tampering positioning model and method based on non-mutually exclusive ternary comparison learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102261A (en) * 2020-08-28 2020-12-18 国网甘肃省电力公司电力科学研究院 Multi-scale generation-based tamper image detection method for anti-network
CN112150450A (en) * 2020-09-29 2020-12-29 武汉大学 Image tampering detection method and device based on dual-channel U-Net model
CN112381775A (en) * 2020-11-06 2021-02-19 厦门市美亚柏科信息股份有限公司 Image tampering detection method, terminal device and storage medium
DE102020215860A1 (en) * 2020-12-15 2022-06-15 Conti Temic Microelectronic Gmbh Correction of images from an all-round view camera system in the event of rain, light and dirt
CN114821665A (en) * 2022-05-24 2022-07-29 浙江工业大学 Urban pedestrian flow small target detection method based on convolutional neural network
CN115063373A (en) * 2022-06-24 2022-09-16 山东省人工智能研究院 Social network image tampering positioning method based on multi-scale feature intelligent perception

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102261A (en) * 2020-08-28 2020-12-18 国网甘肃省电力公司电力科学研究院 Multi-scale generation-based tamper image detection method for anti-network
CN112150450A (en) * 2020-09-29 2020-12-29 武汉大学 Image tampering detection method and device based on dual-channel U-Net model
CN112381775A (en) * 2020-11-06 2021-02-19 厦门市美亚柏科信息股份有限公司 Image tampering detection method, terminal device and storage medium
DE102020215860A1 (en) * 2020-12-15 2022-06-15 Conti Temic Microelectronic Gmbh Correction of images from an all-round view camera system in the event of rain, light and dirt
CN114821665A (en) * 2022-05-24 2022-07-29 浙江工业大学 Urban pedestrian flow small target detection method based on convolutional neural network
CN115063373A (en) * 2022-06-24 2022-09-16 山东省人工智能研究院 Social network image tampering positioning method based on multi-scale feature intelligent perception

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAO-MING WU: "Multi-level tamper detection and recovery with tamper type identification", pages 4512 - 4516 *
ZHANG Jingyuan et al.: "Multi-task image splicing tamper detection algorithm based on Transformer", vol. 50, no. 1, pages 114 - 122 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091907A (en) * 2023-04-12 2023-05-09 四川大学 Image tampering positioning model and method based on non-mutually exclusive ternary comparison learning
CN116091907B (en) * 2023-04-12 2023-08-15 四川大学 Image tampering positioning model and method based on non-mutually exclusive ternary comparison learning

Also Published As

Publication number Publication date
CN115578631B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN108596818B (en) Image steganalysis method based on multitask learning convolutional neural network
CN115063373A (en) Social network image tampering positioning method based on multi-scale feature intelligent perception
Zuo et al. HF-FCN: Hierarchically fused fully convolutional network for robust building extraction
CN112767418A (en) Mirror image segmentation method based on depth perception
CN111797841A (en) Visual saliency detection method based on depth residual error network
CN115346037B (en) Image tampering detection method
CN115578631A (en) Image tampering detection method based on multi-scale interaction and cross-feature contrast learning
Zheng et al. T-net: Deep stacked scale-iteration network for image dehazing
CN112819837A (en) Semantic segmentation method based on multi-source heterogeneous remote sensing image
CN114092824A (en) Remote sensing image road segmentation method combining intensive attention and parallel up-sampling
CN115393718A (en) Optical remote sensing image change detection method based on self-adaptive fusion NestedUNet
CN115713462A (en) Super-resolution model training method, image recognition method, device and equipment
CN114565508A (en) Virtual reloading method and device
CN116343052B (en) Attention and multiscale-based dual-temporal remote sensing image change detection network
CN112991239B (en) Image reverse recovery method based on deep learning
Gan et al. Highly accurate end-to-end image steganalysis based on auxiliary information and attention mechanism
CN112488115B (en) Semantic segmentation method based on two-stream architecture
CN116188652A (en) Face gray image coloring method based on double-scale circulation generation countermeasure
CN115457385A (en) Building change detection method based on lightweight network
CN111539922B (en) Monocular depth estimation and surface normal vector estimation method based on multitask network
Xu et al. ESNet: An Efficient Framework for Superpixel Segmentation
Yu et al. Dual-branch feature learning network for single image super-resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant