CN115909445A - Face image forgery detection method and related equipment

Info

Publication number: CN115909445A
Authority: CN (China)
Prior art keywords: face image, detected, model, training, image
Legal status: Pending
Application number: CN202211412421.6A
Other languages: Chinese (zh)
Inventors: 李硕豪, 于淼淼, 张军, 雷军, 何华, 赵翔, 彭娟, 李虹颖, 陇盛, 杨佳鑫, 尹晓晴
Current Assignee: National University of Defense Technology
Original Assignee: National University of Defense Technology
Application filed by National University of Defense Technology
Priority: CN202211412421.6A
Publication: CN115909445A

Classifications

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a face image forgery detection method and related equipment. The method comprises the following steps: acquiring a preprocessed face image to be detected, and transforming the face image to be detected to obtain a frequency-domain map of the face image to be detected; inputting the face image to be detected and its frequency-domain map into a pre-trained face image forgery detection model, and outputting a detection result label; in response to determining that the detection result label is a preset first value, the face image to be detected is a forged face image, and a forgery mask is generated in the face image to be detected to mark the forged region; in response to determining that the detection result label is a preset second value, the face image to be detected is a real face image. Exploiting the property that the high-frequency information of a forged face image is altered and its details are lost, more forgery-trace clues are mined from the frequency-domain map of the face image to be detected; the face image forgery detection model improves the generalization of the detection performance and ensures detection reliability.

Description

Face image forgery detection method and related equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a face image forgery detection method and related devices.
Background
With the remarkable success of face forgery technologies represented by deepfake, face forgery detection has attracted much attention. However, most existing deep-learning-based detection methods rely on stacked convolutions and hand-designed networks, which are insufficient for learning generalizable feature representations and cannot cope well with the detection and identification of data that did not appear in the training set.
Disclosure of Invention
In view of the above, an object of the present application is to provide a face image forgery detection method and related equipment, so as to solve or partially solve the above technical problems.
Based on the above purpose, a first aspect of the present application provides a method for detecting face image forgery, including:
acquiring a preprocessed face image to be detected, and converting the face image to be detected to obtain a frequency domain image of the face image to be detected;
inputting the face image to be detected and the frequency-domain map of the face image to be detected into a pre-trained face image forgery detection model respectively, and outputting a detection result label;
in response to determining that the detection result label is a preset first value, the face image to be detected being a forged face image, and generating a forgery mask in the face image to be detected to mark the forged region;
and in response to determining that the detection result label is a preset second value, the face image to be detected being a real face image.
Optionally, the acquiring the preprocessed human face image to be detected includes:
acquiring an original face image to be detected;
detecting and positioning a face region of an original face image to be detected through a multitask convolutional neural network to obtain a face region image;
and adjusting the face area according to a preset rule to obtain the preprocessed face image to be detected.
Optionally, the converting the face image to be detected to obtain a frequency domain diagram of the face image to be detected includes:
and extracting the frequency domain characteristics of the face image to be detected to obtain a frequency domain image of the face image to be detected.
Optionally, the face image forgery detection model is obtained by performing pre-training through the following processes:
acquiring a preprocessed pre-training face image set, and performing conversion processing on the pre-training face image set to obtain a frequency domain image set of the pre-training face image set;
respectively inputting the pre-training face image set and a frequency domain image set of the pre-training face image set into pre-training models which are constructed in advance, wherein the pre-training models comprise a feature extraction model, a pixel level segmentation supervision model, a local patch relation supervision model and an image level classification supervision model, and the local patch relation supervision model comprises an inter-patch consistency supervision model and an inter-patch similarity supervision model;
respectively carrying out feature extraction on the pre-training face image set and the frequency domain image set of the pre-training face image set through the feature extraction model to obtain a feature map;
merging all the feature maps by using the pixel level segmentation supervision model to obtain a merged feature map, and performing pixel level segmentation on the merged feature map to obtain a pixel level segmentation loss function;
obtaining an inter-patch consistency loss function through the inter-patch consistency supervision model based on the feature map, obtaining an inter-patch similarity loss function through the inter-patch similarity supervision model based on the feature map, and summing the inter-patch consistency loss function and the inter-patch similarity loss function to obtain a patch-level relation supervision loss function;
obtaining an image-level classification loss function through the image-level classification supervision model based on the feature map;
summing the pixel-level segmentation loss function, the patch-level relation supervision loss function and the image-level classification loss function to obtain a mixed loss function;
and continuously adjusting parameters of the pre-training model based on the mixed loss function until the mixed loss function is minimized to obtain a trained pre-training model, and taking the trained pre-training model as the face image forgery detection model.
Optionally, the merging all the feature maps by using the pixel-level segmentation supervision model to obtain a merged feature map includes:
and utilizing the pixel level segmentation supervision model to perform upsampling on all the feature maps to obtain feature maps with the same specification, and combining all the feature maps with the same specification to obtain a combined feature map.
Optionally, the performing pixel-level segmentation on the merged feature map to obtain a pixel-level segmentation loss function includes:
marking each pixel of the real face image in the merged feature map according to a preset first pixel value by using the pixel-level segmentation supervision model to obtain a first segmentation region, and marking each pixel of the forged face image in the merged feature map according to a preset second pixel value to obtain a second segmentation region;
generating a pre-training fake mask based on the first divided area and the second divided area, and labeling fake face images in the combined feature map through the pre-training fake mask;
and obtaining the pixel-level segmentation loss function according to the pre-training forgery mask and the merged feature map.
Based on the same inventive concept, a second aspect of the present application provides a face image forgery detection apparatus, including:
the data acquisition module is configured to acquire a preprocessed face image to be detected and convert the face image to be detected to obtain a frequency domain image of the face image to be detected;
the face image detection module is configured to input the face image to be detected and the frequency-domain map of the face image to be detected into a pre-trained face image forgery detection model respectively, and output a detection result label;
the first discrimination module is configured to respond to the fact that the detection result label is a preset first numerical value, the face image to be detected is a forged face image, and a forged mask is generated in the face image to be detected to mark a forged area;
and the second discrimination module is configured to, in response to determining that the detection result label is a preset second value, determine that the face image to be detected is a real face image.
Based on the same inventive concept, a third aspect of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of the first aspect when executing the program.
Based on the same inventive concept, a fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
From the above, in the face image forgery detection method and related equipment provided by the application, the face image to be detected is transformed to obtain its frequency-domain map. Exploiting the property that the high-frequency information of a forged face image is altered and its details are lost, more forgery-trace clues are mined from the frequency-domain map of the face image to be detected. The face image to be detected and its frequency-domain map are then input into the trained face image forgery detection model, and a detection result label is output; the face image forgery detection model improves the generalization of the detection performance and ensures detection reliability. When the detection result label is a preset first value, the face image to be detected is a forged face image, and a forgery mask is generated in the face image to be detected to mark the forged region; when the detection result label is a preset second value, the face image to be detected is a real face image.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description show only embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a face image forgery detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a human face image forgery detection framework according to an embodiment of the present application;
FIG. 3 is a diagram illustrating pre-training forgery masks in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of a spatial attention model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an interactive attention model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a face image forgery detection apparatus according to an embodiment of the present application;
fig. 7 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that technical terms or scientific terms used in the embodiments of the present application should have a general meaning as understood by those having ordinary skill in the art to which the present application belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the present application is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word comprises the element or item listed after the word and its equivalent, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
Deep learning detection methods relying on stacked convolution and designed networks are often adopted in the related art, however, the networks are insufficient in learning generalization feature representation and cannot well cope with detection of data which does not appear in a training set.
The embodiments of the application provide a face image forgery detection method and related equipment. A face image forgery detection model detects the input face image to be detected together with the frequency-domain map of the face image to be detected and outputs a detection result label. Because the high-frequency information of a forged face image is altered and its details are lost, the frequency-domain map allows more forgery-trace clues to be mined; this improves the generalization of the detection performance and ensures detection reliability.
As shown in fig. 1, the method includes:
step 101, obtaining a preprocessed face image to be detected, and performing conversion processing on the face image to be detected to obtain a frequency domain image of the face image to be detected.
In this step, the face image to be detected is an image in RGB color mode, where the RGB color mode represents various colors obtained by varying and superimposing the three color channels red (R), green (G), and blue (B).
Because a forged face image exhibits frequency-statistics characteristics that differ from those of a real image (its high-frequency information is altered and details are lost), the face image to be detected is transformed into its frequency-domain map so that more forgery-trace clues can be mined.
In some embodiments, in step 101, the acquiring a preprocessed face image to be detected includes:
step A1, obtaining an original human face image to be detected.
And step A2, detecting and positioning the face area of the original face image to be detected through a multitask convolutional neural network to obtain a face area image.
And step A3, adjusting the face area according to a preset rule to obtain the preprocessed face image to be detected.
In the above scheme, the original face image to be detected is a single frame obtained by decomposing a video. A Multi-task Convolutional Neural Network (MTCNN) is used to detect and locate the face region of the original face image to be detected, yielding a face region image. The MTCNN consists of a three-stage network architecture comprising a P-Net stage, an R-Net stage, and an O-Net stage: P-Net is a fully convolutional neural network with three convolutional layers; R-Net is a convolutional neural network with three convolutional layers and one fully connected layer; O-Net is a convolutional neural network with four convolutional layers.
P-Net extracts the coordinates of candidate face regions for the next stage to process. R-Net extracts face data from the original face image to be detected according to the candidate coordinates produced by P-Net, filters out erroneous face data through its fully connected layer, and then filters again with a Non-Maximum Suppression (NMS) algorithm to obtain the filtered face data. Finally, O-Net extracts the landmark information in the original face image to be detected, and the landmark information together with the filtered face data is taken as the detected and located face region image.
The face region image is then adjusted according to a preset rule: for example, the face region is extended outward by a set multiple along its width and height, then cropped, and the cropped image is resized to a uniform size, yielding the preprocessed face image to be detected.
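For illustration, the sketch below shows one way such preprocessing could be implemented. It assumes the facenet-pytorch package's MTCNN implementation; the expansion factor of 1.3 and the 240 × 240 output size are illustrative assumptions, not values fixed by this application.

```python
# Hypothetical preprocessing sketch: MTCNN face detection, box expansion, crop, resize.
from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(keep_all=False)

def preprocess_face(path, expand=1.3, out_size=240):
    img = Image.open(path).convert("RGB")
    boxes, _ = mtcnn.detect(img)               # locate face regions
    if boxes is None:
        return None                            # no face found
    x1, y1, x2, y2 = boxes[0]
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2      # expand the box around its center
    w, h = (x2 - x1) * expand, (y2 - y1) * expand
    left = max(int(cx - w / 2), 0)
    top = max(int(cy - h / 2), 0)
    right = min(int(cx + w / 2), img.width)
    bottom = min(int(cy + h / 2), img.height)
    return img.crop((left, top, right, bottom)).resize((out_size, out_size))
```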
In some embodiments, in step 101, the converting the facial image to be detected to obtain a frequency domain map of the facial image to be detected includes:
and extracting the frequency domain characteristics of the face image to be detected to obtain a frequency domain image of the face image to be detected.
In this scheme, a high-pass filter may be used to process the face image to be detected: a frequency threshold is set according to the specific situation, and the low-frequency components of the face image to be detected below this threshold are suppressed, thereby extracting the frequency-domain features and obtaining the frequency-domain map of the face image to be detected.
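As an illustration of this filtering step, the following is a minimal sketch assuming an ideal high-pass filter in the Fourier domain; the cutoff radius stands in for the frequency threshold and is an assumed example value.

```python
# Sketch of frequency-domain feature extraction with an ideal high-pass filter.
import numpy as np

def high_pass_frequency_map(gray, radius=16):
    """gray: 2-D numpy array (one channel of the face image)."""
    f = np.fft.fftshift(np.fft.fft2(gray))        # centered 2-D spectrum
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    f[dist < radius] = 0                          # suppress low frequencies
    back = np.fft.ifft2(np.fft.ifftshift(f))      # back to the spatial domain
    return np.abs(back)                           # high-frequency residual map
```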
Step 102, inputting the face image to be detected and the frequency-domain map of the face image to be detected into a pre-trained face image forgery detection model respectively, and outputting a detection result label.
In this step, the face image forgery detection model performs the detection and outputs a detection result label, which is a binary classification label. For example, a detection result label of 1 indicates that the face image to be detected is a real face image, and a detection result label of 0 indicates that it is a forged face image; the specific values used to represent the two classes are not limited.
In some embodiments, in step 102, the face image forgery detection model is obtained by pre-training through the following processes:
step 1021, acquiring a pre-training face image set after pre-processing, and performing conversion processing on the pre-training face image set to obtain a frequency domain image set of the pre-training face image set.
And 1022, inputting the pre-training face image set and the frequency-domain map set of the pre-training face image set into a pre-constructed pre-training model respectively, wherein the pre-training model comprises a feature extraction model, a pixel-level segmentation supervision model, a local patch relation supervision model, and an image-level classification supervision model, and the local patch relation supervision model comprises an inter-patch consistency supervision model and an inter-patch similarity supervision model.
And 1023, respectively carrying out feature extraction on the pre-training face image set and the frequency domain image set of the pre-training face image set through the feature extraction model to obtain a feature map.
And step 1024, merging all the feature maps by using the pixel-level segmentation supervision model to obtain a merged feature map, and performing pixel-level segmentation on the merged feature map to obtain a pixel-level segmentation loss function.
And 1025, obtaining an inter-patch consistency loss function through the inter-patch consistency supervision model based on the characteristic diagram, meanwhile, obtaining an inter-patch similarity loss function through the inter-patch similarity supervision model based on the characteristic diagram, and summing the inter-patch consistency loss function and the inter-patch similarity loss function to obtain a patch-level relation supervision loss function.
And step 1026, obtaining an image-level classification loss function through the image-level classification supervision model based on the feature map.
Step 1027, summing the pixel-level segmentation loss function, the patch-level relation supervision loss function and the image-level classification loss function to obtain a mixed loss function.
And 1028, continuously adjusting parameters of the pre-training model based on the mixed loss function until the mixed loss function is minimized to obtain a trained pre-training model, and taking the trained pre-training model as the face image forgery detection model.
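For illustration, steps 1021 to 1028 can be outlined as the following sketch. It assumes PyTorch, and the model methods (extract, segmentation_loss, consistency_loss, similarity_loss, classification_loss) are hypothetical placeholders for the supervision components described above; only the mixed-loss structure follows this application.

```python
# Hedged outline of the pre-training loop (steps 1021-1028).
import torch

def pretrain(model, loader, epochs=10, lambdas=(1.0, 0.1, 0.1)):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    l1, l2, l3 = lambdas
    for _ in range(epochs):
        for rgb, freq, label, mask in loader:       # image, frequency map, y, M
            feats = model.extract(rgb, freq)        # feature extraction model
            loss_map = model.segmentation_loss(feats, mask)      # pixel level
            loss_patch = (model.consistency_loss(feats)          # L_patch_1
                          + model.similarity_loss(feats, mask))  # L_patch_2
            loss_cls = model.classification_loss(feats, label)   # image level
            loss = l1 * loss_cls + l2 * loss_map + l3 * loss_patch
            opt.zero_grad()
            loss.backward()                         # minimize the mixed loss
            opt.step()
```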
In the above scheme, for example, as shown in fig. 2, an RGB face image I (i.e., the pre-training face image set) and a frequency-domain image F (i.e., the frequency-domain map set of the pre-training face image set) are input into the pre-training model and processed in two branches, and feature extraction is performed on the RGB face image I and the frequency-domain image F respectively by the feature extraction model to obtain feature maps. The RGB face image I and the frequency-domain image F are each first processed by a spatial attention module, so that the pre-training model focuses on the important local regions related to forged-face detection, and then pass through four convolution modules: conv1_x, conv2_x, conv3_x, and conv4_x, each convolutional layer being followed by a BN (Batch Normalization) layer and a ReLU (rectified linear unit) activation function. [Table: the network structure of the four convolution modules, presented as an image in the original publication.]
the characteristic graphs obtained by four convolution modules from the RGB face image I are respectively marked as
Figure BDA0003939218620000082
Figure BDA0003939218620000083
and
Figure BDA0003939218620000084
The characteristic maps obtained from the image F by the four convolution modules are respectively marked as->
Figure BDA0003939218620000085
Figure BDA0003939218620000086
and
Figure BDA0003939218620000087
Wherein both branches have the same semantic layer (e.g. </er)>
Figure BDA0003939218620000088
And &>
Figure BDA0003939218620000089
And/or>
Figure BDA00039392186200000810
) The two feature maps are processed by an interactive attention module, and the two features are cooperatively fused by the interactive attention module to obtain complementary feature representation.
Using the pixel-level segmentation supervision model, the feature maps of the three deeper semantic layers are upsampled to the same size as that of the shallowest semantic layer; the three upsampled feature maps are then concatenated with the shallowest-layer feature map along the channel dimension, and the result is finally fed into a convolution module conv5_x. The output feature map (i.e., the merged feature map) has a size of 120 × 120 × 3, and the size of the feature map is not specifically limited. Next, a forgery mask is generated for each input image, and pixel-level segmentation supervised training is performed. For a real face image, all pixels are regarded as real, and the forgery mask M is defined as a binary image with all pixel values 0. For a forged face image, the forged region appears only on the face, so the pixels inside the face region (detected and located with the multi-task convolutional neural network, the face region then being framed with a rectangular box) are regarded as fake and set to pixel value 1, while the other pixels are real and set to pixel value 0. The forgery mask can be described by the following equation:

$$M_{ij} = \begin{cases} 1, & \text{if } x_{ij} \text{ lies inside the face region of a forged image} \\ 0, & \text{otherwise} \end{cases}$$

where $x_{ij}$ denotes the pixel of the input image at point (i, j). The generated forgery mask M is then resized to 120 × 120 × 3, denoted $f_b$, and used as the pixel-level segmentation supervision signal to supervise the merged feature map, guiding the model to learn a more accurate and discriminative perception of forged regions. The pixel-level segmentation loss function $L_{map}$ is computed from $f_b$ and the merged feature map [its exact expression is presented as an image in the original publication]. Fig. 3 shows examples of forgery masks: the first and third columns are real face images, and the second and fourth columns are forged face images.
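A minimal sketch of the forgery-mask construction described above follows, assuming OpenCV for resizing; replicating the single-channel mask to three channels to match the 120 × 120 × 3 supervision signal is an assumption.

```python
# Sketch of forgery-mask construction: real images get an all-zero mask;
# forged images get 1 inside the detected face rectangle.
import numpy as np
import cv2  # assumed available for resizing

def build_forgery_mask(img_hw, face_box, is_forged, out_size=120):
    h, w = img_hw
    mask = np.zeros((h, w), dtype=np.float32)
    if is_forged:
        x1, y1, x2, y2 = face_box               # rectangle framing the face
        mask[y1:y2, x1:x2] = 1.0                # fake pixels -> 1
    mask = cv2.resize(mask, (out_size, out_size))
    return np.repeat(mask[..., None], 3, axis=2)   # 120 x 120 x 3 supervision
```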
The local patch relation supervision model considers the correlation between local regions and learns a generalizable feature representation; it comprises an inter-patch consistency supervision model and an inter-patch similarity supervision model. Although the RGB face image I and the frequency-domain image F are different types of data, they generally share the same spoofing pattern, so the inter-patch consistency supervision model performs an inter-patch consistency measurement supervision that regularizes the output consistency between the RGB face image I and the frequency-domain image F and improves the consistency of the output features in a self-supervised manner. The RGB face image I and the frequency-domain image F are passed through the feature extraction model to obtain their corresponding feature maps. The two feature maps are spatially divided into $s \times s$ patches, denoted $c_i$ and $v_i$ respectively, where $i \in \{1, 2, \ldots, s^2\}$; then $c_i$ and $v_i$ are flattened into one-dimensional vectors $\hat{c}_i$ and $\hat{v}_i$, and the cosine similarity between the two is calculated:

$$a_i = \frac{\hat{c}_i \cdot \hat{v}_i}{\|\hat{c}_i\| \, \|\hat{v}_i\|}$$

where $a_i$ takes values in the range 0 to 1, and a higher $a_i$ indicates that patches $c_i$ and $v_i$ are more similar. Ideally, every $a_i$ should be close to 1; therefore, an all-ones matrix is constructed to guide the learning of the $a_i$. Finally, the inter-patch consistency loss function $L_{patch\_1}$ penalizes the deviation of the $a_i$ from this all-ones target [its exact expression is presented as an image in the original publication].
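For illustration, the following sketch computes the inter-patch consistency measure in PyTorch; the squared deviation from the all-ones target is an assumed form of the loss, whose exact expression appears only as an image in the original publication.

```python
# Sketch of inter-patch consistency: corresponding patches of the RGB-branch
# and frequency-branch feature maps are flattened and compared by cosine
# similarity, which is pushed toward an all-ones target.
import torch
import torch.nn.functional as F

def patch_consistency_loss(f_i, f_f, s=4):
    """f_i, f_f: feature maps of shape (B, C, H, W) from the two branches."""
    b, c, h, w = f_i.shape
    # split each map into s x s patches, then flatten every patch to a vector
    ci = f_i.unfold(2, h // s, h // s).unfold(3, w // s, w // s)
    vi = f_f.unfold(2, h // s, h // s).unfold(3, w // s, w // s)
    ci = ci.reshape(b, c, s * s, -1).permute(0, 2, 1, 3).reshape(b, s * s, -1)
    vi = vi.reshape(b, c, s * s, -1).permute(0, 2, 1, 3).reshape(b, s * s, -1)
    a = F.cosine_similarity(ci, vi, dim=-1)        # one score per patch pair
    return ((a - 1.0) ** 2).mean()                 # guide similarity toward 1
```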
the inter-D similarity supervision model is utilized to make the pre-training model more stable, and the feature map is subjected to
Figure BDA00039392186200000914
Patch & lt/EN & gt>
Figure BDA00039392186200000915
And &>
Figure BDA00039392186200000916
The cosine similarity between them is calculated as:
Figure BDA00039392186200000917
wherein,
Figure BDA0003939218620000101
a value in the range 0 to 1 for->
Figure BDA0003939218620000102
Using a generated fake mask M e R H '×W'×C' . First, divide M into s × s patches, and mark M i ∈R (H'/s)×(W'/s)×C' Wherein i ∈ {1,2 2 }. Then, a probability score m 'of each patch being forged is obtained by calculating a ratio of the number of dummy pixels (pixel value = 1) to the total number of all pixels in the patch' i ∈[0,1]. Thereafter, the patch->
Figure BDA0003939218620000103
And &>
Figure BDA0003939218620000104
The relationship supervision signal between is expressed as:
r i,j =1-(m' i -m' j ) 2
wherein r is i,j The value range is from 0 to 1,
Figure BDA0003939218620000105
used for guiding the learning of the similarity relation among patches. Finally, the inter-patch similarity loss function is expressed as:
Figure BDA0003939218620000106
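A corresponding sketch of the inter-patch similarity supervision follows; as above, the squared penalty against the target $r_{i,j}$ is an assumed form of the loss.

```python
# Sketch of inter-patch similarity: pairwise cosine similarity between patches
# of one feature map is guided by r_ij = 1 - (m'_i - m'_j)^2 from the mask.
import torch
import torch.nn.functional as F

def patch_similarity_loss(patches, mask, s=4):
    """patches: (B, s*s, D) flattened patch vectors of a feature map;
    mask: (B, H', W') binary (0/1) float forgery mask."""
    b, n, _ = patches.shape
    hp, wp = mask.shape[1] // s, mask.shape[2] // s
    # forged-probability score of each mask patch (ratio of fake pixels)
    m = mask.unfold(1, hp, hp).unfold(2, wp, wp).reshape(b, n, -1).mean(-1)
    r = 1.0 - (m.unsqueeze(2) - m.unsqueeze(1)) ** 2        # target r_ij
    p = F.normalize(patches, dim=-1)
    sim = torch.bmm(p, p.transpose(1, 2))                   # cosine b_ij
    return ((sim - r) ** 2).mean()
```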
The inter-patch consistency loss function and the inter-patch similarity loss function are summed to obtain the patch-level relation supervision loss function, i.e.:

$$L_{patch} = L_{patch\_1} + L_{patch\_2}$$
since the forged traces are all concentrated in the face region (located at the center of the input image), the image-level classification is used for supervisionModel feature map
Figure BDA0003939218620000107
And &>
Figure BDA0003939218620000108
The divided patch at the most central position is flattened into a one-dimensional vector, and the one-dimensional vector passes through two full-connection layers (respectively provided with 300 and 2 neuron nodes) to form a classifier, so that two image-level classification predicted values are obtained and are respectively recorded as ^ er/standard ^ er>
Figure BDA0003939218620000109
And &>
Figure BDA00039392186200001010
The average of the two is then calculated as the final prediction score. The cross entropy loss function is used for two image level classification supervision as follows:
Figure BDA00039392186200001011
Figure BDA00039392186200001012
wherein y represents a real image binary classification label, if the face image is a forged face image, y =1, otherwise y =0. The image-level classification loss function is represented as a combination of two components, namely:
L classifier =L classifier_1 +L classifier_2
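For illustration, a minimal sketch of such a classification head follows; sharing the two fully connected layers between the two branches is an assumption, as the application does not specify whether the branch classifiers share weights.

```python
# Sketch of the image-level classification head: the central patch vector of
# each branch passes through two fully connected layers (300 and 2 nodes),
# and the two predictions are averaged into the final score.
import torch
import torch.nn as nn

class CentralPatchClassifier(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_dim, 300), nn.ReLU(),
                                nn.Linear(300, 2))

    def forward(self, central_patch_1, central_patch_2):
        y1 = self.fc(central_patch_1.flatten(1))   # prediction from branch 1
        y2 = self.fc(central_patch_2.flatten(1))   # prediction from branch 2
        return y1, y2, (y1.softmax(-1) + y2.softmax(-1)) / 2  # final score
```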
The pixel-level segmentation loss function, the patch-level relation supervision loss function, and the image-level classification loss function are summed to obtain the mixed loss function, and the pre-training model is trained end to end under the three kinds of auxiliary supervision. The overall training mixed loss function is:

$$L = \lambda_1 L_{classifier} + \lambda_2 L_{map} + \lambda_3 L_{patch}$$

where $L_{classifier}$ denotes the image-level classification loss function, $L_{map}$ denotes the pixel-level segmentation loss function, $L_{patch}$ denotes the patch-level relation supervision loss function, and $\lambda_1$ to $\lambda_3$ are their weighting factors, set, for example, to $\lambda_1 = 1$, $\lambda_2 = \lambda_3 = 0.1$.
And performing iterative training based on the mixed loss function, so that the parameters of the pre-training model are continuously adjusted until the mixed loss function is minimized, obtaining a trained pre-training model, and taking the trained pre-training model as a face image forgery detection model.
In some embodiments, the feature extraction model comprises a first branch model, a second branch model, and an interactive attention model, wherein the first branch model comprises a first spatial attention model and a first convolution model, and the second branch model comprises a second spatial attention model and a second convolution model;
step 1023, including:
and 10231, extracting important region features of the pre-training face image set through the first spatial attention model and the first convolution model to obtain a first important region feature map, and extracting important region features of a frequency domain map set of the pre-training face image set through the second spatial attention model and the second convolution model to obtain a second important region feature map.
And 10232, performing complementary fusion on the first important region feature map and the second important region feature map by using the interactive attention model to obtain a fusion region feature, and taking the fusion region feature as the feature map.
In the above scheme, the first spatial attention model and the second spatial attention model make the pre-training model focus on the important local regions related to forged-face detection (i.e., the first important-region feature map and the second important-region feature map); a schematic diagram of the spatial attention model is shown in fig. 4. Suppose the input feature map is $f \in \mathbb{R}^{h \times w \times c}$. First, average pooling (avg) and max pooling (max) along the channel dimension are applied to f, yielding two feature maps with channel number 1 and size $h \times w \times 1$. The two maps are concatenated along the channel dimension, a $3 \times 3$ convolutional layer changes the channel number from 2 to 1, and a sigmoid function follows, producing the attention map S. After S is dot-multiplied with the original input feature map f in the spatial dimension, a $3 \times 3$ convolutional layer is applied. The above process can be formulated as:

$$f_{avg} = F_{avg}(f)$$
$$f_{max} = F_{max}(f)$$
$$S = \sigma\left(\mathrm{Conv}_{3 \times 3}\left(\mathrm{Concat}(f_{avg}, f_{max})\right)\right)$$
$$f' = \mathrm{Conv}_{3 \times 3}(S \odot f)$$

where $\sigma$ denotes the sigmoid function, $\odot$ denotes spatial-dimension dot multiplication, and $f'$ is the output attended feature map.
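A minimal PyTorch sketch of this spatial attention model, following the formulas above, could read as follows; padding and other layer hyperparameters are assumptions.

```python
# Sketch of the spatial attention model in fig. 4: channel-wise average and
# max pooling, concatenation, 3x3 conv + sigmoid to obtain S, spatial
# re-weighting of f, then a final 3x3 convolution.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv_s = nn.Conv2d(2, 1, kernel_size=3, padding=1)    # 2 -> 1
        self.conv_out = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f):
        f_avg = f.mean(dim=1, keepdim=True)          # channel average pooling
        f_max = f.max(dim=1, keepdim=True).values    # channel max pooling
        s = torch.sigmoid(self.conv_s(torch.cat([f_avg, f_max], dim=1)))
        return self.conv_out(s * f)                  # spatial dot product
```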
A schematic diagram of the interactive attention model is shown in fig. 5. Suppose $f_1 \in \mathbb{R}^{h \times w \times c}$ and $f_2 \in \mathbb{R}^{h \times w \times c}$ are two feature maps from the same semantic layer of the first branch model and the second branch model. The purpose of the interactive attention model is to cooperatively fuse these two features to obtain a complementary feature representation (i.e., the feature map), using attention in both the spatial and channel dimensions. First, feature recalibration in the spatial domain creates a re-weighting map through a $1 \times 1$ convolution and a sigmoid (S-shaped growth curve) function, which is then dot-multiplied with the input feature map in the spatial dimension:

$$M_1 = \sigma\left(\mathrm{Conv}_{1 \times 1}(f_1)\right) \odot f_1$$
$$M_2 = \sigma\left(\mathrm{Conv}_{1 \times 1}(f_2)\right) \odot f_2$$

where $M_1 \in \mathbb{R}^{h \times w \times c}$ and $M_2 \in \mathbb{R}^{h \times w \times c}$ denote the spatial attention maps of $f_1$ and $f_2$, respectively, and $\odot$ denotes spatial-dimension dot multiplication. Next, $M_1$ and $M_2$ are concatenated along the channel dimension to obtain the feature map V, and the two data streams are then fused with a $1 \times 1$ convolution and a $3 \times 3$ convolution, a process represented as:

$$T = \phi\left(\beta\left(\mathrm{Conv}_{1 \times 1}(V)\right)\right)$$
$$K = \sigma\left(\mathrm{Conv}_{3 \times 3}(T)\right)$$

where $T \in \mathbb{R}^{h \times w \times 2c}$, $K \in \mathbb{R}^{h \times w \times 2}$, $\beta$ denotes a batch normalization layer, and $\phi$ denotes the ReLU function. Then K is split along the channel dimension into $K_1 \in \mathbb{R}^{h \times w \times 1}$ and $K_2 \in \mathbb{R}^{h \times w \times 1}$, corresponding to the attention maps of $f_1$ and $f_2$, respectively. Finally, $f_1$ and $f_2$ are dot-multiplied with $K_1$ and $K_2$, respectively, to highlight the most important local regions in the image.
In some embodiments, in step 1024, merging all feature maps by using the pixel-level segmentation supervision model to obtain a merged feature map includes:
and utilizing the pixel level segmentation supervision model to perform upsampling on all the feature maps to obtain feature maps with the same specification, and combining all the feature maps with the same specification to obtain a combined feature map.
In the above scheme, for example, using the pixel-level segmentation supervision model, the feature maps of the three deeper semantic layers are upsampled to the same size as that of the shallowest semantic layer; the three upsampled feature maps are then concatenated with the shallowest-layer feature map along the channel dimension, and the result is finally fed into the convolution module conv5_x. The output feature map (i.e., the merged feature map) has a size of 120 × 120 × 3, and the size of the feature map is not specifically limited.
In some embodiments, in step 1024, the performing pixel-level segmentation on the merged feature map to obtain a pixel-level segmentation loss function includes:
step 10241, labeling each pixel of the real face image in the merged feature map according to a preset first pixel value by using the pixel-level segmentation monitoring model to obtain a first segmentation region, and labeling each pixel of the fake face image in the merged feature map according to a preset second pixel value to obtain a second segmentation region.
Step 10242, generating a pre-training counterfeit mask based on the first divided region and the second divided region, and labeling the counterfeit face image in the combined feature map by the pre-training counterfeit mask.
Step 10243, obtaining the pixel-level segmentation loss function according to the pre-training forgery mask and the merged feature map.
In the above scheme, for example, for a real face image all pixels are regarded as real, and the forgery mask M is defined as a binary image with all pixel values 0. For a forged face image, the forged region appears only on the face, so the pixels inside the face region (detected and located with the multi-task convolutional neural network, the face region then being framed with a rectangular box) are regarded as fake and set to pixel value 1, while the other pixels are real and set to pixel value 0. The forgery mask can be described by the following equation:

$$M_{ij} = \begin{cases} 1, & \text{if } x_{ij} \text{ lies inside the face region of a forged image} \\ 0, & \text{otherwise} \end{cases}$$

where $x_{ij}$ denotes the pixel of the input image at point (i, j). The generated forgery mask M is then resized to 120 × 120 × 3, denoted $f_b$, and used as the pixel-level segmentation supervision signal to supervise the merged feature map, guiding the model to learn a more accurate and discriminative perception of forged regions. The pixel-level segmentation loss function is then computed from $f_b$ and the merged feature map [its exact expression is presented as an image in the original publication].
step 103, in response to determining that the detection result label is a preset first numerical value, the face image to be detected is a forged face image, and a forged mask is generated in the face image to be detected to label a forged area.
In this step, for example, when the detection result label is 0, the face image to be detected is a forged face image, and a forged mask is generated in the face image to be detected to mark a forged region, where the preset first numerical value is not specifically limited.
And step 104, responding to the fact that the detection result label is determined to be a preset second numerical value, wherein the face image to be detected is a real face image.
In this step, for example, when the detection result label is 1, the face image to be detected is a real face image, and the preset second value is not specifically limited here.
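Putting steps 101 to 104 together, an end-to-end inference pass could look like the following sketch; `model`, its `forgery_mask` method, and the helper functions are the hypothetical examples from the sketches above, and the label convention (1 = real, 0 = forged) follows the example in the text.

```python
# Illustrative end-to-end inference sketch tying steps 101-104 together.
import numpy as np
import torch

def detect(path, model):
    face = preprocess_face(path)                        # step 101: preprocess
    gray = np.asarray(face.convert("L"), dtype=np.float32)
    freq = high_pass_frequency_map(gray)                # step 101: frequency map
    rgb = np.asarray(face, dtype=np.float32)
    rgb_t = torch.from_numpy(rgb).permute(2, 0, 1)[None]
    freq_t = torch.from_numpy(freq)[None, None].float()
    with torch.no_grad():
        label = model(rgb_t, freq_t).argmax(-1).item()  # step 102: output label
    if label == 0:                                      # step 103: forged image
        return "forged", model.forgery_mask(rgb_t, freq_t)  # hypothetical helper
    return "real", None                                 # step 104: real image
```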
According to the above scheme, the face image to be detected is transformed to obtain its frequency-domain map. Exploiting the property that the high-frequency information of a forged face image is altered and its details are lost, more forgery-trace clues are mined from the frequency-domain map of the face image to be detected. The face image to be detected and its frequency-domain map are then input into the trained face image forgery detection model, and a detection result label is output.
It should be noted that the method of the embodiment of the present application may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the multiple devices may only perform one or more steps of the method of the embodiment, and the multiple devices interact with each other to complete the method.
It should be noted that the above describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, the application also provides a face image counterfeiting detection device corresponding to the method of any embodiment.
Referring to fig. 6, the face image forgery detection apparatus includes:
the data acquisition module 601 is configured to acquire a preprocessed face image to be detected, and perform conversion processing on the face image to be detected to obtain a frequency domain diagram of the face image to be detected;
a forgery detection module 602 configured to input the facial image to be detected and the frequency domain map of the facial image to be detected into a pre-trained facial image forgery detection model, and output a detection result label;
a first discrimination module 603, configured to, in response to determining that the detection result label is a preset first value, determine that the face image to be detected is a forged face image and generate a forgery mask in the face image to be detected to mark the forged region;
and a second discrimination module 604, configured to, in response to determining that the detection result label is a preset second value, determine that the face image to be detected is a real face image.
In some embodiments, the data acquisition module 601 is specifically configured to:
acquiring an original face image to be detected;
detecting and positioning a face region of an original face image to be detected through a multitask convolutional neural network to obtain a face region image;
and adjusting the face area according to a preset rule to obtain the preprocessed face image to be detected.
In some embodiments, the data obtaining module 601 is further specifically configured to:
and extracting the frequency domain characteristics of the face image to be detected to obtain a frequency domain image of the face image to be detected.
In some embodiments, the facial image forgery detection apparatus further includes a model training module including:
the data acquisition unit is configured to acquire a preprocessed pre-training face image set and convert the pre-training face image set to obtain a frequency domain image set of the pre-training face image set;
the data input unit is configured to input the pre-training face image set and the frequency-domain map set of the pre-training face image set into a pre-constructed pre-training model respectively, wherein the pre-training model comprises a feature extraction model, a pixel-level segmentation supervision model, a local patch relation supervision model, and an image-level classification supervision model, and the local patch relation supervision model comprises an inter-patch consistency supervision model and an inter-patch similarity supervision model;
the feature extraction unit is configured to perform feature extraction on the pre-training face image set and the frequency domain image set of the pre-training face image set respectively through the feature extraction model to obtain feature maps;
the merging processing unit is configured to merge all the feature maps by using the pixel-level segmentation supervision model to obtain a merged feature map, and perform pixel-level segmentation on the merged feature map to obtain a pixel-level segmentation loss function;
a first loss function obtaining unit, configured to obtain an inter-patch consistency loss function through the inter-patch consistency supervision model based on the feature map, obtain an inter-patch similarity loss function through the inter-patch similarity supervision model based on the feature map, and sum the inter-patch consistency loss function and the inter-patch similarity loss function to obtain a patch-level relation supervision loss function;
a second loss function obtaining unit configured to obtain an image-level classification loss function through the image-level classification supervision model based on the feature map;
a mixed loss function obtaining unit, configured to sum the pixel-level segmentation loss function, the patch-level relation supervision loss function, and the image-level classification loss function to obtain a mixed loss function;
and the iterative training unit is configured to continuously adjust the parameters of the pre-training model based on the mixed loss function until the mixed loss function is minimized to obtain a trained pre-training model, and the trained pre-training model is used as the face image counterfeiting detection model.
In some embodiments, the feature extraction model comprises a first branch model, a second branch model, and an interactive attention model, wherein the first branch model comprises a first spatial attention model and a first convolution model, and the second branch model comprises a second spatial attention model and a second convolution model;
a feature extraction unit, specifically configured to:
extracting important region features of the pre-training face image set through the first spatial attention model and the first convolution model to obtain a first important region feature map, and extracting important region features of the frequency domain image set of the pre-training face image set through the second spatial attention model and the second convolution model to obtain a second important region feature map;
and performing complementary fusion on the first important region feature map and the second important region feature map by using the interactive attention model to obtain a fusion region feature, and taking the fusion region feature as the feature map.
In some embodiments, the merging processing unit is specifically configured to:
and utilizing the pixel level segmentation supervision model to perform upsampling on all the feature maps to obtain feature maps with the same specification, and merging all the feature maps with the same specification to obtain a merged feature map.
In some embodiments, the merging processing unit is further specifically configured to:
marking each pixel of the real face image in the merged feature map according to a preset first pixel value by using the pixel-level segmentation supervision model to obtain a first segmentation area, and marking each pixel of the forged face image in the merged feature map according to a preset second pixel value to obtain a second segmentation area;
generating a pre-training fake mask based on the first segmentation area and the second segmentation area, and labeling fake face images in the combined feature map through the pre-training fake mask;
and obtaining the pixel-level segmentation loss function according to the pre-training forgery mask and the merged feature map.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.
The apparatus of the foregoing embodiment is used to implement the corresponding face image forgery detection method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to the method of any embodiment described above, the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the method for detecting counterfeit human face image according to any embodiment described above is implemented.
Fig. 7 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 701, a memory 702, an input/output interface 703, a communication interface 704, and a bus 705. Wherein the processor 701, the memory 702, the input/output interface 703 and the communication interface 704 are communicatively connected to each other within the device via a bus 705.
The processor 701 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present specification.
The Memory 702 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 702 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 702 and called to be executed by the processor 701.
The input/output interface 703 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various sensors, etc., and the output devices may include a display, speaker, vibrator, indicator light, etc.
The communication interface 704 is used for connecting a communication module (not shown in the figure) to implement communication interaction between the present device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, bluetooth and the like).
Bus 705 includes a pathway for communicating information between various components of the device, such as processor 701, memory 702, input/output interface 703, and communication interface 704.
It should be noted that although the above-mentioned device only shows the processor 701, the memory 702, the input/output interface 703, the communication interface 704 and the bus 705, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only the components necessary to implement the embodiments of the present disclosure, and need not include all of the components shown in the figures.
The electronic device of the above embodiment is used to implement the corresponding face image forgery detection method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-mentioned embodiment methods, the present application further provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the face image falsification detection method according to any of the above-mentioned embodiments.
Computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the above embodiment are used to enable the computer to execute the method for detecting face image forgery according to any embodiment, and have the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that the embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A face image forgery detection method is characterized by comprising the following steps:
acquiring a preprocessed face image to be detected, and converting the face image to be detected to obtain a frequency domain map of the face image to be detected;
respectively inputting the face image to be detected and the frequency domain map of the face image to be detected into a pre-trained face image forgery detection model, and outputting a detection result label;
in response to determining that the detection result label is a preset first value, determining that the face image to be detected is a forged face image, and generating a forgery mask in the face image to be detected to mark the forged region;
and in response to determining that the detection result label is a preset second value, determining that the face image to be detected is a real face image.
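By way of illustration only, since the disclosure publishes no reference code, the following is a minimal PyTorch sketch of the detection flow in claim 1. The TinyDetector stand-in, the 224x224 input size, and the label convention (1 = forged, 0 = real) are all assumptions, not names or values from the patent.

    import torch
    import torch.nn as nn

    class TinyDetector(nn.Module):
        # Stand-in for the trained detector: consumes the RGB face and its
        # frequency domain map, returns an image-level logit and mask logits.
        def __init__(self):
            super().__init__()
            self.backbone = nn.Conv2d(4, 8, 3, padding=1)  # RGB (3) + frequency map (1)
            self.cls_head = nn.Linear(8, 1)
            self.mask_head = nn.Conv2d(8, 1, 1)

        def forward(self, face, freq):
            feat = torch.relu(self.backbone(torch.cat([face, freq], dim=1)))
            logit = self.cls_head(feat.mean(dim=(2, 3)))   # image-level score
            return logit, self.mask_head(feat)             # plus per-pixel mask logits

    model = TinyDetector().eval()
    face = torch.rand(1, 3, 224, 224)   # preprocessed face image (claim 2)
    freq = torch.rand(1, 1, 224, 224)   # its frequency domain map (claim 3)

    with torch.no_grad():
        logit, mask_logits = model(face, freq)

    label = int(logit.sigmoid().item() > 0.5)        # 1 = assumed "forged" value
    if label == 1:
        forgery_mask = mask_logits.sigmoid() > 0.5   # marks the suspected forged region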
2. The method according to claim 1, wherein said acquiring a preprocessed face image to be detected comprises:
acquiring an original face image to be detected;
detecting and locating the face region of the original face image to be detected through a multi-task convolutional neural network to obtain a face region image;
and adjusting the face region image according to a preset rule to obtain the preprocessed face image to be detected.
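A minimal sketch of the preprocessing in claim 2, using the facenet-pytorch package as one common MTCNN implementation; the claim only requires a multi-task convolutional neural network, and the 224x224 size and crop margin are assumed "preset rules":

    from PIL import Image
    from facenet_pytorch import MTCNN  # one widely used MTCNN implementation

    # Detect, locate and crop the face region, then resize it to a fixed
    # input size (the assumed "preset rule").
    mtcnn = MTCNN(image_size=224, margin=20, post_process=False)

    img = Image.open("original.jpg").convert("RGB")
    face_tensor = mtcnn(img)   # cropped, resized face; None if no face is found
    if face_tensor is None:
        raise ValueError("no face detected in the input image")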
3. The method according to claim 1, wherein the converting the face image to be detected to obtain a frequency domain map of the face image to be detected comprises:
and extracting the frequency domain features of the face image to be detected to obtain the frequency domain map of the face image to be detected.
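The claim does not fix the transform used to extract the frequency domain features. A minimal NumPy sketch, assuming a shifted 2-D FFT log-magnitude spectrum of the luminance channel (a DCT would serve equally well):

    import numpy as np

    def to_frequency_map(face: np.ndarray) -> np.ndarray:
        # Reduce an H x W x 3 face image to luminance, then take the
        # log-magnitude of its centered 2-D spectrum as the frequency map.
        gray = face.mean(axis=2) if face.ndim == 3 else face
        spectrum = np.fft.fftshift(np.fft.fft2(gray))
        return np.log1p(np.abs(spectrum)).astype(np.float32)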
4. The method of claim 1, wherein the face image forgery detection model is pre-trained through the following process:
acquiring a preprocessed pre-training face image set, and converting the pre-training face image set to obtain a frequency domain map set of the pre-training face image set;
respectively inputting the pre-training face image set and the frequency domain map set of the pre-training face image set into a pre-constructed pre-training model, wherein the pre-training model comprises a feature extraction model, a pixel-level segmentation supervision model, a patch-level relation supervision model and an image-level classification supervision model, and the patch-level relation supervision model comprises an inter-patch consistency supervision model and an inter-patch similarity supervision model;
respectively performing feature extraction on the pre-training face image set and the frequency domain map set of the pre-training face image set through the feature extraction model to obtain feature maps;
merging all the feature maps by using the pixel-level segmentation supervision model to obtain a merged feature map, and performing pixel-level segmentation on the merged feature map to obtain a pixel-level segmentation loss function;
obtaining an inter-patch consistency loss function through the inter-patch consistency supervision model based on the feature maps, obtaining an inter-patch similarity loss function through the inter-patch similarity supervision model based on the feature maps, and summing the inter-patch consistency loss function and the inter-patch similarity loss function to obtain a patch-level relation supervision loss function;
obtaining an image-level classification loss function through the image-level classification supervision model based on the feature maps;
summing the pixel-level segmentation loss function, the patch-level relation supervision loss function and the image-level classification loss function to obtain a mixed loss function;
and continuously adjusting the parameters of the pre-training model based on the mixed loss function until the mixed loss function is minimized, to obtain a trained pre-training model, and taking the trained pre-training model as the face image forgery detection model.
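A minimal sketch of the hybrid objective in claim 4. The unweighted plain sum mirrors the claim language; the scalar example values are placeholders:

    import torch

    def mixed_loss(seg_loss, consistency_loss, similarity_loss, cls_loss):
        # Patch-level relation supervision is the sum of the inter-patch
        # consistency and similarity terms; the total objective is the plain
        # sum of the three supervision levels (no weights are claimed).
        patch_relation_loss = consistency_loss + similarity_loss
        return seg_loss + patch_relation_loss + cls_loss

    # Example with placeholder per-term losses:
    total = mixed_loss(torch.tensor(0.7), torch.tensor(0.2),
                       torch.tensor(0.1), torch.tensor(0.4))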
5. The method of claim 4, wherein the feature extraction model comprises a first branch model, a second branch model, and an interactive attention model, wherein the first branch model comprises a first spatial attention model and a first convolution model, and wherein the second branch model comprises a second spatial attention model and a second convolution model;
the respectively performing feature extraction on the pre-training face image set and the frequency domain map set of the pre-training face image set through the feature extraction model to obtain the feature maps comprises:
extracting important region features of the pre-training face image set through the first spatial attention model and the first convolution model to obtain a first important region feature map, and extracting important region features of the frequency domain map set of the pre-training face image set through the second spatial attention model and the second convolution model to obtain a second important region feature map;
and performing complementary fusion on the first important region feature map and the second important region feature map by using the interactive attention model to obtain fused region features, and taking the fused region features as the feature maps.
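A minimal PyTorch sketch of the two-branch extractor in claim 5. The CBAM-style spatial attention and the cross-gating form of the interactive attention are assumptions; the claim names the modules but not their internals:

    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        # Assumed CBAM-style spatial attention: pool over channels,
        # 7x7 convolution, sigmoid gate over spatial positions.
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):
            pooled = torch.cat([x.mean(1, keepdim=True),
                                x.max(1, keepdim=True).values], dim=1)
            return x * torch.sigmoid(self.conv(pooled))

    class TwoBranchExtractor(nn.Module):
        # First branch: RGB face; second branch: its frequency domain map.
        # Each branch applies spatial attention then convolution; the
        # "interactive attention" is modeled here as mutual cross-gating.
        def __init__(self, ch=32):
            super().__init__()
            self.attn_rgb, self.attn_freq = SpatialAttention(), SpatialAttention()
            self.conv_rgb = nn.Conv2d(3, ch, 3, padding=1)
            self.conv_freq = nn.Conv2d(1, ch, 3, padding=1)
            self.gate = nn.Conv2d(ch, ch, 1)

        def forward(self, face, freq):
            f_rgb = self.conv_rgb(self.attn_rgb(face))     # first important region features
            f_freq = self.conv_freq(self.attn_freq(freq))  # second important region features
            # Complementary fusion: each branch reweighted by the other's gate.
            return (f_rgb * torch.sigmoid(self.gate(f_freq))
                    + f_freq * torch.sigmoid(self.gate(f_rgb)))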
6. The method of claim 4, wherein the merging all feature maps with the pixel-level segmentation supervision model to obtain a merged feature map comprises:
and utilizing the pixel-level segmentation supervision model to upsample all the feature maps to feature maps of the same size, and merging all the same-size feature maps to obtain the merged feature map.
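A minimal sketch of the merge step in claim 6; the 56x56 target size and bilinear upsampling are assumptions, as the claim only requires a common size:

    import torch
    import torch.nn.functional as F

    def merge_feature_maps(feature_maps, size=(56, 56)):
        # Upsample every multi-scale feature map to one spatial size,
        # then concatenate them along the channel axis.
        resized = [F.interpolate(f, size=size, mode="bilinear",
                                 align_corners=False) for f in feature_maps]
        return torch.cat(resized, dim=1)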
7. The method of claim 4, wherein the performing pixel-level segmentation on the merged feature map to obtain a pixel-level segmentation loss function comprises:
marking each pixel of the real face image in the merged feature map with a preset first pixel value by using the pixel-level segmentation supervision model to obtain a first segmentation region, and marking each pixel of the forged face image in the merged feature map with a preset second pixel value to obtain a second segmentation region;
generating a pre-training forgery mask based on the first segmentation region and the second segmentation region, and labeling the forged face images in the merged feature map through the pre-training forgery mask;
and obtaining the pixel-level segmentation loss function according to the pre-training forgery mask and the merged feature map.
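A minimal sketch of the pixel-level supervision in claim 7. Binary cross-entropy and the 0/1 pixel values are assumptions; the claim fixes neither the loss form nor the concrete pixel values:

    import torch
    import torch.nn.functional as F

    def pixel_segmentation_loss(mask_logits, forgery_mask):
        # Per-pixel supervision of the mask predicted from the merged
        # feature map against the pre-training forgery mask (assumed
        # binary: first pixel value 0 = real, second value 1 = forged).
        return F.binary_cross_entropy_with_logits(mask_logits, forgery_mask)

    # Example with placeholder tensors:
    logits = torch.randn(1, 1, 56, 56)
    gt_mask = (torch.rand(1, 1, 56, 56) > 0.5).float()
    loss = pixel_segmentation_loss(logits, gt_mask)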
8. A face image forgery detection apparatus, comprising:
the data acquisition module is configured to acquire a preprocessed face image to be detected and convert the face image to be detected to obtain a frequency domain map of the face image to be detected;
the face image forgery detection module is configured to respectively input the face image to be detected and the frequency domain map of the face image to be detected into a pre-trained face image forgery detection model and output a detection result label;
the first judging module is configured to, in response to determining that the detection result label is a preset first value, determine that the face image to be detected is a forged face image and generate a forgery mask in the face image to be detected to mark the forged region;
and the second judging module is configured to, in response to determining that the detection result label is a preset second value, determine that the face image to be detected is a real face image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202211412421.6A 2022-11-11 2022-11-11 Face image counterfeiting detection method and related equipment Pending CN115909445A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211412421.6A CN115909445A (en) 2022-11-11 2022-11-11 Face image counterfeiting detection method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211412421.6A CN115909445A (en) 2022-11-11 2022-11-11 Face image counterfeiting detection method and related equipment

Publications (1)

Publication Number Publication Date
CN115909445A true CN115909445A (en) 2023-04-04

Family

ID=86491759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211412421.6A Pending CN115909445A (en) 2022-11-11 2022-11-11 Face image counterfeiting detection method and related equipment

Country Status (1)

Country Link
CN (1) CN115909445A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740794A (en) * 2023-08-15 2023-09-12 中国科学技术大学 Face fake image identification method, system, equipment and storage medium
CN116740794B (en) * 2023-08-15 2023-11-28 中国科学技术大学 Face fake image identification method, system, equipment and storage medium
CN117275068A (en) * 2023-09-21 2023-12-22 北京中科闻歌科技股份有限公司 Method and system for training human face fake detection in test stage containing uncertainty guidance
CN117275068B (en) * 2023-09-21 2024-05-17 北京中科闻歌科技股份有限公司 Method and system for training human face fake detection in test stage containing uncertainty guidance
CN117746515A (en) * 2023-12-20 2024-03-22 北京百度网讯科技有限公司 Training method, detection method, device, equipment and medium for detection model
CN117934922A (en) * 2023-12-29 2024-04-26 中国科学院自动化研究所 Method and device for detecting generated image, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
JP7490141B2 (en) IMAGE DETECTION METHOD, MODEL TRAINING METHOD, IMAGE DETECTION APPARATUS, TRAINING APPARATUS, DEVICE, AND PROGRAM
CN109325954B (en) Image segmentation method and device and electronic equipment
CN109816009B (en) Multi-label image classification method, device and equipment based on graph convolution
CN115909445A (en) Face image counterfeiting detection method and related equipment
CN107358242B (en) Target area color identification method and device and monitoring terminal
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN111191654B (en) Road data generation method and device, electronic equipment and storage medium
CN113537254B (en) Image feature extraction method and device, electronic equipment and readable storage medium
CN117557784B (en) Target detection method, target detection device, electronic equipment and storage medium
CN112418256A (en) Classification, model training and information searching method, system and equipment
CN113139540A (en) Backboard detection method and equipment
CN111898544A (en) Character and image matching method, device and equipment and computer storage medium
CN115223181A (en) Text detection-based method and device for recognizing characters of seal of report material
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN113822871A (en) Target detection method and device based on dynamic detection head, storage medium and equipment
CN116071625B (en) Training method of deep learning model, target detection method and device
CN116681687B (en) Wire detection method and device based on computer vision and computer equipment
CN116486153A (en) Image classification method, device, equipment and storage medium
CN117011416A (en) Image processing method, device, equipment, medium and program product
CN114140427A (en) Object detection method and device
CN114445916A (en) Living body detection method, terminal device and storage medium
CN118467778B (en) Video information abstract generation method, device, electronic device and storage medium
CN117372789B (en) Image classification method and image classification device
CN114332955B (en) Pedestrian re-identification method and device and computer readable storage medium
CN113159036B (en) Helmet identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination