CN114463805A - Deep forgery detection method and device, storage medium and computer equipment

Deep forgery detection method and device, storage medium and computer equipment

Info

Publication number
CN114463805A
Authority
CN
China
Prior art keywords: feature, feature map, local block, features, face image
Prior art date
Legal status: Granted
Application number
CN202111633779.7A
Other languages
Chinese (zh)
Other versions
CN114463805B (en)
Inventor
Not disclosed (inventor not announced)
Current Assignee
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd
Priority to CN202111633779.7A
Publication of CN114463805A
Application granted
Publication of CN114463805B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the application relate to the field of image processing and provide a deep forgery detection method and device, a storage medium, and computer equipment. The method comprises the following steps: acquiring a face image to be recognized and performing feature extraction on the face image to obtain a first feature map; performing format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features; acquiring position weight information corresponding to the plurality of local block features, and performing attention enhancement processing on the plurality of local block features of the second feature map according to that position weight information to obtain a third feature map; and classifying the third feature map to obtain a detection result of the face image, wherein the detection result indicates a target local block feature among the local block features of the third feature map, and the target local block feature corresponds to a forged feature in the face image. The method can effectively improve the accuracy of deep forgery detection.

Description

Deep forgery detection method and device, storage medium and computer equipment
Technical Field
The embodiment of the application relates to the field of image processing, in particular to a depth forgery detection method and device, a storage medium and computer equipment.
Background
Deep forgery refers to media synthesis technology that creates or synthesizes visual and audio content such as images, audio, video, and text based on intelligent methods such as deep learning. Colloquially, "deep forgery" places one person's facial contours and expressions onto the face of any other person, thereby creating a video or image that is entirely synthetic but appears extremely genuine. On the one hand, deep forgery technology can promote the development of the entertainment and cultural communication industries, with applications in movie production such as creating virtual characters, video rendering, sound simulation, and "reviving" historical figures or departed relatives and friends. On the other hand, deep forgery technology can also be used to mislead public opinion, disturb social order, and even threaten the security of face recognition systems.
In the prior art, one of the more common deep forgery detection methods is the image classification method. This method first collects a large amount of real and fake data to train a binary-classification deep neural network, then uses the trained network to classify the pictures to be detected (either independent images or frame images of a video), and finally fuses the per-image recognition results into a recognition result for the video. However, in deeply forged images or videos, forgery traces are mainly concentrated in specific areas, such as the whole face, the face contour, or the vicinity of the mouth, and a common classification model cannot enhance its learning of the forged areas; the existing image classification method therefore has poor accuracy for deep forgery detection.
Disclosure of Invention
In view of this, the present application provides a depth forgery detection method, apparatus, storage medium, and computer device, and mainly aims to solve the technical problem of poor accuracy of depth forgery detection.
In a first aspect, an embodiment of the present application provides a depth forgery detection method, including:
acquiring a face image to be recognized, and extracting the features of the face image to obtain a first feature map;
carrying out format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features;
acquiring position weight information corresponding to the plurality of local block features, and performing attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map;
and classifying the third feature map to obtain a detection result of the face image, wherein the detection result is used for indicating a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image.
In a second aspect, the present application provides a depth forgery detection apparatus implementing the depth forgery detection method described above, the apparatus including:
the input and output module is used for acquiring a face image to be recognized;
the processing module is used for extracting the features of the face image to obtain a first feature map and performing format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features;
the input and output module is also used for acquiring position weight information corresponding to the plurality of local block characteristics;
the processing module is further used for performing attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map, and performing classification processing on the third feature map to obtain a detection result of the face image, wherein the detection result is used for indicating a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image;
and the input and output module is also used for outputting the detection result of the face image.
In an embodiment, the processing module is specifically configured to perform feature extraction on the face image through a pre-trained convolutional neural network to obtain a first feature map, where the first feature map includes features of three dimensions, and the three dimensions of the first feature map are height, width, and channel number of the first feature map, respectively.
In an embodiment, the processing module is specifically configured to convert the first feature map from a three-dimensional feature matrix to a two-dimensional feature matrix to obtain a second feature map, where the second feature map includes features of two dimensions, and the two dimensions of the second feature map are the local block feature quantity and the feature length of the second feature map, respectively.
In an embodiment, the processing module is specifically configured to set a position coding feature for each local block feature according to the position weight information of the plurality of local block features, where the position coding feature is a feature vector of a preset length; splice each local block feature with its corresponding position coding feature to obtain a plurality of cascade features of the second feature map; set a flag bit feature at the starting position of the cascade features, where the length of the flag bit feature is equal to that of the cascade features; and perform attention enhancement processing on the flag bit feature and the cascade features of the second feature map to obtain a third feature map.
In one embodiment, the processing module is specifically configured to input the flag bit feature and the cascade features of the second feature map into a pre-trained fully-connected layer to obtain a query feature vector, an attribute feature vector, and a content feature vector; normalize the product of the query feature vector and the transposed attribute feature vector to obtain an attention map; and multiply the attention map by the content feature vector to obtain a third feature map.
In one embodiment, the processing module is specifically configured to input the third feature map into a pre-trained multi-layer perceptron, so as to obtain a depth forgery probability value of the face image; and comparing the depth forgery probability value of the face image with a preset probability threshold value, and obtaining the detection result of the face image according to the comparison result.
In one embodiment, the input-output module is further configured to obtain initial image data or obtain initial video data; the processing module is also used for identifying a face area in the image data through a face identification algorithm and intercepting the face area to obtain a face image to be identified; or identifying the face area in each frame of image in the video data through a face identification algorithm, and intercepting the face area to obtain the face image to be identified.
In a third aspect, embodiments of the present application provide a storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the above-mentioned depth forgery detection method.
In a fourth aspect, the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the above depth forgery detection method when executing the program.
According to the deep forgery detection method, device, storage medium, and computer equipment of the embodiments, feature extraction is first performed on a face image; the extracted features are then converted into a second feature map containing a plurality of local block features; position weight information corresponding to the local block features is obtained, and attention enhancement processing is performed on the local block features of the second feature map according to that position weight information to obtain a third feature map; finally, classification processing is performed on the third feature map to obtain the detection result of the face image, so that the forged features in the face image are indicated by the target local block features in the detection result. By converting the high-level features of the face image into local block features that are convenient to identify and extracting the global correlation features of the local block features through attention enhancement processing, the method effectively strengthens the attention paid to the forged region of the face and improves the accuracy of deep forgery detection.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the embodiments of the application and not to limit the embodiments of the application unduly. In the drawings:
fig. 1 is a scene schematic diagram illustrating a depth forgery detection method provided by an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating a depth forgery detection method according to an embodiment of the present application;
fig. 3 is a detection schematic diagram illustrating a depth forgery detection method according to an embodiment of the present application;
fig. 4 is a detection schematic diagram illustrating a depth forgery detection method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram illustrating a depth forgery detection apparatus according to an embodiment of the present application;
fig. 6 shows an internal structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Hereinafter, embodiments of the present application will be described in detail with reference to the accompanying drawings in conjunction with the embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
To further explain the technical means and effects adopted by the embodiments of the present application to achieve the intended purpose, a detailed description of the embodiments, structures, features, and effects of the present application is given below with reference to the accompanying drawings. In the following description, different references to "one embodiment" or "an embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Further, although the steps in the embodiments are numbered in sequence, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps need not be performed in the exact order shown and may be performed in other orders. Moreover, at least some of the steps in each embodiment may comprise multiple sub-steps or stages, which are not necessarily performed at the same time and may be performed at different times; the order of performing these sub-steps or stages is not necessarily sequential, and they may be performed in turns or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The deep forgery detection method provided by the embodiments of the present application can be applied in the environment shown in fig. 1. As shown in fig. 1, the computer device 110 may communicate with the data acquisition device 120 through a network; the data acquisition device 120 may acquire image data or video data containing a face image and transmit it to the computer device 110, and the computer device 110 may perform a series of processing on the image data or video data to obtain a deep forgery detection result for the face image it contains. In this scene, a deep forgery situation may occur in which a dynamic image replaces a real person for face recognition; for such situations, performing deep forgery detection on the face image in the image data or video data can identify the authenticity of the face image. The computer device may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, servers, or a server cluster composed of multiple servers. The data acquisition device may be a terminal device with camera or photographing functions; its shape, arrangement, and size are not limited.
In one embodiment, as shown in fig. 2, a depth forgery detection method is provided, which is illustrated by applying the method to the computer device 110 in fig. 1, and includes the following steps:
201. the method comprises the steps of obtaining a face image to be recognized, and extracting features of the face image to obtain a first feature map.
The face image refers to an image containing a face region. Specifically, the computer device may acquire the face image to be recognized through various channels such as a data acquisition device, a face image database, or a network, and then extract local features from the face image through a preset image feature extraction method to obtain the first feature map, where the first feature map is a multi-dimensional feature matrix composed of the local features.
202. And carrying out format conversion processing on the first feature map to obtain a second feature map.
The second feature map includes a plurality of local block features. Local block features, also referred to as local image features, are local expressions of image features; they reflect local characteristics of the image and are suitable for applications such as image matching and retrieval. Specifically, the first feature map may be converted into the second feature map through a predetermined format conversion operation, where the second feature map is a two-dimensional feature matrix composed of local block features. In this embodiment, the first feature map may be converted into the second feature map by matrix transformation.
203. And acquiring position weight information corresponding to the plurality of local block features, and performing attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map.
Specifically, in image data there is usually a certain positional relationship between local blocks, and this positional relationship is used to train, in advance, the position weight information corresponding to each local block feature. Further, by fusing the local block features with their corresponding position weight information and performing attention enhancement processing on the fused features, the global correlation features between local blocks can be extracted to obtain a third feature map.
204. And classifying the third feature map to obtain a detection result of the face image.
Specifically, the detection result of the face image can be obtained by classifying the third feature map, where the classification can be implemented with pre-trained models or algorithms. In this embodiment, since the third feature map includes the global correlation features between local blocks, the target local block feature among the local block features can be identified by classifying the global correlation features in the third feature map; the forged features in the face image can then be located by tracing back from the target local block features, achieving the purpose of deep forgery detection on the face image.
The deep forgery detection method provided in this embodiment first performs feature extraction on a face image, then converts the extracted features into a second feature map comprising a plurality of local block features, obtains position weight information corresponding to those local block features, performs attention enhancement processing on the local block features of the second feature map according to the position weight information to obtain a third feature map, and finally performs classification processing on the third feature map to obtain the detection result of the face image. By converting the high-level features of the face image into local block features that are convenient to identify and extracting their global correlation features through attention enhancement processing, the method effectively strengthens the attention paid to the forged region of the face and improves the accuracy of deep forgery detection.
In one embodiment, step 201 may be implemented by: feature extraction is carried out on the face image through a pre-trained Convolutional Neural Network (CNN) to obtain a first feature map. In this embodiment, the first feature map includes features of three dimensions, where the three dimensions are the height, width, and number of channels of the first feature map. According to the embodiment, the high-level features of the face image are extracted through the convolutional neural network, so that the extraction of the local features in the face image can be enhanced, and the accuracy of deep forgery detection of the face image is improved.
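As an illustration of this embodiment, the convolutional feature extraction can be sketched with a minimal NumPy valid convolution; the kernel here is random and untrained, and all sizes are illustrative stand-ins for a real pre-trained CNN backbone:

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Single valid convolution, stride 1: x is (H, W, Cin), kernel is (kh, kw, Cin, Cout)."""
    H, W, _ = x.shape
    kh, kw, _, cout = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1, cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # contract the kh x kw x Cin patch against the kernel for every output channel
            out[i, j] = np.tensordot(x[i:i + kh, j:j + kw, :], kernel, axes=3)
    return out

# Toy face image of H x W x 3 and a random (untrained) 3x3 kernel bank.
X = np.random.rand(9, 9, 3)
kernel = np.random.rand(3, 3, 3, 8)
F1 = np.maximum(0, conv2d_valid(X, kernel))  # ReLU; first feature map
print(F1.shape)  # (7, 7, 8): height h, width w, channel number c
```

A real implementation would stack many such layers with pooling; only the output format h × w × c matters for the steps that follow.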
In one embodiment, step 202 may be implemented as follows: convert the first feature map from a three-dimensional feature matrix into a two-dimensional feature matrix to obtain the second feature map. In this embodiment, the second feature map includes features of two dimensions: the local block feature number and the feature length, where the local block feature number of the second feature map is the product of the height and the width of the first feature map, and the feature length of the second feature map is the number of channels of the first feature map. By merging the two spatial dimensions of the first feature map into a single dimension of the second feature map, the local features of the face image are converted into local block features and abstracted as a time-sequence-like feature for processing; this facilitates the attention enhancement processing of the converted features in subsequent steps, so that the target local block feature can be identified among the local block features and mapped to the forged features in the face image.
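The format conversion of this embodiment is a plain reshape; a minimal sketch, with illustrative sizes h, w, c:

```python
import numpy as np

# First feature map F1 of shape (h, w, c), e.g. the output of a CNN backbone.
h, w, c = 7, 7, 8
F1 = np.random.rand(h, w, c)

# Format conversion: merge height and width into a single token axis, so the
# second feature map F2 has n = h * w local block features of length c each.
F2 = F1.reshape(h * w, c)
print(F2.shape)  # (49, 8)

# Each row of F2 is one local block feature in raster (row-major) order,
# so row k corresponds to spatial position (k // w, k % w) of F1.
```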
In one embodiment, step 203 may be implemented as follows: first, according to the position weight information of the plurality of local block features, set a position coding feature for each local block feature, where the position coding feature is a feature vector of a preset length; then splice each local block feature with its corresponding position coding feature to obtain a plurality of cascade features of the second feature map; further, set a flag bit feature at the starting position of the cascade features, where the length of the flag bit feature is equal to the length of the cascade features; finally, perform attention enhancement processing on the flag bit feature of the second feature map and each cascade feature to obtain the third feature map. In this embodiment, a position coding feature is set for each local block feature, each local block feature and its position coding feature are spliced into a cascade feature, a flag bit feature is set at the starting position of the cascade features, and attention enhancement is applied to the cascade features and the flag bit feature; the global correlation feature of each local block feature can thus be extracted, achieving the purpose of emphasizing the forged region in the face image and effectively improving the accuracy of deep forgery detection.
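A minimal sketch of this splicing scheme, with illustrative sizes and randomly initialised position codes and flag bit standing in for trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, c_extra = 49, 8, 4          # c_extra plays the role of c' in the text

F2 = rng.random((n, c))           # n local block features of length c

# One randomly initialised position code per local block (trained in practice).
pos_codes = rng.random((n, c_extra))

# Splice each local block feature with its position code -> cascade features.
cascade = np.concatenate([F2, pos_codes], axis=1)      # (n, c + c_extra)

# Flag bit feature of the same length, prepended at the starting position;
# its output embedding is later used for classification.
flag = rng.random((1, c + c_extra))
tokens = np.concatenate([flag, cascade], axis=0)       # (n + 1, c + c_extra)
print(tokens.shape)  # (50, 12)
```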
In one embodiment, the attention enhancement processing of the local block features in step 203 may be implemented as follows: first, input the flag bit feature and all cascade features of the second feature map into a pre-trained fully-connected layer to obtain a query feature vector, an attribute (key) feature vector, and a content (value) feature vector; then normalize the product of the query feature vector and the transposed attribute feature vector to obtain an attention map; finally, multiply the attention map by the content feature vector to obtain the third feature map. Specifically, the calculation formula of the attention enhancement processing on the local block features is as follows:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) · V
where Q is the query feature vector, K is the attribute feature vector, and V is the content feature vector. In this embodiment, assuming the second feature map after fusion with the position weight information is a feature matrix of size (n+1) × (c+c'), softmax(QK^T / sqrt(d_k)) is a matrix of size (n+1) × (n+1) that expresses the correlation between the (n+1) local block features, and multiplying this attention map by V yields an (n+1) × (c+c') matrix for subsequent classification processing. Specifically, according to the rules of matrix multiplication, the value at position (i, j) is the weighted sum of the i-th row of the attention map against the j-th column of V; this is equivalent to weighting the n+1 time-sequence entries of the j-th column of V with different weights, i.e., different local blocks are weighted to different degrees. Since the learned weight of a forged region in the face image is relatively large, the method can enhance the attention paid to the forged region. In this embodiment, the flag bit feature and the cascade features of the second feature map are converted into query, attribute, and content feature vectors, and the product of the query feature vector and the transposed attribute feature vector is normalized, so that the global connection feature map among the local blocks can be obtained.
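Under these definitions, the attention enhancement can be sketched as single-head scaled dot-product attention; the weight matrices below are random stand-ins for the pre-trained fully-connected layer, and the sizes are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_enhance(tokens, Wq, Wk, Wv):
    """Single-head attention over (n+1) tokens of length d = c + c'."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    d_k = K.shape[-1]
    attn_map = softmax(Q @ K.T / np.sqrt(d_k))   # (n+1, n+1) attention map
    return attn_map @ V, attn_map                # enhanced features, attention map

rng = np.random.default_rng(1)
tokens = rng.random((50, 12))                    # (n+1) x (c + c') input
Wq, Wk, Wv = (rng.random((12, 12)) for _ in range(3))  # untrained FC weights
F3, attn_map = attention_enhance(tokens, Wq, Wk, Wv)
print(F3.shape)  # (50, 12)
```

Every row of the attention map is a probability distribution over the tokens, which is what lets the model place larger weights on forged local blocks.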
In one embodiment, step 204 may be implemented as follows: first, input the third feature map into a pre-trained Multilayer Perceptron (MLP) to obtain a deep forgery probability value for the face image; then compare this probability value with a preset probability threshold, and finally obtain the detection result of the face image according to the comparison: a face image whose deep forgery probability value is greater than the preset probability threshold is considered to have been produced by deep forgery technology, thus achieving the purpose of deep forgery detection. In this embodiment, inputting the third feature map, which carries the global correlation features, into the multilayer perceptron yields a more accurate detection result and further improves the accuracy of forgery detection.
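A toy sketch of such a classification head, with randomly initialised (untrained) weights standing in for the pre-trained multilayer perceptron; the hidden width of 16 is an assumption:

```python
import numpy as np

def mlp_classify(cls_feature, W1, b1, W2, b2, threshold=0.5):
    """Two-layer perceptron head: returns (forgery probability, is_fake)."""
    hidden = np.maximum(0, cls_feature @ W1 + b1)        # ReLU layer
    prob = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))     # sigmoid -> probability
    return prob, prob > threshold

rng = np.random.default_rng(2)
F3 = rng.random((50, 12))                   # third feature map, (n+1) x (c + c')
cls_feature = F3[0]                         # flag-bit (0th) token embedding
W1, b1 = rng.standard_normal((12, 16)), np.zeros(16)
W2, b2 = rng.standard_normal(16), 0.0
prob, is_fake = mlp_classify(cls_feature, W1, b1, W2, b2)
```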
In one embodiment, before step 201, the deep forgery detection method may further include the following steps: first obtain initial image data or initial video data; then identify the face region in the image data through a face recognition algorithm and intercept the face region to obtain the face image to be recognized, or identify the face region in each frame of the video data through the face recognition algorithm and intercept it to obtain the face images to be recognized. In this embodiment, extracting the face region from the original image or video through the face recognition algorithm and detecting only that local image both improves the accuracy of deep forgery detection and reduces the amount of image-processing computation, improving the efficiency of deep forgery detection on face images.
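The interception step can be sketched as a simple crop, assuming the bounding box has already been produced by some face-detection algorithm (the detector itself is not specified by this embodiment and is not shown):

```python
import numpy as np

def crop_face(frame, box):
    """Intercept the face region from an image or video frame.

    `box` is an (x, y, w, h) bounding box, assumed to come from a
    face-detection algorithm applied beforehand.
    """
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # one image / video frame
face = crop_face(frame, (200, 100, 128, 128))
print(face.shape)  # (128, 128, 3)
```

For video data, the same crop would be applied to each frame's detected box before feeding the face images into step 201.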
Further, as a refinement and an expansion of the specific implementation of each of the above embodiments, in order to fully describe the implementation process of this embodiment, the following provides a further description of the deep forgery detection method provided in this application through specific examples.
The deep forgery detection method provided by this embodiment applies the attention mechanism model (transformer) from the natural language processing field to the detection of deeply forged pictures or videos, combined with a convolutional neural network (CNN) for extracting the local features of face images, which can effectively improve the accuracy of deep forgery detection. Specifically, the method first uses a CNN to extract local features of the face image, then applies a transformer at the feature level: each spatial feature is abstracted into a time-sequence element, an attention map is learned, and the forged areas are learned with emphasis, finally yielding the deep forgery detection result of the face image.
In this embodiment, the method mainly includes two modules, the first module extracts the high-level features of the face image through CNN, and the second module performs global correlation modeling on the high-level features by using a transformer. The specific implementation mode comprises the following steps:
Step 1, training set data preparation. A large number of real face pictures and fake face pictures are collected as the training set. Real face data are widely available and may come from various open-source face data sets or be collected from the Internet; fake face pictures may come from open-source face data sets and the Internet, or be generated by forging them with an algorithm.
Step 2, constructing a model. The two core modules of the model are the CNN local feature extraction module and the transformer global correlation feature extraction module. The algorithm flow chart is shown in fig. 3.
First, local high-level features are extracted from the input face image by the CNN. Generally, the input face image has the data format H × W × 3, where H and W respectively represent the height and width of the input face image and 3 represents the three color channels of the image; the input image is denoted X. After the CNN extracts features from X, the resulting feature map has the data format h × w × c, where h and w respectively represent the height and width of the feature map and c represents its number of channels; this feature is denoted F1 (i.e., the first feature map). To further process the feature F1 with the transformer, a format conversion operation is performed on it; the converted feature is denoted F2 (i.e., the second feature map), with F2 = reshape(F1). The feature format of F2 is n × c with n = h × w, where n corresponds to the time-sequence length in the natural language processing domain and to the number of local blocks in the visual domain. Subsequently, the transformer can further process the feature F2 to obtain the classification probability; the approach is to use the transformer to perform global correlation feature extraction over all local blocks in the feature F2.
Further, as shown in fig. 4, for the feature F2 with input data format n × c, a position code is first added to each local block. The position code of each block is randomly initialized as a vector of length c' and is optimized by gradient back-propagation during subsequent training; each block's position-code vector is concatenated with its feature to form a vector of length (c + c'). The position codes are used to construct the relative positional relationships between local blocks. Meanwhile, to enable subsequent classification, a position code and feature for position 0 are added, randomly initialized as a vector of length (c + c'). After the above processing, a feature matrix of size (n + 1) × (c + c') is obtained; the calculation formula for the processed feature F2 is as follows:
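The position-encoding step above can be sketched as follows (NumPy; random vectors stand in for the learnable position codes and the position-0 token, which in practice are trained by back-propagation):

```python
import numpy as np

# Each of the n local block features (length c) is concatenated with a
# position code of length c'; a position-0 token of length (c + c') is
# then prepended, giving an (n + 1) x (c + c') matrix.
n, c, c_prime = 196, 64, 16
F2 = np.random.randn(n, c)                    # local block features
pos = np.random.randn(n, c_prime)             # per-block position codes
blocks = np.concatenate([F2, pos], axis=1)    # n x (c + c')
cls_token = np.random.randn(1, c + c_prime)   # position-0 token for classification
F2_prime = np.concatenate([cls_token, blocks], axis=0)
print(F2_prime.shape)                         # (197, 80)
```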
F2' = [x0; (F2,1, p1); (F2,2, p2); …; (F2,n, pn)]

where F2,i denotes the i-th local block feature of F2, pi denotes its position code of length c', (·, ·) denotes concatenation, and x0 is the randomly initialized position-0 vector of length (c + c'); the resulting F2' has size (n + 1) × (c + c').
Further, the processed feature F2 may pass through a fully connected layer to obtain three features Q, K and V, where Q is the query feature vector, K is the attribute (key) feature vector, and V is the content (value) feature vector. The sizes of Q, K and V are all (n + 1) × (c + c'), and QK^T has size (n + 1) × (n + 1). d_k is a normalization factor whose effect is to avoid an excessively large matrix product; typically d_k is taken as the feature dimension. The global relation feature map is obtained by applying a softmax function to QK^T / √d_k, and is then multiplied by V to obtain an (n + 1) × (c + c') matrix, so that the feature V reinforces specific blocks according to the global relation feature map. In this embodiment, the weight the Transformer model learns by itself for the forged area is relatively large, so attention to the forged area can be enhanced.
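A minimal sketch of this global correlation computation (NumPy; the random projection matrices are stand-ins for the trained fully connected layer, and d_k is taken here as the feature dimension, an assumption consistent with standard scaled dot-product attention):

```python
import numpy as np

# softmax(Q K^T / sqrt(d_k)) V: the softmax of the normalized block-to-
# block products is the global relation feature map; multiplying it by V
# re-weights every local block by its global correlations.
def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n_plus_1, d = 197, 80                     # (n + 1) x (c + c')
X = np.random.randn(n_plus_1, d)          # processed feature F2'
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))      # (n + 1) x (n + 1) relation map
out = attn @ V                            # (n + 1) x (c + c')
print(out.shape)                          # (197, 80)
```

Each row of `attn` sums to 1, so the output rows are convex combinations of the content vectors V.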
In the Transformer module, a plurality of the above computations are cascaded, constituting the Transformer encoder (i.e., the encoder of the attention model) shown in fig. 4. The output of the Transformer encoder is still an (n + 1) × (c + c') matrix, denoted feature F3 (i.e., the third feature map). The vector of the 0-th block of F3, of length (c + c'), is taken and passed through a multilayer perceptron to obtain a classification probability value, which is compared with a preset threshold to obtain the deep forgery detection result for the face image.
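The classification step can be sketched as follows (NumPy; random weights stand in for the trained multilayer perceptron, and the 0.5 threshold is an assumed example value, not one fixed by the patent):

```python
import numpy as np

# The 0-th row of the encoder output F3 is fed to a small MLP and the
# resulting probability is compared with a preset threshold.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_plus_1, d, hidden = 197, 80, 32
F3 = np.random.randn(n_plus_1, d)              # transformer encoder output
cls_vec = F3[0]                                # vector of the 0-th block
W1 = np.random.randn(d, hidden)
W2 = np.random.randn(hidden)
prob = sigmoid(np.maximum(cls_vec @ W1, 0) @ W2)  # forgery probability
threshold = 0.5                                # preset probability threshold
print("forged" if prob > threshold else "real")
```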
According to the deep forgery detection method provided by this embodiment, extracting features from the face image with a CNN effectively strengthens the learning and extraction of local features in the face image, while learning the global correlation features with a Transformer further enhances attention to the forged area in the face image, improving the accuracy of deep forgery detection on the face image.
Further, as a specific implementation of the method shown in fig. 1 to 4, the present embodiment provides a depth forgery detection apparatus, as shown in fig. 5, including: an input-output module 31 and a processing module 32.
The input/output module 31 may be configured to obtain a face image to be recognized;
the processing module 32 is configured to perform feature extraction on the face image to obtain a first feature map, and perform format conversion processing on the first feature map to obtain a second feature map, where the second feature map includes a plurality of local block features;
the input/output module 31 may also be configured to obtain position weight information corresponding to a plurality of local block features;
the processing module 32 is further configured to perform attention enhancement processing on the multiple local block features of the second feature map according to the position weight information corresponding to the multiple local block features to obtain a third feature map, and perform classification processing on the third feature map to obtain a detection result of the face image, where the detection result is used to indicate a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image;
the input/output module 31 may also be configured to output a detection result of the face image.
In a specific application scenario, the processing module 32 may be specifically configured to perform feature extraction on a face image through a pre-trained convolutional neural network to obtain a first feature map, where the first feature map includes features of three dimensions, and the three dimensions of the first feature map are height, width, and channel number of the first feature map respectively.
In a specific application scenario, the processing module 32 may be specifically configured to convert the first feature map from a three-dimensional feature matrix to a two-dimensional feature matrix to obtain a second feature map, where the second feature map includes features of two dimensions, and the two dimensions of the second feature map are the local block feature quantity and the feature length of the second feature map respectively.
In a specific application scenario, the processing module 32 is specifically configured to: set a position coding feature for each local block feature according to the position weight information of the plurality of local block features, where the position coding feature is a feature vector of preset length; splice each local block feature with its corresponding position coding feature to obtain a plurality of cascade features of the second feature map; set a flag bit feature at the starting position of the cascade features, the length of the flag bit feature being equal to that of the cascade features; and perform attention enhancement processing on the flag bit feature and the cascade features of the second feature map to obtain a third feature map.
In a specific application scenario, the processing module 32 may be specifically configured to: input the flag bit feature and the cascade features of the second feature map into a pre-trained fully connected layer to obtain a query feature vector, an attribute feature vector and a content feature vector; normalize the product of the query feature vector and the transposed attribute feature vector to obtain an attention map; and multiply the attention map by the content feature vector to obtain a third feature map.
In a specific application scenario, the processing module 32 may be specifically configured to input the third feature map into a pre-trained multi-layer perceptron, so as to obtain a depth forgery probability value of the face image; and comparing the depth forgery probability value of the face image with a preset probability threshold value, and obtaining the detection result of the face image according to the comparison result.
In a specific application scenario, the input/output module 31 may be further configured to obtain initial image data or initial video data; the processing module 32 may also be configured to identify a face region in the image data through a face recognition algorithm, and perform an intercepting operation on the face region to obtain a face image to be identified; or identifying the face region in each frame of image in the video data through a face identification algorithm, and intercepting the face region to obtain the face image to be identified.
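A minimal sketch of this preprocessing step (the `detect_face` function below is a hypothetical stand-in for any face recognition algorithm that returns an (x, y, w, h) bounding box; here it simply returns a dummy central box):

```python
import numpy as np

# A face region is located in a frame and cropped out as the face
# image to be recognized; for video data the same crop is applied to
# each frame.
def detect_face(frame: np.ndarray):
    """Hypothetical detector: returns an (x, y, w, h) face box."""
    h, w = frame.shape[:2]
    return w // 4, h // 4, w // 2, h // 2   # dummy central box

def crop_face(frame: np.ndarray) -> np.ndarray:
    x, y, w, h = detect_face(frame)
    return frame[y:y + h, x:x + w]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # one image / video frame
face = crop_face(frame)
print(face.shape)                                # (240, 320, 3)
```

In practice `detect_face` would be replaced by a real face recognition algorithm, which the embodiment leaves unspecified.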
It should be noted that other corresponding descriptions of the functional units related to the deep forgery detection apparatus provided by this embodiment may refer to the corresponding descriptions in fig. 1 to fig. 4, and are not repeated herein.
Based on the methods shown in fig. 1 to 4, correspondingly, the present embodiment further provides a storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the depth forgery detection method shown in fig. 1 to 4.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes several instructions for enabling a computer device (which may be a personal computer, a server or a network device) to execute the methods described in the implementation scenarios of the present application.
Based on the methods shown in fig. 1 to 4 and the embodiment of the deep forgery detection apparatus shown in fig. 5, in order to achieve the above object, as shown in fig. 6, this embodiment further provides a computer device for deep forgery detection, which may specifically be a personal computer, a server, a smart phone, a tablet computer, a smart watch or another network device. The computer device includes a storage medium and a processor: the storage medium stores a computer program and an operating system, and the processor executes the computer program to implement the above-described method shown in fig. 1 to 4.
Optionally, the computer device may further include an internal memory, a communication interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, a Display (Display), an input device such as a Keyboard (Keyboard), and the like, and optionally, the communication interface may further include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), etc.
It will be understood by those skilled in the art that the computer device structure for deep forgery detection provided by this embodiment does not constitute a limitation of the computer device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the above computer device, and supports the execution of the information processing program as well as other software and/or programs. The network communication module is used to realize communication among the components within the storage medium, as well as communication with other hardware and software in the information processing computer device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, or by hardware. Applying the technical solution of the present application: feature extraction is first performed on the face image; the extracted features are converted into a second feature map containing a plurality of local block features; position weight information corresponding to the plurality of local block features is obtained, and attention enhancement processing is performed on them according to this information to obtain a third feature map; finally, the third feature map is classified to obtain the detection result of the face image, in which target local block features indicate the forged features in the face image. Compared with the prior art, this can effectively enhance attention to the forged area in the face region, thereby improving the accuracy of deep forgery detection.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims (10)

1. A method of depth forgery detection, the method comprising:
acquiring a face image to be recognized, and extracting the features of the face image to obtain a first feature map;
carrying out format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features;
acquiring position weight information corresponding to the local block features, and performing attention enhancement processing on the local block features of the second feature map according to the position weight information corresponding to the local block features to obtain a third feature map;
and classifying the third feature map to obtain a detection result of the face image, wherein the detection result is used for indicating a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image.
2. The method according to claim 1, wherein the extracting the features of the face image to obtain a first feature map comprises:
and performing feature extraction on the face image through a pre-trained convolutional neural network to obtain a first feature map, wherein the first feature map comprises features of three dimensions, and the three dimensions of the first feature map are respectively the height, the width and the channel number of the first feature map.
3. The method according to claim 2, wherein the performing format conversion processing on the first feature map to obtain a second feature map comprises:
and converting the first feature map from a three-dimensional feature matrix to a two-dimensional feature matrix to obtain a second feature map, wherein the second feature map comprises features of two dimensions, and the two dimensions of the second feature map are the local block feature quantity and the feature length of the second feature map respectively.
4. The method according to any one of claims 1 to 3, wherein the performing attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map comprises:
setting a position coding feature for each local block feature according to the position weight information of the local block features, wherein the position coding feature is a feature vector with a preset length;
respectively splicing each local block feature and the position coding feature corresponding to each local block feature to obtain a plurality of cascade features of the second feature map;
setting a flag bit feature at the starting position of the cascade features, wherein the length of the flag bit feature is equal to that of the cascade features;
and performing attention enhancement processing on the zone bit characteristics and the cascade characteristics of the second characteristic diagram to obtain a third characteristic diagram.
5. The method according to claim 4, wherein the performing attention-enhancing processing on the flag bit features and the respective cascade features of the second feature map to obtain a third feature map comprises:
inputting the zone bit characteristics and the cascade characteristics of the second characteristic diagram into a pre-trained full-connection layer to obtain a query characteristic vector, an attribute characteristic vector and a content characteristic vector;
normalizing the product of the query feature vector and the transposed attribute feature vector to obtain an attention diagram;
and multiplying the attention diagram and the content feature vector to obtain the third feature diagram.
6. The method according to claim 1, wherein the classifying the third feature map to obtain the detection result of the face image comprises:
inputting the third feature map into a pre-trained multilayer perceptron to obtain a depth forgery probability value of the face image;
and comparing the depth forgery probability value of the face image with a preset probability threshold value, and obtaining the detection result of the face image according to the comparison result.
7. The method according to claim 1, wherein the acquiring the face image to be recognized comprises:
acquiring initial image data, identifying a face area in the image data through a face identification algorithm, and intercepting the face area to obtain a face image to be identified; or
Acquiring initial video data, identifying a face area in each frame of image in the video data through a face identification algorithm, and intercepting the face area to obtain the face image to be identified.
8. A depth forgery detection apparatus, characterized in that said apparatus comprises:
the input and output module is used for acquiring a face image to be recognized;
the processing module is used for extracting the features of the face image to obtain a first feature map and performing format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features;
the input and output module is used for acquiring position weight information corresponding to the local block characteristics;
the processing module is further configured to perform attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map, and perform classification processing on the third feature map to obtain a detection result of the face image, where the detection result is used to indicate a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image;
the input and output module is further used for outputting the detection result of the face image.
9. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when executed by the processor, implements the steps of the method of any one of claims 1 to 7.
CN202111633779.7A 2021-12-28 2021-12-28 Deep forgery detection method, device, storage medium and computer equipment Active CN114463805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111633779.7A CN114463805B (en) 2021-12-28 2021-12-28 Deep forgery detection method, device, storage medium and computer equipment


Publications (2)

Publication Number Publication Date
CN114463805A true CN114463805A (en) 2022-05-10
CN114463805B CN114463805B (en) 2022-11-15

Family

ID=81406740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111633779.7A Active CN114463805B (en) 2021-12-28 2021-12-28 Deep forgery detection method, device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN114463805B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115001814A (en) * 2022-05-31 2022-09-02 山西西电信息技术研究院有限公司 Machine learning-based security audit method and system
WO2024104068A1 (en) * 2022-11-15 2024-05-23 腾讯科技(深圳)有限公司 Video detection method and apparatus, device, storage medium, and product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100883519B1 (en) * 2007-08-27 2009-02-13 한국전자통신연구원 Method and system for analyzing face recognition failure by using image analysis
US20190139191A1 (en) * 2017-11-09 2019-05-09 Boe Technology Group Co., Ltd. Image processing methods and image processing devices
CN112287891A (en) * 2020-11-23 2021-01-29 福州大学 Method for evaluating learning concentration through video based on expression and behavior feature extraction
CN113536990A (en) * 2021-06-29 2021-10-22 复旦大学 Deep fake face data identification method
CN113537027A (en) * 2021-07-09 2021-10-22 中国科学院计算技术研究所 Face depth forgery detection method and system based on facial segmentation


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BACHIR KADDAR等: "HCiT: Deepfake Video Detection Using a Hybrid Model of CNN features and Vision Transformer", 《2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP)》 *
CHANGTAO MIAO等: "TOWARDS GENERALIZABLE AND ROBUST FACE MANIPULATION DETECTION VIA BAG-OF-LOCAL-FEATURE", 《ARXIV:2103.07915V1 [CS.CV]》 *
赵宝奇等: "结合密集连接块和自注意力机制的腺体细胞分割方法", 《计算机辅助设计与图形学学报》 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant