CN114463805A - Deep forgery detection method and device, storage medium and computer equipment

Deep forgery detection method and device, storage medium and computer equipment

Info

Publication number
CN114463805A
Authority
CN
China
Prior art keywords: feature, feature map, local block, features, face image
Prior art date
Legal status: Granted
Application number
CN202111633779.7A
Other languages
Chinese (zh)
Other versions
CN114463805B (en)
Inventor
Not disclosed (inventor not announced)
Current Assignee
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd
Priority to CN202111633779.7A
Publication of CN114463805A
Application granted
Publication of CN114463805B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the application relate to the field of image processing and provide a deep forgery detection method and device, a storage medium, and computer equipment. The method comprises the following steps: acquiring a face image to be recognized and performing feature extraction on the face image to obtain a first feature map; performing format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features; acquiring position weight information corresponding to the plurality of local block features, and performing attention enhancement processing on the plurality of local block features of the second feature map according to that position weight information to obtain a third feature map; and classifying the third feature map to obtain a detection result of the face image, wherein the detection result indicates a target local block feature among the local block features of the third feature map, and the target local block feature corresponds to a forged feature in the face image. The method can effectively improve the accuracy of deep forgery detection.

Description

Deep forgery detection method and device, storage medium and computer equipment
Technical Field
The embodiment of the application relates to the field of image processing, in particular to a depth forgery detection method and device, a storage medium and computer equipment.
Background
Deep forgery refers to media synthesis technology that creates or synthesizes visual and audio content such as images, audio, video, and text based on intelligent methods such as deep learning. Colloquially, "deep forgery" places one person's facial contours and expressions onto the face of any other person, thereby creating a video or image that is entirely synthetic but appears extremely genuine. On the one hand, deep forgery technology can promote the development of the entertainment and cultural communication industries, with applications in movie production such as creating virtual characters, video rendering, sound simulation, and "reviving" historical figures or departed relatives and friends. On the other hand, deep forgery technology can also be used to mislead public opinion, disturb social order, and even threaten the security of face recognition systems.
In the prior art, one of the more common deep forgery detection methods is the image classification method. This method first collects a large amount of real and fake data to train a binary-classification deep neural network, then uses the trained network to classify the pictures to be detected (either independent images or frame images of a video), and finally fuses the per-image recognition results into a recognition result for the video. However, in deeply forged images or videos, forgery traces are mainly concentrated in specific areas, such as the whole face, the face contour, or the vicinity of the mouth, and a common classification model cannot enhance its learning of the forged areas; the existing image classification method therefore has poor accuracy for deep forgery detection.
Disclosure of Invention
In view of this, the present application provides a depth forgery detection method, apparatus, storage medium, and computer device, and mainly aims to solve the technical problem of poor accuracy of depth forgery detection.
In a first aspect, an embodiment of the present application provides a depth forgery detection method, including:
acquiring a face image to be recognized, and extracting the features of the face image to obtain a first feature map;
carrying out format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features;
acquiring position weight information corresponding to the plurality of local block features, and performing attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map;
and classifying the third feature map to obtain a detection result of the face image, wherein the detection result is used for indicating a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image.
In a second aspect, the present application provides a depth forgery detection apparatus implementing the depth forgery detection method described above, the apparatus including:
the input and output module is used for acquiring a face image to be recognized;
the processing module is used for extracting the features of the face image to obtain a first feature map and performing format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features;
the input and output module is also used for acquiring position weight information corresponding to the plurality of local block characteristics;
the processing module is further used for performing attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map, and performing classification processing on the third feature map to obtain a detection result of the face image, wherein the detection result is used for indicating a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image;
and the input and output module is also used for outputting the detection result of the face image.
In an embodiment, the processing module is specifically configured to perform feature extraction on the face image through a pre-trained convolutional neural network to obtain a first feature map, where the first feature map includes features of three dimensions, and the three dimensions of the first feature map are height, width, and channel number of the first feature map, respectively.
In an embodiment, the processing module is specifically configured to convert the first feature map from a three-dimensional feature matrix to a two-dimensional feature matrix to obtain a second feature map, where the second feature map includes features of two dimensions, and the two dimensions of the second feature map are the local block feature quantity and the feature length of the second feature map, respectively.
In an embodiment, the processing module is specifically configured to set a position coding feature for each local block feature according to the position weight information of the plurality of local block features, where the position coding feature is a feature vector of a preset length; splice each local block feature with its corresponding position coding feature to obtain a plurality of cascade features of the second feature map; set a flag bit feature at the starting position of the cascade features, where the length of the flag bit feature is equal to that of the cascade features; and perform attention enhancement processing on the flag bit feature and the cascade features of the second feature map to obtain a third feature map.
In one embodiment, the processing module is specifically configured to input the flag bit feature and the cascade features of the second feature map into a pre-trained fully-connected layer to obtain a query feature vector, an attribute feature vector, and a content feature vector; normalize the product of the query feature vector and the transposed attribute feature vector to obtain an attention map; and multiply the attention map by the content feature vector to obtain a third feature map.
In one embodiment, the processing module is specifically configured to input the third feature map into a pre-trained multi-layer perceptron, so as to obtain a depth forgery probability value of the face image; and comparing the depth forgery probability value of the face image with a preset probability threshold value, and obtaining the detection result of the face image according to the comparison result.
In one embodiment, the input-output module is further configured to obtain initial image data or obtain initial video data; the processing module is also used for identifying a face area in the image data through a face identification algorithm and intercepting the face area to obtain a face image to be identified; or identifying the face area in each frame of image in the video data through a face identification algorithm, and intercepting the face area to obtain the face image to be identified.
In a third aspect, embodiments of the present application provide a storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the above-mentioned depth forgery detection method.
In a fourth aspect, the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the above depth forgery detection method when executing the program.
According to the deep forgery detection method, device, storage medium, and computer equipment of the embodiments, feature extraction is first performed on a face image; the extracted features are then converted into a second feature map containing a plurality of local block features; position weight information corresponding to the local block features is obtained, and attention enhancement processing is performed on the local block features of the second feature map according to that position weight information to obtain a third feature map; finally, classification processing is performed on the third feature map to obtain the detection result of the face image, so that the forged features in the face image are indicated by the target local block features in the detection result. By converting the high-level features of the face image into local block features that are convenient to identify and extracting the global correlation features of the local block features through attention enhancement processing, the method effectively strengthens the attention paid to the forged region of the face and improves the accuracy of deep forgery detection.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the embodiments of the application and not to limit the embodiments of the application unduly. In the drawings:
fig. 1 is a scene schematic diagram illustrating a depth forgery detection method provided by an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating a depth forgery detection method according to an embodiment of the present application;
fig. 3 is a detection schematic diagram illustrating a depth forgery detection method according to an embodiment of the present application;
fig. 4 is a detection schematic diagram illustrating a depth forgery detection method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram illustrating a depth forgery detection apparatus according to an embodiment of the present application;
fig. 6 shows an internal structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Hereinafter, embodiments of the present application will be described in detail with reference to the accompanying drawings in conjunction with the embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
To further explain the technical means and effects adopted by the embodiments of the present application to achieve the intended purpose, a detailed description of the embodiments, structures, features, and effects of the present application is given below with reference to the accompanying drawings. In the following description, different references to "one embodiment" or "an embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Further, although the steps in the embodiments are numbered in sequence, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps need not be performed in the exact order shown and may be performed in other orders. Moreover, at least some of the steps in each embodiment may comprise multiple sub-steps or stages, which are not necessarily performed at the same time and may be performed at different times; the order of performing these sub-steps or stages is not necessarily sequential, and they may be performed in turns or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The deep forgery detection method provided by the embodiments of the present application can be applied in the environment shown in fig. 1. As shown in fig. 1, the computer device 110 may communicate with the data acquisition device 120 through a network; the data acquisition device 120 may acquire image data or video data containing a face image and transmit it to the computer device 110, and the computer device 110 may perform a series of processing on the image data or video data to obtain a deep forgery detection result for the face image it contains. In this scene, a deep forgery situation may occur in which a dynamic image replaces a real person for face recognition; for such situations, performing deep forgery detection on the face image in the image data or video data can identify the authenticity of the face image. The computer device may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, servers, or a server cluster composed of multiple servers. The data acquisition device may be a terminal device with camera or photographing functions; its shape, arrangement, and size are not limited.
In one embodiment, as shown in fig. 2, a depth forgery detection method is provided, which is illustrated by applying the method to the computer device 110 in fig. 1, and includes the following steps:
201. the method comprises the steps of obtaining a face image to be recognized, and extracting features of the face image to obtain a first feature map.
The face image refers to an image containing a face region. Specifically, the computer device may acquire the face image to be recognized through various channels such as a data acquisition device, a face image database, or a network, and then extract local features from the face image through a preset image feature extraction method to obtain the first feature map, where the first feature map is a multi-dimensional feature matrix composed of the local features.
202. And carrying out format conversion processing on the first feature map to obtain a second feature map.
The second feature map includes a plurality of local block features. Local block features, also referred to as local image features, are local expressions of image features; they reflect local characteristics of the image and are suitable for applications such as image matching and retrieval. Specifically, the first feature map may be converted into the second feature map through a predetermined format conversion operation, where the second feature map is a two-dimensional feature matrix composed of local block features. In this embodiment, the first feature map may be converted into the second feature map by matrix transformation.
203. And acquiring position weight information corresponding to the plurality of local block features, and performing attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map.
Specifically, in image data there is usually a certain positional relationship between local blocks, and this positional relationship is used to train, in advance, the position weight information corresponding to each local block feature. Further, by fusing the local block features with their corresponding position weight information and performing attention enhancement processing on the fused features, the global correlation features between local blocks can be extracted to obtain a third feature map.
204. And classifying the third feature map to obtain a detection result of the face image.
Specifically, the detection result of the face image can be obtained by classifying the third feature map, where the classification can be implemented with pre-trained models or algorithms. In this embodiment, since the third feature map includes the global correlation features between local blocks, the target local block feature among the local block features can be identified by classifying the global correlation features in the third feature map; the forged features in the face image can then be located by tracing back from the target local block features, achieving the purpose of deep forgery detection on the face image.
The deep forgery detection method provided in this embodiment first performs feature extraction on a face image, then converts the extracted features into a second feature map comprising a plurality of local block features, obtains position weight information corresponding to those local block features, performs attention enhancement processing on the local block features of the second feature map according to the position weight information to obtain a third feature map, and finally performs classification processing on the third feature map to obtain the detection result of the face image. By converting the high-level features of the face image into local block features that are convenient to identify and extracting their global correlation features through attention enhancement processing, the method effectively strengthens the attention paid to the forged region of the face and improves the accuracy of deep forgery detection.
In one embodiment, step 201 may be implemented by: feature extraction is carried out on the face image through a pre-trained Convolutional Neural Network (CNN) to obtain a first feature map. In this embodiment, the first feature map includes features of three dimensions, where the three dimensions are the height, width, and number of channels of the first feature map. According to the embodiment, the high-level features of the face image are extracted through the convolutional neural network, so that the extraction of the local features in the face image can be enhanced, and the accuracy of deep forgery detection of the face image is improved.
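As an illustration of this embodiment, the convolutional feature extraction can be sketched with a minimal NumPy valid convolution; the kernel here is random and untrained, and all sizes are illustrative stand-ins for a real pre-trained CNN backbone:

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Single valid convolution, stride 1: x is (H, W, Cin), kernel is (kh, kw, Cin, Cout)."""
    H, W, _ = x.shape
    kh, kw, _, cout = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1, cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # contract the kh x kw x Cin patch against the kernel for every output channel
            out[i, j] = np.tensordot(x[i:i + kh, j:j + kw, :], kernel, axes=3)
    return out

# Toy face image of H x W x 3 and a random (untrained) 3x3 kernel bank.
X = np.random.rand(9, 9, 3)
kernel = np.random.rand(3, 3, 3, 8)
F1 = np.maximum(0, conv2d_valid(X, kernel))  # ReLU; first feature map
print(F1.shape)  # (7, 7, 8): height h, width w, channel number c
```

A real implementation would stack many such layers with pooling; only the output format h × w × c matters for the steps that follow.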
In one embodiment, step 202 may be implemented as follows: convert the first feature map from a three-dimensional feature matrix into a two-dimensional feature matrix to obtain the second feature map. In this embodiment, the second feature map includes features of two dimensions: the local block feature number and the feature length, where the local block feature number of the second feature map is the product of the height and the width of the first feature map, and the feature length of the second feature map is the number of channels of the first feature map. By merging the two spatial dimensions of the first feature map into a single dimension of the second feature map, the local features of the face image are converted into local block features and abstracted as a time-sequence-like feature for processing; this facilitates the attention enhancement processing of the converted features in subsequent steps, so that the target local block feature can be identified among the local block features and mapped to the forged features in the face image.
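The format conversion of this embodiment is a plain reshape; a minimal sketch, with illustrative sizes h, w, c:

```python
import numpy as np

# First feature map F1 of shape (h, w, c), e.g. the output of a CNN backbone.
h, w, c = 7, 7, 8
F1 = np.random.rand(h, w, c)

# Format conversion: merge height and width into a single token axis, so the
# second feature map F2 has n = h * w local block features of length c each.
F2 = F1.reshape(h * w, c)
print(F2.shape)  # (49, 8)

# Each row of F2 is one local block feature in raster (row-major) order,
# so row k corresponds to spatial position (k // w, k % w) of F1.
```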
In one embodiment, step 203 may be implemented as follows: first, according to the position weight information of the plurality of local block features, set a position coding feature for each local block feature, where the position coding feature is a feature vector of a preset length; then splice each local block feature with its corresponding position coding feature to obtain a plurality of cascade features of the second feature map; further, set a flag bit feature at the starting position of the cascade features, where the length of the flag bit feature is equal to the length of the cascade features; finally, perform attention enhancement processing on the flag bit feature of the second feature map and each cascade feature to obtain the third feature map. In this embodiment, a position coding feature is set for each local block feature, each local block feature and its position coding feature are spliced into a cascade feature, a flag bit feature is set at the starting position of the cascade features, and attention enhancement is applied to the cascade features and the flag bit feature; the global correlation feature of each local block feature can thus be extracted, achieving the purpose of emphasizing the forged region in the face image and effectively improving the accuracy of deep forgery detection.
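A minimal sketch of this splicing scheme, with illustrative sizes and randomly initialised position codes and flag bit standing in for trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, c_extra = 49, 8, 4          # c_extra plays the role of c' in the text

F2 = rng.random((n, c))           # n local block features of length c

# One randomly initialised position code per local block (trained in practice).
pos_codes = rng.random((n, c_extra))

# Splice each local block feature with its position code -> cascade features.
cascade = np.concatenate([F2, pos_codes], axis=1)      # (n, c + c_extra)

# Flag bit feature of the same length, prepended at the starting position;
# its output embedding is later used for classification.
flag = rng.random((1, c + c_extra))
tokens = np.concatenate([flag, cascade], axis=0)       # (n + 1, c + c_extra)
print(tokens.shape)  # (50, 12)
```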
In one embodiment, the attention enhancement processing of the local block features in step 203 may be implemented as follows: first, input the flag bit feature and all cascade features of the second feature map into a pre-trained fully-connected layer to obtain a query feature vector, an attribute (key) feature vector, and a content (value) feature vector; then normalize the product of the query feature vector and the transposed attribute feature vector to obtain an attention map; finally, multiply the attention map by the content feature vector to obtain the third feature map. Specifically, the calculation formula of the attention enhancement processing on the local block features is as follows:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) · V
where Q is the query feature vector, K is the attribute feature vector, and V is the content feature vector. In this embodiment, assuming the second feature map after fusion with the position weight information is a feature matrix of size (n+1) × (c+c'), softmax(QK^T / sqrt(d_k)) is a matrix of size (n+1) × (n+1) that expresses the correlation between the (n+1) local block features, and multiplying this attention map by V yields an (n+1) × (c+c') matrix for subsequent classification processing. Specifically, according to the rules of matrix multiplication, the value at position (i, j) is the weighted sum of the i-th row of the attention map against the j-th column of V; this is equivalent to weighting the n+1 time-sequence entries of the j-th column of V with different weights, i.e., different local blocks are weighted to different degrees. Since the learned weight of a forged region in the face image is relatively large, the method can enhance the attention paid to the forged region. In this embodiment, the flag bit feature and the cascade features of the second feature map are converted into query, attribute, and content feature vectors, and the product of the query feature vector and the transposed attribute feature vector is normalized, so that the global connection feature map among the local blocks can be obtained.
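Under these definitions, the attention enhancement can be sketched as single-head scaled dot-product attention; the weight matrices below are random stand-ins for the pre-trained fully-connected layer, and the sizes are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_enhance(tokens, Wq, Wk, Wv):
    """Single-head attention over (n+1) tokens of length d = c + c'."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    d_k = K.shape[-1]
    attn_map = softmax(Q @ K.T / np.sqrt(d_k))   # (n+1, n+1) attention map
    return attn_map @ V, attn_map                # enhanced features, attention map

rng = np.random.default_rng(1)
tokens = rng.random((50, 12))                    # (n+1) x (c + c') input
Wq, Wk, Wv = (rng.random((12, 12)) for _ in range(3))  # untrained FC weights
F3, attn_map = attention_enhance(tokens, Wq, Wk, Wv)
print(F3.shape)  # (50, 12)
```

Every row of the attention map is a probability distribution over the tokens, which is what lets the model place larger weights on forged local blocks.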
In one embodiment, step 204 may be implemented as follows: first, input the third feature map into a pre-trained Multilayer Perceptron (MLP) to obtain a deep forgery probability value for the face image; then compare this probability value with a preset probability threshold, and finally obtain the detection result of the face image according to the comparison: a face image whose deep forgery probability value is greater than the preset probability threshold is considered to have been produced by deep forgery technology, thus achieving the purpose of deep forgery detection. In this embodiment, inputting the third feature map, which carries the global correlation features, into the multilayer perceptron yields a more accurate detection result and further improves the accuracy of forgery detection.
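A toy sketch of such a classification head, with randomly initialised (untrained) weights standing in for the pre-trained multilayer perceptron; the hidden width of 16 is an assumption:

```python
import numpy as np

def mlp_classify(cls_feature, W1, b1, W2, b2, threshold=0.5):
    """Two-layer perceptron head: returns (forgery probability, is_fake)."""
    hidden = np.maximum(0, cls_feature @ W1 + b1)        # ReLU layer
    prob = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))     # sigmoid -> probability
    return prob, prob > threshold

rng = np.random.default_rng(2)
F3 = rng.random((50, 12))                   # third feature map, (n+1) x (c + c')
cls_feature = F3[0]                         # flag-bit (0th) token embedding
W1, b1 = rng.standard_normal((12, 16)), np.zeros(16)
W2, b2 = rng.standard_normal(16), 0.0
prob, is_fake = mlp_classify(cls_feature, W1, b1, W2, b2)
```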
In one embodiment, before step 201, the deep forgery detection method may further include the following steps: first obtain initial image data or initial video data; then identify the face region in the image data through a face recognition algorithm and intercept the face region to obtain the face image to be recognized, or identify the face region in each frame of the video data through the face recognition algorithm and intercept it to obtain the face images to be recognized. In this embodiment, extracting the face region from the original image or video through the face recognition algorithm and detecting only that local image both improves the accuracy of deep forgery detection and reduces the amount of image-processing computation, improving the efficiency of deep forgery detection on face images.
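The interception step can be sketched as a simple crop, assuming the bounding box has already been produced by some face-detection algorithm (the detector itself is not specified by this embodiment and is not shown):

```python
import numpy as np

def crop_face(frame, box):
    """Intercept the face region from an image or video frame.

    `box` is an (x, y, w, h) bounding box, assumed to come from a
    face-detection algorithm applied beforehand.
    """
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # one image / video frame
face = crop_face(frame, (200, 100, 128, 128))
print(face.shape)  # (128, 128, 3)
```

For video data, the same crop would be applied to each frame's detected box before feeding the face images into step 201.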
Further, as a refinement and an expansion of the specific implementation of each of the above embodiments, in order to fully describe the implementation process of this embodiment, the following provides a further description of the deep forgery detection method provided in this application through specific examples.
The deep forgery detection method provided by this embodiment applies the attention mechanism model (transformer) from the natural language processing field to the detection of deeply forged pictures or videos, combined with a convolutional neural network (CNN) for extracting the local features of face images, which can effectively improve the accuracy of deep forgery detection. Specifically, the method first uses a CNN to extract local features of the face image, then applies a transformer at the feature level: each spatial feature is abstracted into a time-sequence element, an attention map is learned, and the forged areas are learned with emphasis, finally yielding the deep forgery detection result of the face image.
In this embodiment, the method mainly includes two modules, the first module extracts the high-level features of the face image through CNN, and the second module performs global correlation modeling on the high-level features by using a transformer. The specific implementation mode comprises the following steps:
Step 1, training set data preparation. A large number of real face pictures and fake face pictures are collected as the training set. Real face data are widely available and may come from various open-source face data sets or be collected from the Internet; fake face pictures may come from open-source face data sets and the Internet, or be generated by forging them with an algorithm.
Step 2, constructing a model. The two core modules of the model are the CNN local feature extraction module and the transformer global correlation feature extraction module. The algorithm flow chart is shown in fig. 3.
First, local high-level features are extracted from the input face image by the CNN. Generally, the input face image has the data format H × W × 3, where H and W respectively represent the height and width of the input face image and 3 represents the three color channels of the image; the input image is denoted X. After the CNN extracts features from X, the resulting feature map has the data format h × w × c, where h and w respectively represent the height and width of the feature map and c represents its number of channels; this feature is denoted F1 (i.e., the first feature map). To further process the feature F1 with the transformer, a format conversion operation is performed on it; the converted feature is denoted F2 (i.e., the second feature map), with F2 = reshape(F1). The feature format of F2 is n × c with n = h × w, where n corresponds to the time-sequence length in the natural language processing domain and to the number of local blocks in the visual domain. Subsequently, the transformer can further process the feature F2 to obtain the classification probability; the approach is to use the transformer to perform global correlation feature extraction over all local blocks in the feature F2.
Further, as shown in fig. 4, for the feature F2 with input data format n × c, a position code is first added to each local block. The position code of each block is randomly initialized as a vector of length c' and is optimized by gradient back-propagation during subsequent training; each block's position-code vector is concatenated with its feature to form a vector of length (c + c'). The position codes are used to construct the relative positional relationships between local blocks. Meanwhile, to enable subsequent classification, a position code and feature for position 0 are added, randomly initialized as a vector of length (c + c'). After the above processing, a feature matrix of size (n + 1) × (c + c') is obtained; the calculation formula for the processed feature F2 is as follows:
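The position-encoding step above can be sketched as follows (NumPy; random vectors stand in for the learnable position codes and the position-0 token, which in practice are trained by back-propagation):

```python
import numpy as np

# Each of the n local block features (length c) is concatenated with a
# position code of length c'; a position-0 token of length (c + c') is
# then prepended, giving an (n + 1) x (c + c') matrix.
n, c, c_prime = 196, 64, 16
F2 = np.random.randn(n, c)                    # local block features
pos = np.random.randn(n, c_prime)             # per-block position codes
blocks = np.concatenate([F2, pos], axis=1)    # n x (c + c')
cls_token = np.random.randn(1, c + c_prime)   # position-0 token for classification
F2_prime = np.concatenate([cls_token, blocks], axis=0)
print(F2_prime.shape)                         # (197, 80)
```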
F2' = [x0; (F2,1, p1); (F2,2, p2); …; (F2,n, pn)]

where F2,i denotes the i-th local block feature of F2, pi denotes its position code of length c', (·, ·) denotes concatenation, and x0 is the randomly initialized position-0 vector of length (c + c'); the resulting F2' has size (n + 1) × (c + c').
Further, the processed feature F2 may pass through a fully connected layer to obtain three features Q, K and V, where Q is the query feature vector, K is the attribute (key) feature vector, and V is the content (value) feature vector. The sizes of Q, K and V are all (n + 1) × (c + c'), and QK^T has size (n + 1) × (n + 1). d_k is a normalization factor whose effect is to avoid an excessively large matrix product; typically d_k is taken as the feature dimension. The global relation feature map is obtained by applying a softmax function to QK^T / √d_k, and is then multiplied by V to obtain an (n + 1) × (c + c') matrix, so that the feature V reinforces specific blocks according to the global relation feature map. In this embodiment, the weight the Transformer model learns by itself for the forged area is relatively large, so attention to the forged area can be enhanced.
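A minimal sketch of this global correlation computation (NumPy; the random projection matrices are stand-ins for the trained fully connected layer, and d_k is taken here as the feature dimension, an assumption consistent with standard scaled dot-product attention):

```python
import numpy as np

# softmax(Q K^T / sqrt(d_k)) V: the softmax of the normalized block-to-
# block products is the global relation feature map; multiplying it by V
# re-weights every local block by its global correlations.
def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n_plus_1, d = 197, 80                     # (n + 1) x (c + c')
X = np.random.randn(n_plus_1, d)          # processed feature F2'
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))      # (n + 1) x (n + 1) relation map
out = attn @ V                            # (n + 1) x (c + c')
print(out.shape)                          # (197, 80)
```

Each row of `attn` sums to 1, so the output rows are convex combinations of the content vectors V.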
In the Transformer module, a plurality of the above computations are cascaded, constituting the Transformer encoder (i.e., the encoder of the attention model) shown in fig. 4. The output of the Transformer encoder is still an (n + 1) × (c + c') matrix, denoted feature F3 (i.e., the third feature map). The vector of the 0-th block of F3, of length (c + c'), is taken and passed through a multilayer perceptron to obtain a classification probability value, which is compared with a preset threshold to obtain the deep forgery detection result for the face image.
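The classification step can be sketched as follows (NumPy; random weights stand in for the trained multilayer perceptron, and the 0.5 threshold is an assumed example value, not one fixed by the patent):

```python
import numpy as np

# The 0-th row of the encoder output F3 is fed to a small MLP and the
# resulting probability is compared with a preset threshold.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_plus_1, d, hidden = 197, 80, 32
F3 = np.random.randn(n_plus_1, d)              # transformer encoder output
cls_vec = F3[0]                                # vector of the 0-th block
W1 = np.random.randn(d, hidden)
W2 = np.random.randn(hidden)
prob = sigmoid(np.maximum(cls_vec @ W1, 0) @ W2)  # forgery probability
threshold = 0.5                                # preset probability threshold
print("forged" if prob > threshold else "real")
```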
According to the deep forgery detection method provided by this embodiment, extracting features from the face image with a CNN effectively strengthens the learning and extraction of local features in the face image, while learning the global correlation features with a Transformer further enhances attention to the forged area in the face image, improving the accuracy of deep forgery detection on the face image.
Further, as a specific implementation of the method shown in fig. 1 to 4, the present embodiment provides a depth forgery detection apparatus, as shown in fig. 5, including: an input-output module 31 and a processing module 32.
The input/output module 31 may be configured to obtain a face image to be recognized;
the processing module 32 is configured to perform feature extraction on the face image to obtain a first feature map, and perform format conversion processing on the first feature map to obtain a second feature map, where the second feature map includes a plurality of local block features;
the input/output module 31 may also be configured to obtain position weight information corresponding to a plurality of local block features;
the processing module 32 is further configured to perform attention enhancement processing on the multiple local block features of the second feature map according to the position weight information corresponding to the multiple local block features to obtain a third feature map, and perform classification processing on the third feature map to obtain a detection result of the face image, where the detection result is used to indicate a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image;
the input/output module 31 may also be configured to output a detection result of the face image.
In a specific application scenario, the processing module 32 may be specifically configured to perform feature extraction on a face image through a pre-trained convolutional neural network to obtain a first feature map, where the first feature map includes features of three dimensions, and the three dimensions of the first feature map are height, width, and channel number of the first feature map respectively.
In a specific application scenario, the processing module 32 may be specifically configured to convert the first feature map from a three-dimensional feature matrix to a two-dimensional feature matrix to obtain a second feature map, where the second feature map includes features of two dimensions, and the two dimensions of the second feature map are the local block feature quantity and the feature length of the second feature map respectively.
In a specific application scenario, the processing module 32 is specifically configured to: set a position coding feature for each local block feature according to the position weight information of the plurality of local block features, where the position coding feature is a feature vector of preset length; splice each local block feature with its corresponding position coding feature to obtain a plurality of cascade features of the second feature map; set a flag bit feature at the starting position of the cascade features, the length of the flag bit feature being equal to that of the cascade features; and perform attention enhancement processing on the flag bit feature and the cascade features of the second feature map to obtain a third feature map.
In a specific application scenario, the processing module 32 may be specifically configured to: input the flag bit feature and the cascade features of the second feature map into a pre-trained fully connected layer to obtain a query feature vector, an attribute feature vector and a content feature vector; normalize the product of the query feature vector and the transposed attribute feature vector to obtain an attention map; and multiply the attention map by the content feature vector to obtain a third feature map.
In a specific application scenario, the processing module 32 may be specifically configured to input the third feature map into a pre-trained multi-layer perceptron, so as to obtain a depth forgery probability value of the face image; and comparing the depth forgery probability value of the face image with a preset probability threshold value, and obtaining the detection result of the face image according to the comparison result.
In a specific application scenario, the input/output module 31 may be further configured to obtain initial image data or initial video data; the processing module 32 may also be configured to identify a face region in the image data through a face recognition algorithm, and perform an intercepting operation on the face region to obtain a face image to be identified; or identifying the face region in each frame of image in the video data through a face identification algorithm, and intercepting the face region to obtain the face image to be identified.
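A minimal sketch of this preprocessing step (the `detect_face` function below is a hypothetical stand-in for any face recognition algorithm that returns an (x, y, w, h) bounding box; here it simply returns a dummy central box):

```python
import numpy as np

# A face region is located in a frame and cropped out as the face
# image to be recognized; for video data the same crop is applied to
# each frame.
def detect_face(frame: np.ndarray):
    """Hypothetical detector: returns an (x, y, w, h) face box."""
    h, w = frame.shape[:2]
    return w // 4, h // 4, w // 2, h // 2   # dummy central box

def crop_face(frame: np.ndarray) -> np.ndarray:
    x, y, w, h = detect_face(frame)
    return frame[y:y + h, x:x + w]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # one image / video frame
face = crop_face(frame)
print(face.shape)                                # (240, 320, 3)
```

In practice `detect_face` would be replaced by a real face recognition algorithm, which the embodiment leaves unspecified.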
It should be noted that other corresponding descriptions of the functional units related to the deep forgery detection apparatus provided by this embodiment may refer to the corresponding descriptions in fig. 1 to fig. 4, and are not repeated herein.
Based on the methods shown in fig. 1 to 4, correspondingly, the present embodiment further provides a storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the depth forgery detection method shown in fig. 1 to 4.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes several instructions for enabling a computer device (which may be a personal computer, a server or a network device) to execute the methods described in the implementation scenarios of the present application.
Based on the methods shown in fig. 1 to 4 and the embodiment of the deep forgery detection apparatus shown in fig. 5, in order to achieve the above object, as shown in fig. 6, this embodiment further provides a computer device for deep forgery detection, which may specifically be a personal computer, a server, a smart phone, a tablet computer, a smart watch or another network device. The computer device includes a storage medium and a processor: the storage medium stores a computer program and an operating system, and the processor executes the computer program to implement the above-described method shown in fig. 1 to 4.
Optionally, the computer device may further include an internal memory, a communication interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, a Display (Display), an input device such as a Keyboard (Keyboard), and the like, and optionally, the communication interface may further include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), etc.
It will be understood by those skilled in the art that the computer device structure for deep forgery detection provided by this embodiment does not constitute a limitation of the computer device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the above computer device, and supports the execution of the information processing program as well as other software and/or programs. The network communication module is used to realize communication among the components within the storage medium, as well as communication with other hardware and software in the information processing computer device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, or by hardware. Applying the technical solution of the present application: feature extraction is first performed on the face image; the extracted features are converted into a second feature map containing a plurality of local block features; position weight information corresponding to the plurality of local block features is obtained, and attention enhancement processing is performed on them according to this information to obtain a third feature map; finally, the third feature map is classified to obtain the detection result of the face image, in which target local block features indicate the forged features in the face image. Compared with the prior art, this can effectively enhance attention to the forged area in the face region, thereby improving the accuracy of deep forgery detection.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims (10)

1. A method of depth forgery detection, the method comprising:
acquiring a face image to be recognized, and extracting the features of the face image to obtain a first feature map;
carrying out format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features;
acquiring position weight information corresponding to the local block features, and performing attention enhancement processing on the local block features of the second feature map according to the position weight information corresponding to the local block features to obtain a third feature map;
and classifying the third feature map to obtain a detection result of the face image, wherein the detection result is used for indicating a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image.
2. The method according to claim 1, wherein the extracting the features of the face image to obtain a first feature map comprises:
and performing feature extraction on the face image through a pre-trained convolutional neural network to obtain a first feature map, wherein the first feature map comprises features of three dimensions, and the three dimensions of the first feature map are respectively the height, the width and the channel number of the first feature map.
3. The method according to claim 2, wherein the performing format conversion processing on the first feature map to obtain a second feature map comprises:
and converting the first feature map from a three-dimensional feature matrix to a two-dimensional feature matrix to obtain a second feature map, wherein the second feature map comprises features of two dimensions, and the two dimensions of the second feature map are the local block feature quantity and the feature length of the second feature map respectively.
4. The method according to any one of claims 1 to 3, wherein the performing attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map comprises:
setting a position coding feature for each local block feature according to the position weight information of the local block features, wherein the position coding feature is a feature vector with a preset length;
respectively splicing each local block feature and the position coding feature corresponding to each local block feature to obtain a plurality of cascade features of the second feature map;
setting a flag bit feature at the starting position of the cascade features, wherein the length of the flag bit feature is equal to that of the cascade features;
and performing attention enhancement processing on the zone bit characteristics and the cascade characteristics of the second characteristic diagram to obtain a third characteristic diagram.
5. The method according to claim 4, wherein the performing attention-enhancing processing on the flag bit features and the respective cascade features of the second feature map to obtain a third feature map comprises:
inputting the zone bit characteristics and the cascade characteristics of the second characteristic diagram into a pre-trained full-connection layer to obtain a query characteristic vector, an attribute characteristic vector and a content characteristic vector;
normalizing the product of the query feature vector and the transposed attribute feature vector to obtain an attention diagram;
and multiplying the attention diagram and the content feature vector to obtain the third feature diagram.
6. The method according to claim 1, wherein the classifying the third feature map to obtain the detection result of the face image comprises:
inputting the third feature map into a pre-trained multilayer perceptron to obtain a depth forgery probability value of the face image;
and comparing the depth forgery probability value of the face image with a preset probability threshold value, and obtaining the detection result of the face image according to the comparison result.
7. The method according to claim 1, wherein the acquiring the face image to be recognized comprises:
acquiring initial image data, identifying a face area in the image data through a face identification algorithm, and intercepting the face area to obtain a face image to be identified; or
Acquiring initial video data, identifying a face area in each frame of image in the video data through a face identification algorithm, and intercepting the face area to obtain the face image to be identified.
8. A depth forgery detection apparatus, characterized in that said apparatus comprises:
the input and output module is used for acquiring a face image to be recognized;
the processing module is used for extracting the features of the face image to obtain a first feature map and performing format conversion processing on the first feature map to obtain a second feature map, wherein the second feature map comprises a plurality of local block features;
the input and output module is used for acquiring position weight information corresponding to the local block characteristics;
the processing module is further configured to perform attention enhancement processing on the plurality of local block features of the second feature map according to the position weight information corresponding to the plurality of local block features to obtain a third feature map, and perform classification processing on the third feature map to obtain a detection result of the face image, where the detection result is used to indicate a target local block feature in each local block feature in the third feature map, and the target local block feature corresponds to a forged feature in the face image;
the input and output module is further used for outputting the detection result of the face image.
9. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when executed by the processor, implements the steps of the method of any one of claims 1 to 7.
CN202111633779.7A 2021-12-28 2021-12-28 Deep forgery detection method, device, storage medium and computer equipment Active CN114463805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111633779.7A CN114463805B (en) 2021-12-28 2021-12-28 Deep forgery detection method, device, storage medium and computer equipment


Publications (2)

Publication Number Publication Date
CN114463805A true CN114463805A (en) 2022-05-10
CN114463805B CN114463805B (en) 2022-11-15

Family

ID=81406740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111633779.7A Active CN114463805B (en) 2021-12-28 2021-12-28 Deep forgery detection method, device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN114463805B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115001814A (en) * 2022-05-31 2022-09-02 山西西电信息技术研究院有限公司 Machine learning-based security audit method and system
WO2024104068A1 (en) * 2022-11-15 2024-05-23 腾讯科技(深圳)有限公司 Video detection method and apparatus, device, storage medium, and product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100883519B1 (en) * 2007-08-27 2009-02-13 한국전자통신연구원 Method and system for analyzing face recognition failure by using image analysis
US20190139191A1 (en) * 2017-11-09 2019-05-09 Boe Technology Group Co., Ltd. Image processing methods and image processing devices
CN112287891A (en) * 2020-11-23 2021-01-29 福州大学 Method for evaluating learning concentration through video based on expression and behavior feature extraction
CN113536990A (en) * 2021-06-29 2021-10-22 复旦大学 Deep fake face data identification method
CN113537027A (en) * 2021-07-09 2021-10-22 中国科学院计算技术研究所 Face depth forgery detection method and system based on facial segmentation


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BACHIR KADDAR等: "HCiT: Deepfake Video Detection Using a Hybrid Model of CNN features and Vision Transformer", 《2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP)》 *
CHANGTAO MIAO等: "TOWARDS GENERALIZABLE AND ROBUST FACE MANIPULATION DETECTION VIA BAG-OF-LOCAL-FEATURE", 《ARXIV:2103.07915V1 [CS.CV]》 *
赵宝奇等: "结合密集连接块和自注意力机制的腺体细胞分割方法", 《计算机辅助设计与图形学学报》 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant