CN117275068A - Method and system for face forgery detection with uncertainty-guided test-time training - Google Patents
Method and system for face forgery detection with uncertainty-guided test-time training
- Publication number: CN117275068A
- Application number: CN202311224982.8A
- Authority
- CN
- China
- Prior art keywords
- features
- image
- attention
- frequency domain
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/40—Spoof detection, e.g. liveness detection
- G06V40/45—Detection of the body part being alive
Abstract
The invention discloses a face forgery detection method and system with uncertainty-guided test-time training, belonging to the technical fields of deep learning and computer vision. The method comprises the following steps: acquiring an image to be discriminated as an initial input image; acquiring a high-frequency information image of the initial input image; extracting RGB features and frequency domain attention features of different scales from the high-frequency information image, and fusing the RGB features with the frequency domain attention features; performing cross attention calculation on the fused RGB features and frequency domain features to obtain fusion features; and, based on the fusion features, adaptively selecting a fusion mode according to the input image and the task requirements to obtain discrimination features, and performing a classification task based on the discrimination features. The invention makes full use of the effective information in the frequency domain and the RGB domain to mine forgery traces, optimizes the uncertainty in the network with an uncertainty-guided test-time training strategy, and improves generalization performance.
Description
Technical Field
The invention belongs to the technical fields of deep learning and computer vision, and particularly relates to a face forgery detection method and system with uncertainty-guided test-time training.
Background
Face forgery detection is a key technology for detecting faces forged by various means. With the popularization of face recognition technology, face forgery is also on the rise, threatening personal privacy, social security and legal fairness in applications such as face recognition, financial transactions and healthcare. In face recognition applications, face forgery can produce false security records and fraudulent access authorization; in financial transactions, it threatens account security and can cause loss of funds; in healthcare, it can lead to leakage and alteration of medical records and infringement of patient privacy. Concerned about these negative effects, researchers have explored various means of addressing face forgery, and many face forgery detectors have been proposed, based for example on texture features, facial motion features, or deep learning techniques. These technologies play a positive role in face forgery detection and related applications, and promise to help protect personal privacy, social security and legal fairness while making everyday life more convenient and secure.
Early work treated face forgery detection as a binary classification problem, aiming to learn the decision boundary between genuine and forged faces. With the continued development of forgery technology, however, such methods gradually lost their effectiveness. Much recent work has therefore shifted to finding forgery clues in the frequency domain and discriminating the authenticity of a face from these fine clues. Some researchers have proposed similarity models over frequency features to improve performance in unseen domains; others assume that the high-frequency noise of an image suppresses color texture and exposes forgery traces, and use image noise to improve generalization. Non-negligible problems remain: frequency cues are not always sufficiently effective or adaptable across different forgery techniques, and a network trained on a common dataset cannot effectively quantify its own uncertainty.
Disclosure of Invention
The invention aims to provide a face forgery detection method with uncertainty-guided test-time training, which can further mine forgery traces in the frequency domain, fuse frequency domain and RGB information of different qualities, optimize the uncertainty in the network, and thereby improve the generalization performance of face forgery detection.
To achieve the above object, the present invention provides a face forgery detection method with uncertainty-guided test-time training, comprising the following steps:
step S100, an image to be distinguished is obtained as an initial input image;
step S200, acquiring a high-frequency information image of the initial input image, wherein the high-frequency information image is image information positioned in a high-frequency band in the initial input image;
step S300, extracting RGB features and frequency domain attention features of different scales in the high-frequency information image, and fusing the RGB features and the frequency domain attention features to obtain fused RGB features and frequency domain features;
step S400, performing cross attention calculation on the fused RGB features and the frequency domain features to obtain fusion features, where cross attention (cross-attention) refers to an attention mechanism in which each position in one feature sequence performs attention calculation over all positions in another feature sequence;
step S500, based on the fusion features, adaptively selecting a fusion mode according to the input image and the task requirements to obtain discrimination features, and performing a classification task based on the discrimination features.
Further, in the step S200, the initial input image is converted from the spatial domain to the frequency domain by using discrete cosine transform, and the high-frequency information image is screened out.
Further, in the step S300, RGB features and frequency domain attention features are extracted based on a self-attention mechanism and global information of an input sequence in the high frequency information image.
Further, in the step S400, a dual attention mechanism based on channel attention and spatial attention is used to interact information between the fused RGB features and the frequency domain features.
Further, in the step S500, the fusion manner includes a dynamic weighted average fusion manner with uncertainty factors, and the weights of the feature graphs in the fusion features are adaptively adjusted by the dynamic weighted average fusion manner.
The invention also provides a face forgery detection system with uncertainty-guided test-time training, which comprises:
an image acquisition unit for acquiring an image to be discriminated as an initial input image;
a high-frequency conversion unit, configured to obtain a high-frequency information image of the initial input image, where the high-frequency information image is image information located in a high-frequency band in the initial input image;
the feature extraction and fusion unit is used for extracting RGB features and frequency domain attention features of different scales in the high-frequency information image, and fusing the RGB features and the frequency domain attention features to obtain fused RGB features and frequency domain features;
the feature calculation unit is configured to perform cross attention calculation on the fused RGB features and the frequency domain features to obtain fusion features, where cross attention (cross-attention) refers to an attention mechanism in which each position in one feature sequence performs attention calculation over all positions in another feature sequence;
and the feature discrimination unit is configured to adaptively select a fusion mode based on the fusion features according to the input image and the task requirements, obtain discrimination features, and perform a classification task based on the discrimination features.
Further, the high-frequency conversion unit comprises a plurality of converter modules, the plurality of converter modules are divided into three groups, and the three groups of converter modules are sequentially connected in series from high to low according to the corresponding spatial resolution.
Further, the feature extraction and fusion unit further comprises an image feature enhancement processing module, which extracts a frequency domain image attention map from the frequency domain initial input image through serially connected convolutional encoders, combines the attention map with the RGB initial input image, and sends the result to a multi-stage feature conversion extractor; the operation is repeated after each feature conversion extractor to obtain frequency domain image features and frequency-enhanced RGB image features.
The present invention also provides an electronic device, comprising: one or more processors; and a storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the methods described above.
The present invention also provides a computer readable medium having stored thereon a computer program which when executed by a processor implements the method described above.
Compared with the prior art, the face forgery detection method and system with uncertainty-guided test-time training fuse the RGB features with the frequency domain attention features and perform cross attention calculation on the fused RGB features and frequency domain features to obtain fusion features, so that frequency domain and RGB information of different qualities can be fused. Based on the fusion features, a fusion mode is adaptively selected according to the input image and the task requirements to obtain discrimination features, and a classification task is performed on those features. The method can further mine forgery traces in the frequency domain, optimize the uncertainty in the network with an uncertainty-guided test-time training strategy, and thereby improve generalization performance.
Drawings
FIG. 1 is a flowchart of a face forgery detection method with uncertainty-guided test-time training in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a feature extraction fusion unit according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a frequency domain feature enhancement network based on a converter module in an embodiment of the invention;
FIG. 4 is a schematic diagram of a frequency domain attention computation step in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the cross domain attention calculation step in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a dynamic fusion module in an embodiment of the invention;
FIG. 7 is a schematic diagram of the uncertainty-guided test-time training strategy in one embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.
It should be understood that, in various embodiments of the present invention, the sequence number of each process does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present invention, "plurality" means two or more. "And/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship. "Comprising A, B and C" means all three of A, B and C are comprised; "comprising A, B or C" means one of A, B and C is comprised; and "comprising A, B and/or C" means any one, any two, or all three of A, B and C are comprised.
It should be understood that in the present invention, "B corresponding to A" means that B is associated with A, and B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A and B match when the similarity of A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flowchart of a face forgery detection method with uncertainty-guided test-time training, and fig. 7 is a schematic diagram of the uncertainty-guided test-time training strategy, according to embodiments of the present invention. A face forgery detection method with uncertainty-guided test-time training according to a preferred embodiment of the present invention comprises the following steps:
step S100, an image to be distinguished is obtained as an initial input image;
step S200, acquiring a high-frequency information image of the initial input image, where the high-frequency information image is the image information located in the high-frequency band of the initial input image: the initial input image is converted from the spatial domain to the frequency domain by a discrete cosine transform (DCT), and the image information of the high-frequency band is screened out as the high-frequency information image;
step S300, extracting RGB features and frequency domain attention features of different scales from the high-frequency information image (fig. 4 shows the frequency domain attention calculation step in an embodiment of the invention), and fusing the RGB features with the frequency domain attention features to obtain fused RGB features and frequency domain features;
step S400, performing cross attention calculation on the fused RGB features and the frequency domain features to obtain fusion features, where cross attention (cross-attention) refers to an attention mechanism in which each position in one feature sequence performs attention calculation over all positions in another feature sequence; more accurate fusion features are obtained through the cross attention calculation;
step S500, based on the fusion features, adaptively selecting a fusion mode according to the input image and the task requirements to obtain discrimination features, and performing a classification task based on the discrimination features.
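The cross attention calculation of step S400 can be sketched as follows. This is a minimal single-head NumPy example; the learned query/key/value projection matrices of a full attention layer are omitted for brevity, so the features stand in directly for queries, keys and values:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d_k):
    # queries: (n_q, d_k) positions of one feature sequence (e.g. fused RGB)
    # context: (n_kv, d_k) all positions of the other sequence (e.g. frequency domain)
    scores = queries @ context.T / np.sqrt(d_k)  # (n_q, n_kv) scaled similarities
    weights = softmax(scores, axis=-1)           # each query attends over ALL context positions
    return weights @ context                     # (n_q, d_k) attended fusion features
```

Each row of `weights` sums to 1, so every position of one modality aggregates information from the entire other modality, which is the interaction the step describes.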
In one embodiment of the present invention, in step S200, the initial input image is transformed from the spatial domain to the frequency domain using the discrete cosine transform, and the high frequency information image is screened out.
In an embodiment of the present invention, in step S300, RGB features and frequency domain attention features are extracted based on the self-attention mechanism and global information of the input sequence in the high frequency information image.
In an embodiment of the present invention, in step S400, a dual attention mechanism based on channel attention and spatial attention is used to interact information between the fused RGB features and the frequency domain features, so as to further improve the expression capability of the fused features.
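The dual attention mechanism described above can be illustrated with a parameter-free NumPy sketch. The real module would place learned layers after the pooling operations, but the gating structure is the same: channel gates computed from spatially squeezed statistics, spatial gates from channel-wise statistics:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (h, w, c); squeeze spatial dims, produce one gate per channel
    gap = feat.mean(axis=(0, 1))   # global average pooling -> (c,)
    gmp = feat.max(axis=(0, 1))    # global max pooling     -> (c,)
    return sigmoid(gap + gmp)      # (c,) gates in (0, 1); a learned MLP would sit here

def spatial_attention(feat):
    # feat: (h, w, c); squeeze channels, produce one gate per spatial position
    avg = feat.mean(axis=-1, keepdims=True)  # (h, w, 1)
    mx = feat.max(axis=-1, keepdims=True)    # (h, w, 1)
    return sigmoid(avg + mx)                 # (h, w, 1) gates; a learned conv would sit here

def dual_attention(feat):
    feat = feat * channel_attention(feat)    # channel-wise reweighting
    feat = feat * spatial_attention(feat)    # position-wise reweighting
    return feat
```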
In an embodiment of the present invention, in step S500, the fusion method includes a dynamic weighted average fusion method with uncertainty factors, and the weights of the feature graphs in the fusion features are adaptively adjusted by the dynamic weighted average fusion method.
The invention also provides a training face counterfeiting detection system containing uncertainty guidance in the test stage, which comprises the following steps:
an image acquisition unit for acquiring an image to be discriminated as an initial input image;
the high-frequency conversion unit is used for acquiring a high-frequency information image of the initial input image, wherein the high-frequency information image is image information positioned in a high-frequency band in the initial input image;
the feature extraction and fusion unit (fig. 2 shows its structure in an embodiment of the present invention), configured to extract RGB features and frequency domain attention features of different scales from the high-frequency information image, and to fuse the RGB features with the frequency domain attention features to obtain fused RGB features and frequency domain features; the feature extraction and fusion unit comprises a frequency domain attention module, which adopts window-based self-attention calculation and thereby realizes and exploits information interaction between frequency domain features;
the feature calculation unit is used for carrying out cross attention calculation on the fused RGB features and the frequency domain features to obtain fused features, wherein the cross attention calculation is also called cross-attention, and refers to that in an attention mechanism, attention calculation is carried out on a certain position in one feature sequence and all positions in the other feature sequence;
the feature discrimination unit, configured to adaptively select a fusion mode based on the fusion features according to the input image and the task requirements, obtain discrimination features, and perform a classification task based on the discrimination features. The feature discrimination unit comprises a dynamic fusion module that performs a dynamic weighted average with uncertainty factors, adaptively adjusting the weight of each feature map in the fusion features and thereby further improving the image discrimination capability. At test time, the uncertainty factors in the dynamic fusion module are used to compute a loss on the output discrimination features and fine-tune the feature discrimination unit; in this unsupervised training mode, only the parameters of the dynamic fusion module are updated.
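The test-time behaviour just described, an unsupervised loss computed on the output with only the dynamic fusion parameters updated, can be sketched as follows. The predictive-entropy loss and the finite-difference gradient here are illustrative stand-ins, not the patent's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def entropy_of_fused(z, branch_logits):
    # z: (k,) fusion logits -- the ONLY parameters trained at test time
    # branch_logits: (k, n_classes) class logits produced by each frozen branch
    w = softmax(z)                          # per-branch fusion weights
    p = softmax(w @ branch_logits)          # fused class distribution
    return -(p * np.log(p + 1e-12)).sum()   # predictive entropy (unsupervised loss)

def ttt_step(z, branch_logits, lr=0.1, eps=1e-4):
    # One unsupervised update of the fusion logits via central finite differences;
    # all other network parameters stay frozen.
    grad = np.zeros_like(z)
    for i in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[i] += eps
        zm[i] -= eps
        grad[i] = (entropy_of_fused(zp, branch_logits)
                   - entropy_of_fused(zm, branch_logits)) / (2 * eps)
    return z - lr * grad
```

Repeated `ttt_step` calls shift weight toward the most confident branch, lowering the predictive entropy of the fused output, which is the sense in which uncertainty in the network is optimized at test time.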
In one embodiment of the present invention, as shown in fig. 6, in order to obtain the corresponding weight of each branch, the dynamic fusion module first integrates the features of the three branches using two linear layers $W_1$ and $W_2$, a global average pooling layer GAP and the Gelu activation function $\delta$, which can be expressed as:

$z = \delta\big(W_2\,\delta\big(W_1\,\mathrm{GAP}([F_1;\, F_2;\, F_3])\big)\big)$

where $F_1$, $F_2$ and $F_3$ are the features of the three branches. Three linear layers $W^{q}_1$, $W^{q}_2$ and $W^{q}_3$ and a softmax function then generate a quality weight $q_i$ for each branch, which can be expressed as:

$[q_1,\, q_2,\, q_3] = \mathrm{softmax}\big([W^{q}_1 z,\; W^{q}_2 z,\; W^{q}_3 z]\big)$

where $q_i$ represents the quality of each branch. Because different branches contribute differently to mining forgery cues, the fused features are weighted according to quality, and two linear layers recover the channel dimension of the dynamic fusion feature. The output $F_{out}$ can be expressed as:

$F_{out} = W_{out}\big([q_1 F_1;\; q_2 F_2;\; q_3 F_3]\big)$
in one embodiment of the present invention, the high frequency conversion unit includes a plurality of converter modules, the plurality of converter modules are divided into three groups, and the three groups of converter modules are serially connected in sequence from high to low according to the corresponding spatial resolution. The three groups of converter modules respectively and correspondingly process the image features with different spatial resolutions, and combine the RGB features with the frequency domain attention features to obtain updated RGB features. The combination process of the RGB features and the frequency domain attention features utilizes the frequency domain attention module to model the interdependence relationship between RGB and high-frequency information images, so that the model can extract the image features more accurately.
In one embodiment of the invention, the input image passes through the high-frequency conversion unit to obtain the high-frequency image input. In the discrete cosine transform (DCT) spectrum, the low-frequency band is the first 1/16 of the spectrum, the middle band lies between 1/16 and 1/8 of the spectrum, and the high-frequency band is the last 7/8. To enhance fine high-frequency artifacts, the low- and middle-frequency information is filtered out by setting it to 0.
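This band split can be sketched with an orthonormal DCT-II: transform the image, zero every coefficient whose band index falls in the first 1/8 of the spectrum (covering both the low 1/16 and the middle 1/16 to 1/8 bands), and transform back. The diagonal band indexing over the 2-D spectrum is an assumption of this sketch; the text does not specify how the spectrum is partitioned:

```python
import numpy as np

def dct_matrix(n):
    # orthonormal DCT-II basis; C @ x applies a 1-D DCT to a length-n vector
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def high_pass_dct(img, cutoff=1 / 8):
    # img: (n, n) grayscale patch; zero the low + middle DCT bands, invert
    n = img.shape[0]
    C = dct_matrix(n)
    spec = C @ img @ C.T                      # 2-D DCT spectrum
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    band = (i + j) / (2 * n)                  # 0 = DC corner, ~1 = highest frequencies
    spec[band < cutoff] = 0.0                 # drop the first 1/8 (low + middle bands)
    return C.T @ spec @ C                     # back to the spatial domain
```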
In an embodiment of the present invention, the feature extraction and fusion unit further comprises an image feature enhancement processing module, shown in fig. 3. The module extracts a frequency domain image attention map from the frequency domain initial input image through serially connected convolutional encoders, combines the attention map with the RGB initial input image, and sends the result to a multi-stage feature conversion extractor; the operation is repeated after each feature conversion extractor to obtain frequency domain image features and frequency-enhanced RGB image features. Each convolutional encoder comprises convolutional layers and nonlinear activation layers: it maps the frequency domain image input to a high-dimensional feature domain and extracts shallow frequency domain feature representations through groups of basis modules consisting of a convolutional layer and a nonlinear activation layer. The frequency domain image attention map is obtained by feeding the frequency domain image features into a group of convolutional decoders, likewise consisting of convolutional layers and nonlinear activation layers.
In an embodiment of the present invention, the image feature enhancement processing module processes the input shallow image features as follows: the RGB image x_rgb and the frequency-domain image x_freq are taken as inputs to the Transformer network, and RGB features F_rgb and frequency-domain features F_freq are extracted via Transformer blocks. A frequency attention map is then obtained using the frequency-domain attention module, in order to guide the mining of forgery traces in the RGB modality from the frequency perspective. The frequency-domain attention module is shown in fig. 4, and its calculation process is as follows:
A_freq = σ(Conv7×7(CAT(GAP(F_freq), GMP(F_freq))))

wherein F_freq represents the frequency features after feature extraction, σ represents the Sigmoid function, and GAP and GMP represent global average pooling and global maximum pooling, respectively. CAT denotes concatenating features in the depth direction. We finally select a 7 × 7 convolution kernel to extract the forgery traces in the frequency domain because it detects edge information better and covers a larger area than three 3 × 3 convolution kernels. The attention map A_freq contains subtle forgery traces in the frequency domain that are difficult to mine from the RGB features. Therefore, we apply A_freq to the RGB features F_rgb in order to further mine the forgery traces; the calculation process is as follows:
F'_rgb = F_rgb + A_freq ⊙ F_rgb

wherein + represents element-wise summation and ⊙ represents element-wise multiplication. In addition, the feature extraction process has three stages: low, medium and high. The low-level features represent textural forgery information, while the high-level features capture more of the overall forgery traces. The RGB features and the frequency features therefore interact at multiple levels to obtain a more comprehensive representation of forgery features. Specifically, the frequency-domain output F_freq^i of the i-th stage is used, together with the RGB input F_rgb^(i+1), as the input of the (i+1)-th stage, and the RGB features pre-guided in the frequency domain can be expressed as:

F_rgb^(i+1) = F_rgb^i + A_freq^i ⊙ F_rgb^i
the final stage is then output with featuresAnd->Input into a dynamic fusion module to mine more discriminative information, whereinh、wAndcis the dimension of the output feature.
In an embodiment of the present invention, the feature conversion extractors of all stages in the multi-stage feature conversion extractor share network weights; the output of the feature extractor of the previous stage is used as the input of the feature extractor of the current stage, so that the image features loop through the multi-stage feature conversion extractor a plurality of times. The feature conversion extractor of each stage is configured to pass the input RGB and frequency-domain image features through a window self-attention calculation unit and a corresponding forward-propagation calculation unit, so as to realize information interaction between image features at different positions.
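The shared-weight, recurrent application of one stage can be sketched as follows; `stage_fn` stands in for the window self-attention plus forward-propagation unit, which is not reproduced here:

```python
def multi_stage_extract(x_rgb, x_freq, stage_fn, n_stages=3):
    # Weight sharing across stages is equivalent to applying the same
    # stage function recurrently: stage i's output feeds stage i+1.
    for _ in range(n_stages):
        x_rgb, x_freq = stage_fn(x_rgb, x_freq)
    return x_rgb, x_freq
```

With a toy stage function the recurrence is easy to trace; the design choice (one set of weights reused) keeps the parameter count of the multi-stage extractor at that of a single stage.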
In an embodiment of the present invention, the feature extraction and fusion unit further includes a convolutional decoder, configured to feed the extracted deep image feature expression into a series of stacked basic convolutional layers and upsampling layers, and to perform a pixel-level addition between the skip-connected shallow image features and the deep image features, so as to obtain frequency-domain-enhanced RGB image features and, further, an enhanced RGB image.
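A toy sketch of the decoder's skip-connected, pixel-level addition; nearest-neighbour upsampling is an assumption here, and the patent's convolutional layers between stages are omitted:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decode_with_skips(deep, skips):
    # Progressively upsample the deep features and add the skip-connected
    # shallow features pixel-wise (skips ordered coarse-to-fine).
    x = deep
    for shallow in skips:
        x = upsample2x(x) + shallow
    return x
```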
In some preferred embodiments, the multi-stage feature conversion extractor further comprises a cross-attention calculation module. The feature conversion extractor of each stage combines the feature extraction processes of the self-attention calculation unit and the forward-propagation calculation unit, and employs a cross-layer connection to connect the input and output of the feature extractor by pixel-level addition. The cross-attention calculation module is shown in fig. 5. Specifically, the frequency modality serves as an auxiliary component: given the RGB features F_rgb and the frequency features F_freq, we use the query-key-value mechanism to initially fuse them into a unified representation, which can be expressed as:
F_fuse = Softmax(Q(F_rgb) · K(F_freq)^T / √c) · V(F_freq)

wherein Q, K and V denote three different matrix transforms used for projective transformation of the input features, and h, w and c denote the dimensions of the output features.
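A hedged sketch of this query-key-value fusion, with RGB tokens (features flattened to shape (h·w, c)) supplying the queries and the auxiliary frequency tokens supplying keys and values; the residual connection and the 1/√c scaling are standard attention conventions assumed here, not details confirmed by the text:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(f_rgb, f_freq, Wq, Wk, Wv):
    # RGB supplies queries; the auxiliary frequency modality supplies
    # keys and values. Inputs are (N, c) token matrices, N = h * w.
    Q, K, V = f_rgb @ Wq, f_freq @ Wk, f_freq @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # attention over frequency positions
    return f_rgb + A @ V                          # residual fusion into one representation
```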
The embodiment of the invention adds a dynamic fusion module with uncertainty factors. The dynamic fusion module applies uncertainty factors, such as a Gumbel-Softmax function with random perturbation, to perturb the judgement of the relative quality of the images and dynamically select a prediction result as the model output, thereby fine-tuning the network and increasing its ability to fit unknown data.
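The Gumbel-Softmax perturbation used for this dynamic selection can be sketched as follows; the branch contents and the "quality logits" are illustrative placeholders, not quantities defined by the patent:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    # Softmax over logits perturbed with Gumbel(0, 1) noise: a differentiable,
    # stochastic relaxation of discrete selection.
    rng = np.random.default_rng(0) if rng is None else rng
    g = -np.log(-np.log(rng.uniform(1e-12, 1.0, np.shape(logits))))
    y = (np.asarray(logits, dtype=float) + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

def dynamic_fusion(branches, quality_logits, tau=0.5, rng=None):
    # Weight candidate fusion branches by perturbed 'relative quality' scores;
    # low tau pushes the weights toward a near one-hot selection.
    w = gumbel_softmax(quality_logits, tau, rng)
    return sum(wi * b for wi, b in zip(w, branches))
```

Lowering `tau` makes the selection harder (closer to picking one branch), while the injected noise provides the random disturbance that the test-stage training strategy exploits.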
The invention also provides an electronic device, characterized by comprising: one or more processors; a storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the methods described above.
The present invention also provides a computer-readable medium having stored thereon a computer program characterized in that: the program is executed by a processor to implement the method described above.
According to the uncertainty-guided test-stage-training face forgery detection method and system, the RGB features and the frequency-domain attention features are fused, and cross-attention calculation is performed on the fused RGB features and frequency-domain features to obtain fusion features, so that frequency-domain and RGB information of different qualities can be fused. Based on the fusion features, a fusion mode is adaptively selected according to different input images and task requirements to obtain discrimination features, and the classification task is performed based on the discrimination features. The method can further mine forgery traces in the frequency domain, optimize the uncertainty in the network by means of an uncertainty-guided test-stage training strategy, and thereby improve the generalization performance of the method.
It can be understood that the uncertainty-guided test-stage-training face forgery detection system provided in the foregoing embodiment is only illustrated by the division of the functional modules described above. In practical applications, the functions may be allocated to different functional modules as needed; that is, the modules or steps in the foregoing embodiment of the present invention may be further decomposed or combined. For example, the modules in the foregoing embodiment may be combined into one module, or further split into a plurality of sub-modules, to complete all or part of the functions described above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.
Claims (10)
1. An uncertainty-guided test-stage-training face forgery detection method, characterized by comprising the following steps:
step S100, an image to be distinguished is obtained as an initial input image;
step S200, acquiring a high-frequency information image of the initial input image, wherein the high-frequency information image is image information positioned in a high-frequency band in the initial input image;
step S300, extracting RGB features and frequency domain attention features of different scales in the high-frequency information image, and fusing the RGB features and the frequency domain attention features to obtain fused RGB features and frequency domain features;
step S400, performing cross attention calculation on the fused RGB features and the frequency domain features to obtain fused features, where the cross attention calculation is also called cross-attention, and refers to performing attention calculation on a certain position in one feature sequence and all positions in another feature sequence in an attention mechanism;
step S500, based on the fusion characteristics, adaptively selecting a fusion mode according to different input images and task requirements to obtain discrimination characteristics, and classifying tasks based on the discrimination characteristics.
2. The method according to claim 1, wherein in step S200, the initial input image is converted from the spatial domain to the frequency domain by discrete cosine transform, and the high-frequency information image is obtained by filtering.
3. The uncertainty-guided test phase-training face-forgery detection method according to claim 1, wherein in the step S300, RGB features and frequency-domain attention features are extracted based on a self-attention mechanism and global information of an input sequence in the high-frequency information image.
4. The method according to claim 1, wherein in step S400, a dual attention mechanism based on channel attention and spatial attention is used to interact information between the fused RGB features and the frequency domain features.
5. The method according to claim 1, wherein in step S500, the fusion method includes a dynamic weighted average fusion method with uncertainty factors, and the weights of the feature maps in the fusion features are adaptively adjusted by the dynamic weighted average fusion method.
6. A training face-forgery detection system for a test phase with uncertainty guidance, comprising:
an image acquisition unit for acquiring an image to be discriminated as an initial input image;
a high-frequency conversion unit, configured to obtain a high-frequency information image of the initial input image, where the high-frequency information image is image information located in a high-frequency band in the initial input image;
the feature extraction and fusion unit is used for extracting RGB features and frequency domain attention features of different scales in the high-frequency information image, and fusing the RGB features and the frequency domain attention features to obtain fused RGB features and frequency domain features;
the feature calculation unit is used for carrying out cross attention calculation on the fused RGB features and the frequency domain features to obtain fused features, wherein the cross attention calculation is also called cross-attention, and refers to that in an attention mechanism, attention calculation is carried out on a certain position in one feature sequence and all positions in the other feature sequence;
and the feature judging unit is used for adaptively selecting a fusion mode based on the fusion features according to different input images and task requirements to obtain judging features and classifying tasks based on the judging features.
7. The uncertainty-guided testing phase-training face-forgery detection system of claim 6, wherein the high-frequency conversion unit includes a number of converter modules that are divided into three groups that are serially connected in sequence from high to low according to the corresponding spatial resolution.
8. The system according to claim 6, wherein the feature extraction and fusion unit further comprises an image feature enhancement processing module, the image feature enhancement processing module is configured to extract a frequency domain image attention map from a frequency domain initial input image through a convolutional encoder connected in series, combine the frequency domain image attention map with an RGB initial input image, send the frequency domain image attention map to a multi-stage feature conversion extractor, and obtain frequency domain image features and frequency domain enhanced RGB image features after the processing by the feature conversion extractor.
9. An electronic device, comprising: one or more processors; a storage means for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
10. A computer readable medium having a computer program stored thereon, characterized by: the program, when executed by a processor, implements the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311224982.8A CN117275068B (en) | 2023-09-21 | 2023-09-21 | Method and system for training human face fake detection in test stage containing uncertainty guidance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311224982.8A CN117275068B (en) | 2023-09-21 | 2023-09-21 | Method and system for training human face fake detection in test stage containing uncertainty guidance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117275068A true CN117275068A (en) | 2023-12-22 |
CN117275068B CN117275068B (en) | 2024-05-17 |
Family
ID=89202101
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311224982.8A Active CN117275068B (en) | 2023-09-21 | 2023-09-21 | Method and system for training human face fake detection in test stage containing uncertainty guidance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117275068B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200402223A1 (en) * | 2019-06-24 | 2020-12-24 | Insurance Services Office, Inc. | Machine Learning Systems and Methods for Improved Localization of Image Forgery |
CN113536990A (en) * | 2021-06-29 | 2021-10-22 | 复旦大学 | Deep fake face data identification method |
CN114495245A (en) * | 2022-04-08 | 2022-05-13 | 北京中科闻歌科技股份有限公司 | Face counterfeit image identification method, device, equipment and medium |
CN114898432A (en) * | 2022-05-17 | 2022-08-12 | 中南大学 | Fake face video detection method and system based on multi-feature fusion |
CN115147895A (en) * | 2022-06-16 | 2022-10-04 | 北京百度网讯科技有限公司 | Face counterfeit discrimination method and device and computer program product |
CN115393760A (en) * | 2022-08-16 | 2022-11-25 | 公安部物证鉴定中心 | Method, system and equipment for detecting Deepfake composite video |
CN115601820A (en) * | 2022-12-01 | 2023-01-13 | 思腾合力(天津)科技有限公司(Cn) | Face fake image detection method, device, terminal and storage medium |
US20230081645A1 (en) * | 2021-01-28 | 2023-03-16 | Tencent Technology (Shenzhen) Company Limited | Detecting forged facial images using frequency domain information and local correlation |
CN115880749A (en) * | 2022-11-08 | 2023-03-31 | 杭州中科睿鉴科技有限公司 | Face deep false detection method based on multi-mode feature fusion |
CN115909445A (en) * | 2022-11-11 | 2023-04-04 | 中国人民解放军国防科技大学 | Face image counterfeiting detection method and related equipment |
CN116434351A (en) * | 2023-04-23 | 2023-07-14 | 厦门大学 | Fake face detection method, medium and equipment based on frequency attention feature fusion |
Non-Patent Citations (1)
Title |
---|
CAO, Shenhao et al.: "A Survey of Face Forgery and Detection Techniques", Journal of Image and Graphics (中国图像图形学报), vol. 27, no. 4, 30 April 2022 (2022-04-30), pages 1023-1038 * |
Also Published As
Publication number | Publication date |
---|---|
CN117275068B (en) | 2024-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rao et al. | Deep learning local descriptor for image splicing detection and localization | |
Guo et al. | Fake face detection via adaptive manipulation traces extraction network | |
Rao et al. | Multi-semantic CRF-based attention model for image forgery detection and localization | |
Yu et al. | A multi-purpose image counter-anti-forensic method using convolutional neural networks | |
De La Croix et al. | Toward secret data location via fuzzy logic and convolutional neural network | |
Wei et al. | Controlling neural learning network with multiple scales for image splicing forgery detection | |
Yang et al. | Design of cyber-physical-social systems with forensic-awareness based on deep learning | |
Wei et al. | Universal deep network for steganalysis of color image based on channel representation | |
Liu et al. | Image deblocking detection based on a convolutional neural network | |
Liang et al. | Image resampling detection based on convolutional neural network | |
Chen et al. | Image splicing localization using residual image and residual-based fully convolutional network | |
CN116958637A (en) | Training method, device, equipment and storage medium of image detection model | |
CN114677372A (en) | Depth forged image detection method and system integrating noise perception | |
Kumar et al. | A hybrid method for the removal of RVIN using self organizing migration with adaptive dual threshold median filter | |
Tripathi et al. | Image splicing detection system using intensity-level multi-fractal dimension feature engineering and twin support vector machine based classifier | |
Meena et al. | Image splicing forgery detection techniques: A review | |
Sonam et al. | Secure digital image watermarking using memristor-based hyperchaotic circuit | |
Zarrabi et al. | BlessMark: a blind diagnostically-lossless watermarking framework for medical applications based on deep neural networks | |
CN117275068B (en) | Method and system for training human face fake detection in test stage containing uncertainty guidance | |
Singh et al. | StegGAN: hiding image within image using conditional generative adversarial networks | |
Li | Saliency prediction based on multi-channel models of visual processing | |
CN116522326A (en) | Data enhancement model and method suitable for power grid information attack detection | |
Mazumdar et al. | Siamese convolutional neural network‐based approach towards universal image forensics | |
CN115358952A (en) | Image enhancement method, system, equipment and storage medium based on meta-learning | |
Xu et al. | Steganography algorithms recognition based on match image and deep features verification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||