CN117275068A - Method and system for face forgery detection with uncertainty-guided test-time training - Google Patents

Method and system for face forgery detection with uncertainty-guided test-time training

Info

Publication number
CN117275068A
CN117275068A
Authority
CN
China
Prior art keywords
features
image
attention
frequency domain
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311224982.8A
Other languages
Chinese (zh)
Other versions
CN117275068B (en)
Inventor
罗引
徐楠
郝艳妮
陈博
李军锋
曹家
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Wenge Technology Co ltd
Original Assignee
Beijing Zhongke Wenge Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Wenge Technology Co ltd filed Critical Beijing Zhongke Wenge Technology Co ltd
Priority to CN202311224982.8A priority Critical patent/CN117275068B/en
Publication of CN117275068A publication Critical patent/CN117275068A/en
Application granted granted Critical
Publication of CN117275068B publication Critical patent/CN117275068B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face forgery detection method and system with uncertainty-guided test-time training, belonging to the technical fields of deep learning and computer vision. The method comprises the following steps: acquiring an image to be discriminated as an initial input image; acquiring a high-frequency information image of the initial input image; extracting multi-scale RGB features and frequency-domain attention features from the high-frequency information image and fusing them; performing cross-attention calculation on the fused RGB features and frequency-domain features to obtain fusion features; and, based on the fusion features, adaptively selecting a fusion mode according to the input image and task requirements to obtain discriminative features, on which the classification task is performed. The invention fully exploits the effective information in the frequency and RGB domains to mine forgery traces, optimizes the uncertainty in the network with an uncertainty-guided test-time training strategy, and improves generalization performance.

Description

Method and system for face forgery detection with uncertainty-guided test-time training
Technical Field
The invention belongs to the technical fields of deep learning and computer vision, and particularly relates to a face forgery detection method and system with uncertainty-guided test-time training.
Background
Face forgery detection is a key technology for detecting faces forged by various means. With the popularization of face recognition technology, face forgery is also on the rise, threatening personal privacy, social security and legal fairness in applications such as face recognition, financial transactions and healthcare. In face recognition applications, a forged face can produce false security records and improper access authorization; in financial transactions, it threatens account security and can cause loss of funds; in healthcare, it can lead to the leakage and alteration of medical records and the infringement of patient privacy. Concerned by these negative effects, researchers have explored various means of addressing face forgery, and many face forgery detectors have been devised, for example detectors based on texture features, on facial motion features, or on deep learning. These technologies play a positive role in face forgery detection and related applications, and promise to help protect personal privacy, social security and legal fairness while making everyday life more convenient and secure.
Early work treated face forgery detection as a binary classification problem, aiming to learn the decision boundary between genuine and forged faces. However, as forgery technology has developed, such methods have gradually lost effectiveness. Many recent works have therefore shifted to finding forgery clues in the frequency domain and discriminating the authenticity of a face from these fine clues. Some researchers have proposed similarity models over frequency features to improve performance in unseen domains; others assume that the high-frequency noise of an image removes color texture while exposing forgery traces, and use image noise to improve generalization ability. Non-negligible problems remain, however: frequency cues are not always sufficiently effective or adaptable across forgery techniques, and a network trained on a common dataset cannot effectively quantify its own uncertainty.
Disclosure of Invention
The invention aims to provide a face forgery detection method with uncertainty-guided test-time training that can further mine forgery traces in the frequency domain, fuse frequency-domain and RGB information of differing quality, optimize the uncertainty in the network, and thereby improve the generalization performance of face forgery detection.
To achieve the above object, the present invention provides a face forgery detection method with uncertainty-guided test-time training, comprising the following steps:
step S100, an image to be distinguished is obtained as an initial input image;
step S200, acquiring a high-frequency information image of the initial input image, wherein the high-frequency information image is image information positioned in a high-frequency band in the initial input image;
step S300, extracting RGB features and frequency domain attention features of different scales in the high-frequency information image, and fusing the RGB features and the frequency domain attention features to obtain fused RGB features and frequency domain features;
step S400, performing cross-attention calculation on the fused RGB features and the frequency-domain features to obtain fusion features, where cross-attention refers to an attention mechanism in which a position in one feature sequence attends to all positions in another feature sequence;
step S500, based on the fusion features, adaptively selecting a fusion mode according to different input images and task requirements to obtain discriminative features, and performing the classification task based on the discriminative features.
Further, in the step S200, the initial input image is converted from the spatial domain to the frequency domain by using discrete cosine transform, and the high-frequency information image is screened out.
Further, in the step S300, RGB features and frequency domain attention features are extracted based on a self-attention mechanism and global information of an input sequence in the high frequency information image.
Further, in the step S400, a dual attention mechanism based on channel attention and spatial attention is used to interact information between the fused RGB features and the frequency domain features.
Further, in the step S500, the fusion manner includes a dynamic weighted average fusion manner with uncertainty factors, and the weights of the feature graphs in the fusion features are adaptively adjusted by the dynamic weighted average fusion manner.
The invention also provides a training face counterfeiting detection system containing uncertainty guidance in the test stage, which comprises the following steps:
an image acquisition unit for acquiring an image to be discriminated as an initial input image;
a high-frequency conversion unit, configured to obtain a high-frequency information image of the initial input image, where the high-frequency information image is image information located in a high-frequency band in the initial input image;
the feature extraction and fusion unit is used for extracting RGB features and frequency domain attention features of different scales in the high-frequency information image, and fusing the RGB features and the frequency domain attention features to obtain fused RGB features and frequency domain features;
the feature calculation unit is used for performing cross-attention calculation on the fused RGB features and the frequency-domain features to obtain fusion features, where cross-attention refers to an attention mechanism in which a position in one feature sequence attends to all positions in another feature sequence;
and the feature discrimination unit is used for adaptively selecting a fusion mode based on the fusion features according to the input image and task requirements to obtain discriminative features, and performing the classification task based on the discriminative features.
Further, the high-frequency conversion unit comprises a plurality of Transformer modules divided into three groups, the three groups being connected in series in order of decreasing spatial resolution.
Further, the feature extraction and fusion unit further comprises an image feature enhancement processing module, which extracts a frequency-domain attention map from the frequency-domain initial input image through serially connected convolutional encoders, combines the attention map with the RGB initial input image, and feeds the result to a multi-stage feature transformation extractor; the operation is repeated after each feature transformation extractor to obtain frequency-domain image features and frequency-enhanced RGB image features.
The present invention also provides an electronic device comprising: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
The present invention also provides a computer readable medium having stored thereon a computer program which when executed by a processor implements the method described above.
Compared with the prior art, the face forgery detection method and system with uncertainty-guided test-time training fuse RGB features with frequency-domain attention features and perform cross-attention calculation on the fused RGB and frequency-domain features to obtain fusion features, so that frequency-domain and RGB information of differing quality can be fused; a fusion mode is then adaptively selected from the fusion features according to the input image and task requirements to obtain discriminative features, on which the classification task is performed. The method can further mine forgery traces in the frequency domain and, using the uncertainty-guided test-time training strategy, optimize the uncertainty in the network and further improve generalization performance.
Drawings
FIG. 1 is a flow chart of a face forgery detection method with uncertainty-guided test-time training in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a feature extraction fusion unit according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a frequency domain feature enhancement network based on Transformer modules in an embodiment of the invention;
FIG. 4 is a schematic diagram of a frequency domain attention computation step in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the cross domain attention calculation step in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a dynamic fusion module in an embodiment of the invention;
FIG. 7 is a schematic diagram of the uncertainty-guided test-time training strategy in one embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.
It should be understood that, in various embodiments of the present invention, the sequence number of each process does not mean that the execution sequence of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present invention, "plurality" means two or more. "And/or" merely describes an association relationship between objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B and C are included; "comprising A, B or C" means that one of A, B and C is included; "comprising A, B and/or C" means that any one, any two, or all three of A, B and C are included.
It should be understood that in the present invention, "B corresponding to A", "A corresponding to B", or "B corresponding with A" means that B is associated with A, and B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matches B when the similarity of A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flowchart of a face forgery detection method with uncertainty-guided test-time training according to an embodiment of the present invention, and fig. 7 is a schematic diagram of the uncertainty-guided test-time training strategy. A face forgery detection method with uncertainty-guided test-time training according to a preferred embodiment of the present invention includes the following steps:
step S100, an image to be distinguished is obtained as an initial input image;
step S200, obtaining a high-frequency information image of the initial input image, the high-frequency information image being the image information located in the high-frequency band of the initial input image; the initial input image is converted from the spatial domain to the frequency domain by discrete cosine transform (DCT), and the image information of the high-frequency band is screened out as the high-frequency information image;
step S300, extracting multi-scale RGB features and frequency-domain attention features from the high-frequency information image (fig. 4 shows the frequency-domain attention calculation step in an embodiment of the invention), and fusing the RGB features and the frequency-domain attention features to obtain fused RGB features and frequency-domain features;
step S400, performing cross-attention calculation on the fused RGB features and frequency-domain features to obtain fusion features, where cross-attention refers to an attention mechanism in which a position in one feature sequence attends to all positions in another feature sequence; more accurate fusion features are obtained through the cross-attention calculation;
step S500, based on the fusion features, adaptively selecting a fusion mode according to the input image and task requirements to obtain discriminative features, and performing the classification task based on the discriminative features.
In one embodiment of the present invention, in step S200, the initial input image is transformed from the spatial domain to the frequency domain using the discrete cosine transform, and the high frequency information image is screened out.
In an embodiment of the present invention, in step S300, RGB features and frequency domain attention features are extracted based on the self-attention mechanism and global information of the input sequence in the high frequency information image.
In an embodiment of the present invention, in step S400, a dual attention mechanism based on channel attention and spatial attention is used to interact information between the fused RGB features and the frequency domain features, so as to further improve the expression capability of the fused features.
In an embodiment of the present invention, in step S500, the fusion method includes a dynamic weighted average fusion method with uncertainty factors, and the weights of the feature graphs in the fusion features are adaptively adjusted by the dynamic weighted average fusion method.
The invention also provides a training face counterfeiting detection system containing uncertainty guidance in the test stage, which comprises the following steps:
an image acquisition unit for acquiring an image to be discriminated as an initial input image;
the high-frequency conversion unit is used for acquiring a high-frequency information image of the initial input image, wherein the high-frequency information image is image information positioned in a high-frequency band in the initial input image;
the feature extraction and fusion unit, whose structure is shown in fig. 2, is configured to extract multi-scale RGB features and frequency-domain attention features from the high-frequency information image and fuse them to obtain fused RGB features and frequency-domain features; the feature extraction and fusion unit comprises a frequency domain attention module, which adopts window-based self-attention calculation and thereby realizes and exploits information interaction among the frequency-domain features;
the feature calculation unit is used for performing cross-attention calculation on the fused RGB features and the frequency-domain features to obtain fusion features, where cross-attention refers to an attention mechanism in which a position in one feature sequence attends to all positions in another feature sequence;
and the feature discrimination unit is used for adaptively selecting a fusion mode based on the fusion features according to the input image and task requirements, obtaining discriminative features, and performing the classification task based on the discriminative features. The feature discrimination unit comprises a dynamic fusion module that performs a dynamic weighted average with an uncertainty factor, adaptively adjusting the weight of each feature map in the fusion features and thereby further improving image discrimination. With the uncertainty factor placed in the dynamic fusion module, a loss is calculated on the output discriminative features and the feature discrimination unit is fine-tuned in an unsupervised manner at test time, updating only the parameters of the dynamic fusion module.
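As an illustration of the unsupervised fine-tuning described above, the following PyTorch sketch updates only the dynamic fusion module's parameters at test time. The patent does not spell out the unsupervised loss, so prediction-entropy minimization, a common proxy for uncertainty, is assumed here; the names model, fusion_module and test_batch are hypothetical.

import torch
import torch.nn.functional as F

def test_time_adapt(model, fusion_module, test_batch, steps=1, lr=1e-4):
    # Freeze everything, then unfreeze only the dynamic fusion module
    # (assumed to be a submodule of the full detection model).
    for p in model.parameters():
        p.requires_grad_(False)
    for p in fusion_module.parameters():
        p.requires_grad_(True)
    opt = torch.optim.Adam(fusion_module.parameters(), lr=lr)
    for _ in range(steps):
        logits = model(test_batch)              # no labels are available
        probs = F.softmax(logits, dim=1)
        # Entropy of the prediction as the (assumed) uncertainty objective.
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return model(test_batch).argmax(dim=1)      # prediction after adaptation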
In one embodiment of the present invention, as shown in fig. 6, the dynamic fusion module uses two linear layers $W_1$ and $W_2$, together with a global average pooling layer (GAP) and a GELU activation layer $\delta$, to integrate the features $F_1$, $F_2$, $F_3$ of the three branches into a descriptor $z$, which can be expressed as:

$z = W_2\,\delta\big(W_1\,\mathrm{GAP}([F_1, F_2, F_3])\big)$, where $F_1, F_2, F_3 \in \mathbb{R}^{h\times w\times c}$. Three linear layers $L_1$, $L_2$ and $L_3$ and a softmax function then generate a quality weight $m_i$ for each branch, which can be expressed as:

$[m_1, m_2, m_3] = \mathrm{softmax}\big([L_1(z), L_2(z), L_3(z)]\big)$, where $m_i$ represents the quality of each branch. Because the contributions of different branches to mining forgery cues differ, the fusion features are weighted according to quality, and two linear layers $W_3$, $W_4$ are used to recover the channel dimension of the dynamic fusion feature. The output $F_{out}$ can be expressed as:

$F_{out} = W_4\,\delta\big(W_3 \textstyle\sum_{i=1}^{3} m_i F_i\big)$.
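A minimal PyTorch sketch of the dynamic fusion module reconstructed above follows. The branch inputs are assumed to be token sequences of shape (B, HW, C), and the reduction ratio r and layer widths are illustrative assumptions rather than values from the patent.

import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, c, r=4):
        super().__init__()
        # W1/W2: compress the GAP'd three-branch features into descriptor z.
        self.squeeze = nn.Sequential(nn.Linear(3 * c, c // r), nn.GELU(),
                                     nn.Linear(c // r, c))
        # L1..L3: one quality logit per branch, normalized by softmax.
        self.heads = nn.ModuleList([nn.Linear(c, 1) for _ in range(3)])
        # W3/W4: recover the channel dimension of the fused feature.
        self.restore = nn.Sequential(nn.Linear(c, c // r), nn.GELU(),
                                     nn.Linear(c // r, c))

    def forward(self, f_rgb, f_freq, f_ca):          # each (B, HW, C)
        branches = [f_rgb, f_freq, f_ca]
        # Global average pooling over spatial positions, then z.
        z = self.squeeze(torch.cat([b.mean(dim=1) for b in branches], dim=-1))
        # Quality weight m_i for each branch.
        m = torch.softmax(torch.cat([h(z) for h in self.heads], dim=-1), dim=-1)
        fused = sum(m[:, i, None, None] * branches[i] for i in range(3))
        return self.restore(fused)                   # (B, HW, C)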
in one embodiment of the present invention, the high frequency conversion unit includes a plurality of converter modules, the plurality of converter modules are divided into three groups, and the three groups of converter modules are serially connected in sequence from high to low according to the corresponding spatial resolution. The three groups of converter modules respectively and correspondingly process the image features with different spatial resolutions, and combine the RGB features with the frequency domain attention features to obtain updated RGB features. The combination process of the RGB features and the frequency domain attention features utilizes the frequency domain attention module to model the interdependence relationship between RGB and high-frequency information images, so that the model can extract the image features more accurately.
In one embodiment of the invention, the input image is passed through the high-frequency conversion unit to obtain the high-frequency image input. In the discrete cosine transform (DCT) spectrum, the low-frequency band is the first 1/16 of the spectrum, the middle band lies between 1/16 and 1/8, and the high-frequency band is the last 7/8. To enhance fine high-frequency artifacts, the low- and middle-band information is filtered out by setting it to 0; a minimal sketch of this step follows.
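The band filtering can be sketched in Python with SciPy's DCT. Measuring band membership by the normalized sum of the frequency indices is an assumption (the patent only fixes the 1/16 and 1/8 split points), and a single-channel image is assumed.

import numpy as np
from scipy.fftpack import dct, idct

def high_frequency_image(img):
    # 2-D type-II DCT: spatial domain -> frequency domain.
    spec = dct(dct(img, axis=0, norm='ortho'), axis=1, norm='ortho')
    h, w = spec.shape
    u = np.arange(h)[:, None] / h
    v = np.arange(w)[None, :] / w
    # Zero the low band (first 1/16) and middle band (1/16 to 1/8);
    # only the last 7/8 of the spectrum survives.
    spec[(u + v) / 2 < 1 / 8] = 0
    # Inverse DCT: back to the spatial domain as the high-frequency image.
    return idct(idct(spec, axis=1, norm='ortho'), axis=0, norm='ortho')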
In an embodiment of the present invention, the feature extraction and fusion unit further includes an image feature enhancement processing module, shown in fig. 3. This module extracts a frequency-domain attention map from the frequency-domain initial input image through serially connected convolutional encoders, combines the attention map with the RGB initial input image, and feeds the result to a multi-stage feature transformation extractor; the operation is repeated after each feature transformation extractor to obtain frequency-domain image features and frequency-enhanced RGB image features. The convolutional encoder comprises convolutional layers and nonlinear activation layers; it maps the frequency-domain image input to a high-dimensional feature domain and extracts shallow frequency-domain feature representations through several basis modules, each consisting of a convolutional layer and a nonlinear activation layer. The frequency-domain attention map is obtained by feeding the frequency-domain image features into a convolutional decoder likewise built from convolutional and nonlinear activation layers; a sketch of such an encoder/decoder pair follows.
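A minimal PyTorch sketch of such a convolutional encoder/decoder pair is given below; the channel widths, block count and GELU activation are illustrative assumptions.

import torch.nn as nn

class FreqAttentionEncoder(nn.Module):
    def __init__(self, in_ch=3, width=64, blocks=3):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(blocks):                      # basis modules: conv + activation
            layers += [nn.Conv2d(ch, width, 3, padding=1), nn.GELU()]
            ch = width
        self.encoder = nn.Sequential(*layers)
        self.decoder = nn.Sequential(                # conv decoder -> attention map
            nn.Conv2d(width, width // 2, 3, padding=1), nn.GELU(),
            nn.Conv2d(width // 2, 1, 1), nn.Sigmoid())

    def forward(self, x_freq):
        feats = self.encoder(x_freq)                 # shallow frequency features
        return feats, self.decoder(feats)            # features + attention map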
In an embodiment of the present invention, the image feature enhancement processing module processes the input shallow image features as follows: the RGB image $x_{rgb}$ and the frequency-domain image $x_{f}$ serve as inputs to the Transformer network, and a Transformer block extracts the RGB features $F_{rgb}$ and the frequency-domain features $F_{f}$. The frequency domain attention module then computes an attention map $A_{f}$ from the frequency features in order to guide the mining of forgery traces in the RGB modality from a frequency perspective. The frequency domain attention module is shown in fig. 4, and its calculation process is:

$A_{f} = \sigma\big(\mathrm{Conv}_{7\times 7}(\mathrm{CAT}[\mathrm{GAP}(F_{f}),\ \mathrm{GMP}(F_{f})])\big)$,

where $F_{f}$ represents the frequency features after feature extraction, $\sigma$ represents the Sigmoid function, and GAP and GMP represent global average pooling and global maximum pooling, respectively. CAT denotes concatenating features in the depth direction. A $7\times 7$ convolution kernel is chosen to extract forgery traces in the frequency domain because it detects edge information better and covers a larger area than three $3\times 3$ convolution kernels. The attention map $A_{f}$ contains subtle frequency-domain forgery traces that are difficult to mine from the RGB features alone. Therefore, $A_{f}$ is applied to the RGB features $F_{rgb}$ to further mine forgery traces:

$\hat{F}_{rgb} = F_{rgb} \oplus (F_{rgb} \otimes A_{f})$,

where $\oplus$ represents element-wise summation and $\otimes$ represents element-wise multiplication. In addition, the feature extraction process has three stages: low, middle and high. The low-level features represent textural forgery information, while the high-level features capture more of the overall forgery trace; the RGB features and frequency features therefore interact at multiple levels to obtain a more comprehensive representation of forgery features. Specifically, the frequency-domain output $F_{f}^{i}$ of the $i$-th stage is used as the frequency input of the $(i{+}1)$-th stage, and together with the RGB input $F_{rgb}^{i+1}$, the frequency-guided RGB features can be expressed as:

$F_{rgb}^{i+1} \leftarrow F_{rgb}^{i+1} \oplus \big(F_{rgb}^{i+1} \otimes A_{f}^{i}\big)$.

The output features of the final stage, $F_{rgb} \in \mathbb{R}^{h\times w\times c}$ and $F_{f} \in \mathbb{R}^{h\times w\times c}$, are then input into the dynamic fusion module to mine more discriminative information, where $h$, $w$ and $c$ are the dimensions of the output features.
In an embodiment of the present invention, the stages of the multi-stage feature transformation extractor share network weights: the output of the previous stage's feature extractor serves as the input of the current stage, and the image features loop through the multi-stage extractor several times. The feature transformation extractor of each stage passes the input RGB and frequency-domain image features through a window self-attention calculation unit and a corresponding forward propagation calculation unit, realizing information interaction between image features at different positions; a sketch of one such stage follows.
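A PyTorch sketch of one shared-weight stage appears below; the window size, head count and pre-norm layout are illustrative assumptions, and H and W are assumed divisible by the window size.

import torch.nn as nn

class WindowAttentionStage(nn.Module):
    def __init__(self, c, win=7, heads=4):
        super().__init__()
        self.win = win
        self.norm1, self.norm2 = nn.LayerNorm(c), nn.LayerNorm(c)
        self.attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(c, 4 * c), nn.GELU(), nn.Linear(4 * c, c))

    def forward(self, x):                            # (B, H, W, C)
        B, H, W, C = x.shape
        w = self.win
        # Partition into (B * num_windows, win*win, C) token groups.
        t = (x.view(B, H // w, w, W // w, w, C)
              .permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C))
        h = self.norm1(t)
        t = t + self.attn(h, h, h, need_weights=False)[0]   # window self-attention
        t = t + self.ffn(self.norm2(t))                     # forward propagation unit
        # Reverse the window partition back to (B, H, W, C).
        return (t.view(B, H // w, W // w, w, w, C)
                 .permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C))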
In an embodiment of the present invention, the feature extraction and fusion unit further includes a convolutional decoder, which feeds the extracted deep image feature representation into a series of stacked basic convolutional layers and upsampling layers and performs a pixel-level addition of the skip-connected shallow image features and the deep image features, obtaining frequency-enhanced RGB image features and thus an enhanced RGB image.
In some preferred embodiments, the multi-stage feature transformation extractor further comprises a cross-attention calculation module; the feature transformation extractor of each stage combines the feature extraction of the self-attention calculation unit and the forward propagation calculation unit, and employs a cross-layer connection to add the input and output of the feature extractor at the pixel level. The cross-attention calculation module is shown in fig. 5. Here the frequency modality serves as an auxiliary component: given the RGB features $F_{rgb} \in \mathbb{R}^{h\times w\times c}$ and the frequency features $F_{f} \in \mathbb{R}^{h\times w\times c}$, a query-key-value mechanism fuses them initially into a unified representation $F_{ca}$, which can be expressed as:

$F_{ca} = \mathrm{softmax}\big(Q K^{\top} / \sqrt{c}\big)\, V$, with $Q = F_{rgb} W_{Q}$, $K = F_{f} W_{K}$, $V = F_{f} W_{V}$,

where $Q$, $K$ and $V$ denote three different matrix transforms used for the projective transformation of the input features, and $h$, $w$ and $c$ denote the dimensions of the output features.
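A compact PyTorch sketch of this query-key-value fusion, with queries from the RGB features and keys/values from the auxiliary frequency features; single-head attention is assumed.

import torch
import torch.nn as nn

class CrossDomainAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.q = nn.Linear(c, c)                     # projective transforms Q, K, V
        self.k = nn.Linear(c, c)
        self.v = nn.Linear(c, c)

    def forward(self, f_rgb, f_freq):                # (B, HW, C) token sequences
        q, k, v = self.q(f_rgb), self.k(f_freq), self.v(f_freq)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5), dim=-1)
        return attn @ v                              # unified representation F_ca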
The embodiment of the invention adds a dynamic fusion module with an uncertainty factor: a Gumbel-Softmax function with random perturbation is applied to disturb the relative quality estimates of the discriminated image, and a prediction result is dynamically selected as the model output, thereby fine-tuning the network and increasing its ability to fit unknown data.
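The perturbation can be sketched with PyTorch's built-in Gumbel-Softmax, applied to the per-branch quality logits of the dynamic fusion module; the temperature and the soft/hard choice are assumptions.

import torch
import torch.nn.functional as F

def perturbed_quality_weights(quality_logits, tau=1.0, hard=False):
    # gumbel_softmax adds Gumbel(0, 1) noise to the logits before the
    # softmax, randomly perturbing the relative quality of the branches;
    # hard=True would commit to a single branch's prediction.
    return F.gumbel_softmax(quality_logits, tau=tau, hard=hard, dim=-1)

# Usage sketch: 8 images, 3 branches (hypothetical shapes).
weights = perturbed_quality_weights(torch.randn(8, 3))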
The invention also provides an electronic device comprising: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
The present invention also provides a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the method described above.
According to the face forgery detection method and system with uncertainty-guided test-time training, the RGB features and frequency-domain attention features are fused, and cross-attention calculation on the fused RGB and frequency-domain features yields fusion features, so that frequency-domain and RGB information of differing quality can be fused; a fusion mode is adaptively selected from the fusion features according to the input image and task requirements to obtain discriminative features, on which the classification task is performed. The method can further mine forgery traces in the frequency domain, optimize the uncertainty in the network with the uncertainty-guided test-time training strategy, and further improve generalization performance.
It can be understood that the uncertainty-guided test-time training face forgery detection system provided in the foregoing embodiment is illustrated only with the above division of functional modules. In practical applications, the functions may be allocated to different functional modules as needed; that is, the modules or steps of the embodiments may be further decomposed or combined, e.g., the modules may be merged into one module or split into several sub-modules to complete all or part of the functions described above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (10)

1. A method for face forgery detection with uncertainty-guided test-time training, characterized by comprising the following steps:
step S100, an image to be distinguished is obtained as an initial input image;
step S200, acquiring a high-frequency information image of the initial input image, wherein the high-frequency information image is image information positioned in a high-frequency band in the initial input image;
step S300, extracting RGB features and frequency domain attention features of different scales in the high-frequency information image, and fusing the RGB features and the frequency domain attention features to obtain fused RGB features and frequency domain features;
step S400, performing cross-attention calculation on the fused RGB features and the frequency-domain features to obtain fusion features, where cross-attention refers to an attention mechanism in which a position in one feature sequence attends to all positions in another feature sequence;
step S500, based on the fusion features, adaptively selecting a fusion mode according to different input images and task requirements to obtain discriminative features, and performing the classification task based on the discriminative features.
2. The method according to claim 1, wherein in step S200, the initial input image is converted from a space domain to a frequency domain by discrete cosine transform, and the high-frequency information image is screened out.
3. The method according to claim 1, wherein in the step S300, RGB features and frequency-domain attention features are extracted based on a self-attention mechanism and global information of an input sequence in the high-frequency information image.
4. The method according to claim 1, wherein in step S400, a dual attention mechanism based on channel attention and spatial attention is used to interact information between the fused RGB features and the frequency domain features.
5. The method according to claim 1, wherein in step S500, the fusion method includes a dynamic weighted average fusion method with uncertainty factors, and the weights of the feature maps in the fusion features are adaptively adjusted by the dynamic weighted average fusion method.
6. A face forgery detection system with uncertainty-guided test-time training, comprising:
an image acquisition unit for acquiring an image to be discriminated as an initial input image;
a high-frequency conversion unit, configured to obtain a high-frequency information image of the initial input image, where the high-frequency information image is image information located in a high-frequency band in the initial input image;
the feature extraction and fusion unit is used for extracting RGB features and frequency domain attention features of different scales in the high-frequency information image, and fusing the RGB features and the frequency domain attention features to obtain fused RGB features and frequency domain features;
the feature calculation unit is used for performing cross-attention calculation on the fused RGB features and the frequency-domain features to obtain fusion features, where cross-attention refers to an attention mechanism in which a position in one feature sequence attends to all positions in another feature sequence;
and the feature discrimination unit is used for adaptively selecting a fusion mode based on the fusion features according to the input image and task requirements to obtain discriminative features, and performing the classification task based on the discriminative features.
7. The system according to claim 6, wherein the high-frequency conversion unit includes a plurality of Transformer modules divided into three groups, the three groups being connected in series in order of decreasing spatial resolution.
8. The system according to claim 6, wherein the feature extraction and fusion unit further comprises an image feature enhancement processing module configured to extract a frequency-domain attention map from the frequency-domain initial input image through serially connected convolutional encoders, combine the attention map with the RGB initial input image, and send the result to a multi-stage feature transformation extractor, obtaining frequency-domain image features and frequency-enhanced RGB image features after processing by the feature transformation extractors.
9. An electronic device, comprising: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
10. A computer readable medium having a computer program stored thereon, characterized by: the program, when executed by a processor, implements the method of any one of claims 1-5.
CN202311224982.8A 2023-09-21 2023-09-21 Method and system for face forgery detection with uncertainty-guided test-time training Active CN117275068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311224982.8A CN117275068B (en) 2023-09-21 2023-09-21 Method and system for face forgery detection with uncertainty-guided test-time training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311224982.8A CN117275068B (en) 2023-09-21 2023-09-21 Method and system for face forgery detection with uncertainty-guided test-time training

Publications (2)

Publication Number Publication Date
CN117275068A true CN117275068A (en) 2023-12-22
CN117275068B CN117275068B (en) 2024-05-17

Family

ID=89202101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311224982.8A Active CN117275068B (en) 2023-09-21 2023-09-21 Method and system for face forgery detection with uncertainty-guided test-time training

Country Status (1)

Country Link
CN (1) CN117275068B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200402223A1 (en) * 2019-06-24 2020-12-24 Insurance Services Office, Inc. Machine Learning Systems and Methods for Improved Localization of Image Forgery
CN113536990A (en) * 2021-06-29 2021-10-22 复旦大学 Deep fake face data identification method
CN114495245A (en) * 2022-04-08 2022-05-13 北京中科闻歌科技股份有限公司 Face counterfeit image identification method, device, equipment and medium
CN114898432A (en) * 2022-05-17 2022-08-12 中南大学 Fake face video detection method and system based on multi-feature fusion
CN115147895A (en) * 2022-06-16 2022-10-04 北京百度网讯科技有限公司 Face counterfeit discrimination method and device and computer program product
CN115393760A (en) * 2022-08-16 2022-11-25 公安部物证鉴定中心 Method, system and equipment for detecting Deepfake composite video
CN115601820A (en) * 2022-12-01 2023-01-13 思腾合力(天津)科技有限公司(Cn) Face fake image detection method, device, terminal and storage medium
US20230081645A1 (en) * 2021-01-28 2023-03-16 Tencent Technology (Shenzhen) Company Limited Detecting forged facial images using frequency domain information and local correlation
CN115880749A (en) * 2022-11-08 2023-03-31 杭州中科睿鉴科技有限公司 Face deep false detection method based on multi-mode feature fusion
CN115909445A (en) * 2022-11-11 2023-04-04 中国人民解放军国防科技大学 Face image counterfeiting detection method and related equipment
CN116434351A (en) * 2023-04-23 2023-07-14 厦门大学 Fake face detection method, medium and equipment based on frequency attention feature fusion

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200402223A1 (en) * 2019-06-24 2020-12-24 Insurance Services Office, Inc. Machine Learning Systems and Methods for Improved Localization of Image Forgery
US20230081645A1 (en) * 2021-01-28 2023-03-16 Tencent Technology (Shenzhen) Company Limited Detecting forged facial images using frequency domain information and local correlation
CN113536990A (en) * 2021-06-29 2021-10-22 复旦大学 Deep fake face data identification method
CN114495245A (en) * 2022-04-08 2022-05-13 北京中科闻歌科技股份有限公司 Face counterfeit image identification method, device, equipment and medium
CN114898432A (en) * 2022-05-17 2022-08-12 中南大学 Fake face video detection method and system based on multi-feature fusion
CN115147895A (en) * 2022-06-16 2022-10-04 北京百度网讯科技有限公司 Face counterfeit discrimination method and device and computer program product
CN115393760A (en) * 2022-08-16 2022-11-25 公安部物证鉴定中心 Method, system and equipment for detecting Deepfake composite video
CN115880749A (en) * 2022-11-08 2023-03-31 杭州中科睿鉴科技有限公司 Face deep false detection method based on multi-mode feature fusion
CN115909445A (en) * 2022-11-11 2023-04-04 中国人民解放军国防科技大学 Face image counterfeiting detection method and related equipment
CN115601820A (en) * 2022-12-01 2023-01-13 思腾合力(天津)科技有限公司(Cn) Face fake image detection method, device, terminal and storage medium
CN116434351A (en) * 2023-04-23 2023-07-14 厦门大学 Fake face detection method, medium and equipment based on frequency attention feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cao Shenhao et al., "A Survey of Face Forgery and Detection Techniques", Journal of Image and Graphics, vol. 27, no. 4, 30 April 2022 (2022-04-30), pages 1023-1038 *

Also Published As

Publication number Publication date
CN117275068B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
Rao et al. Deep learning local descriptor for image splicing detection and localization
Guo et al. Fake face detection via adaptive manipulation traces extraction network
Rao et al. Multi-semantic CRF-based attention model for image forgery detection and localization
Yu et al. A multi-purpose image counter-anti-forensic method using convolutional neural networks
De La Croix et al. Toward secret data location via fuzzy logic and convolutional neural network
Wei et al. Controlling neural learning network with multiple scales for image splicing forgery detection
Yang et al. Design of cyber-physical-social systems with forensic-awareness based on deep learning
Wei et al. Universal deep network for steganalysis of color image based on channel representation
Liu et al. Image deblocking detection based on a convolutional neural network
Liang et al. Image resampling detection based on convolutional neural network
Chen et al. Image splicing localization using residual image and residual-based fully convolutional network
CN116958637A (en) Training method, device, equipment and storage medium of image detection model
CN114677372A (en) Depth forged image detection method and system integrating noise perception
Kumar et al. A hybrid method for the removal of RVIN using self organizing migration with adaptive dual threshold median filter
Tripathi et al. Image splicing detection system using intensity-level multi-fractal dimension feature engineering and twin support vector machine based classifier
Meena et al. Image splicing forgery detection techniques: A review
Sonam et al. Secure digital image watermarking using memristor-based hyperchaotic circuit
Zarrabi et al. BlessMark: a blind diagnostically-lossless watermarking framework for medical applications based on deep neural networks
CN117275068B (en) Method and system for training human face fake detection in test stage containing uncertainty guidance
Singh et al. StegGAN: hiding image within image using conditional generative adversarial networks
Li Saliency prediction based on multi-channel models of visual processing
CN116522326A (en) Data enhancement model and method suitable for power grid information attack detection
Mazumdar et al. Siamese convolutional neural network‐based approach towards universal image forensics
CN115358952A (en) Image enhancement method, system, equipment and storage medium based on meta-learning
Xu et al. Steganography algorithms recognition based on match image and deep features verification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant