CN116012958A - Method, system, device, processor and computer readable storage medium for implementing deep fake face identification - Google Patents
- Publication number: CN116012958A
- Application number: CN202310093773.8A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Abstract
The invention relates to a method for realizing deep fake face identification based on an rPPG multi-scale space-time diagram and a two-stage model, wherein the method comprises the following steps: (1) collecting a deep fake face video data set and preprocessing the videos; (2) generating an rPPG multi-scale space-time diagram; (3) constructing a mask-guided local attention module, performing first-stage training, and extracting the features of a single rPPG space-time diagram; (4) constructing a Transformer-based temporal aggregation module, performing second-stage training, and fusing the comprehensive features of a plurality of adjacent space-time diagrams; (5) constructing a classification head to perform classification and recognition processing and constructing a loss function. The invention also relates to a corresponding system, device, processor and storage medium. By adopting the method, the system, the device, the processor and the storage medium, the comprehensive features of the plurality of space-time diagrams representing one video are extracted through the two-stage model, which, compared with a baseline model, offers better interpretability and a better fake face identification effect.
Description
Technical Field
The invention relates to the technical field of digital images, in particular to the technical field of computer vision, and specifically relates to a method, a system, a device, a processor and a computer readable storage medium for realizing deep fake face identification based on rPPG multi-scale space-time diagrams and a two-stage model.
Background
With the development of generative deep models, the technical threshold of deep face forgery has become lower, and people can easily create vivid forged face content through publicly available models or tools. Deep forgery may also be misused by malicious users to create false political information or propagate pornography. As a defense mechanism, face forgery identification techniques have been developed and used to mitigate the risks associated with deep forgery. Remote photoplethysmography (rPPG) extracts the heartbeat signal from recorded video by examining subtle changes in skin color caused by cardiac activity. Because the face forgery process inevitably destroys the periodic variation of facial color, rPPG has proven to be a biological indicator that can effectively identify forged faces.
However, most existing rPPG-signal-based deep face forgery identification methods still have some drawbacks. For example, the invention patent application with application number CN202210572034.2 takes 32 small square frames on each face frame to extract heart rate signals, but these ROI areas overlap with each other and have only a single scale; it uses only a one-stage encoder to extract the features of a single rPPG space-time diagram, without considering the feature fusion of a plurality of adjacent rPPG space-time diagrams; and it uses only a two-class cross entropy loss, without considering pixel-level attention weights at local positions, so that the detection performance is limited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method, a system, a device, a processor and a computer readable storage medium for realizing deep fake face identification based on an rPPG multi-scale space-time diagram and a two-stage model, which can effectively exploit the comprehensive features of a plurality of adjacent video clips.
To achieve the above object, the method, system, device, processor and computer readable storage medium for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model of the present invention are as follows:
The method for realizing the identification of deeply forged human faces based on the rPPG multi-scale space-time diagram and the two-stage model is mainly characterized by comprising the following steps:
(1) Collecting a deep fake face video data set, and preprocessing the video data to obtain a set of cropped face video frames;
(2) Generating an rPPG multi-scale space-time diagram according to the face video frame obtained after cutting;
(3) Constructing a mask-guided local attention module, performing first-stage training, and extracting the features of a single rPPG space-time diagram;
(4) Constructing a Transformer module, performing second-stage training, and fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams;
(5) Constructing a classification head, pooling the fused high-dimensional features, and performing classification and recognition on them to obtain the identification result of the target image and construct an overall loss function.
Preferably, the step (2) specifically includes the following steps:
(2.1) dividing a complete video into a plurality of T-frame video segments with a step of ω frames;
(2.2) for each frame, carrying out face alignment and extracting face key points;
(2.3) selecting n heartbeat-signal information areas according to the face key points to form an ROI set R_t = {R_1t, R_2t, …, R_nt};
(2.4) for the ROI set R_t, computing, for each of its non-empty subsets, the mean of all pixels contained in the subset, yielding 2^n − 1 pixel means over the three RGB channels;
(2.5) for each video clip, applying the operations of steps (2.2)-(2.4) to the T frames contained therein, resulting in a multi-scale space-time diagram of dimension T × (2^n − 1) × 3, wherein T is the time length, 2^n − 1 is the number of combinations of different information areas, and 3 is the number of RGB channels.
Preferably, the n = 6 information areas in step (2.3) are the forehead, the chin, the upper left and right cheeks, and the lower left and right cheeks, respectively; the specific areas are shown in Fig. 2.
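The multi-scale construction of steps (2.1)-(2.5) can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the function name `multiscale_st_map` and the representation of each ROI as a boolean pixel mask are assumptions made for the example (the patent derives the areas from face key points).

```python
import itertools
import numpy as np

def multiscale_st_map(frames, roi_masks):
    """Build a T x (2^n - 1) x 3 multi-scale space-time map.

    frames:    sequence of T HxWx3 float arrays (RGB video frames)
    roi_masks: sequence of n HxW boolean masks, one per information area
    (Illustrative sketch; names and the mask representation are assumptions.)
    """
    n = len(roi_masks)
    rows = []
    for frame in frames:
        means = []
        # enumerate every non-empty subset of the n ROIs -> 2^n - 1 scales
        for k in range(1, n + 1):
            for subset in itertools.combinations(range(n), k):
                union = np.zeros(roi_masks[0].shape, dtype=bool)
                for i in subset:
                    union |= roi_masks[i]
                means.append(frame[union].mean(axis=0))  # RGB mean over the area union
        rows.append(means)
    return np.asarray(rows)  # shape (T, 2**n - 1, 3)
```

With n = 6 areas and 64-frame clips this yields the 64 × 63 × 3 diagram of the preferred embodiment.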
Preferably, the step (3) specifically includes the following steps:
(3.1) constructing EfficientNet as the backbone network f(·); for an input rPPG space-time diagram X, extracting features through the backbone network and obtaining the middle-layer feature map F_m = f_mid(X) ∈ R^(C×H×W), wherein C, H and W respectively denote the number of channels, rows and columns of the feature map;
(3.2) building a mask-guided local attention module that takes the middle-layer feature map F_m as input and obtains an attention mask A_mask:
A_mask = Sigmoid(Conv(F_m))
Wherein Conv (·) represents a convolution operation;
(3.3) performing point-wise multiplication between the attention mask and the middle-layer feature map F_m to obtain the position-weighted feature map F′ = A_mask · F_m, and taking F′ as input for feature extraction by the subsequent network layers;
(3.4) calculating the pixel-level mask label A_gt of the rPPG space-time diagram: for an rPPG space-time diagram generated from a fake video, finding its corresponding real rPPG space-time diagram, taking the pixel-by-pixel difference to obtain a residual space-time diagram, graying the residual space-time diagram, normalizing it to the range 0 to 1, resizing it to the same size as the attention mask A_mask, and binarizing it with 0.1 as the threshold to obtain the corresponding pixel-level mask label A_gt;
(3.5) calculating the L1 distance between the attention mask A_mask and the corresponding pixel-level mask label A_gt as the mask loss function L_mask, according to the following formula:
L_mask = ||A_mask − A_gt||_1
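A minimal sketch of the mask-guided local attention of steps (3.2)-(3.5), assuming a 1×1 convolution as the unspecified Conv(·) operation and single-sample NumPy arrays in place of batched tensors; all function and parameter names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_guided_attention(f_m, conv_w, conv_b):
    """A_mask = Sigmoid(Conv(F_m)); F' = A_mask * F_m (point-wise).

    f_m:    C x H x W middle-layer feature map
    conv_w: 1 x C weights of an assumed 1x1 convolution
    conv_b: scalar bias
    """
    # 1x1 convolution collapsing the C channels into one attention map
    a_mask = sigmoid(np.tensordot(conv_w, f_m, axes=([1], [0]))[0] + conv_b)
    f_prime = a_mask[None, :, :] * f_m  # position-weighted feature map F'
    return a_mask, f_prime

def mask_loss(a_mask, a_gt):
    """L_mask = ||A_mask - A_gt||_1 (mean-absolute-error variant here)."""
    return np.abs(a_mask - a_gt).mean()
```

The mean rather than the sum of absolute differences is one reasonable reading of the L1 distance; the patent does not specify the reduction.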
More preferably, the step (4) specifically includes the following steps:
(4.1) respectively inputting K adjacent rPPG space-time diagrams into the backbone network trained in the first stage to obtain K global high-dimensional features F_h, performing global average pooling on them, and superimposing a classification encoding and a one-dimensional learnable position encoding to form the input sequence Z_in of the Transformer;
(4.2) constructing a Transformer-based feature fusion module for the plurality of rPPG space-time diagrams: applying the multi-head self-attention operation MSA to the input sequence Z_in, then passing it through a feed-forward network FFN, and after each operation further adjusting the output with layer normalization LN and a residual connection to obtain the Transformer output Z_out.
More preferably, the step (4.2) specifically includes the following steps:
(4.2.1) the input sequence Z_in is passed through a linear mapping layer to generate a Query matrix Q, a Key matrix K and a Value matrix V; the three matrices are then fed into the multi-head self-attention mechanism MSA, as shown in the following formula:
MSA(Z_in) = Softmax(QK^T / √d)V
wherein d is a normalization constant and T denotes the matrix transposition operation;
(4.2.2) the fused feature output Z_out after the Transformer processing is then obtained through the feed-forward network layer FFN composed of a multi-layer perceptron.
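The fusion of steps (4.1)-(4.2) can be illustrated with a single-head, single-block sketch. The patent uses multi-head self-attention; splitting the feature dimension across several heads is omitted here for brevity, and all weight names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(
        x.var(axis=-1, keepdims=True) + eps)

def transformer_block(z_in, wq, wk, wv, w1, w2):
    """Single-head block: self-attention then FFN, each with residual + LN.

    z_in: (K+1) x D sequence -- K pooled space-time-map features plus a
    classification token.
    """
    q, k, v = z_in @ wq, z_in @ wk, z_in @ wv          # linear mappings
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d)) @ v           # Softmax(QK^T/sqrt(d))V
    z = layer_norm(z_in + attn)                        # residual + LN
    ffn = np.maximum(z @ w1, 0.0) @ w2                 # two-layer MLP (ReLU)
    return layer_norm(z + ffn)                         # output Z_out
```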
More preferably, the step (5) specifically includes the following steps:
(5.1) performing global average pooling g(·) on the fused comprehensive feature Z_out output by the second-stage training, and then using a fully connected network FC to map the dimension to the category number 2 to obtain the vector Z, as shown in the following formula:
Z = FC(g(Z_out))
(5.2) calculating Softmax over Z to obtain the final prediction score y′, and calculating the two-class cross entropy loss L_ce against the label y, as shown in the following formula:
L_ce = −[y log y′ + (1 − y) log(1 − y′)]
(5.3) constructing the overall loss function L_all, as shown in the following formula:
L_all = L_ce + λL_mask
where λ is the hyper-parameter used to balance cross entropy loss and mask loss.
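Step (5) can be sketched with NumPy stand-ins for the pooling, fully connected layer, Softmax and losses; `lam` plays the role of the hyper-parameter λ, and its default value is illustrative:

```python
import numpy as np

def classification_head(z_out, w_fc, b_fc):
    """Global-average-pool Z_out, map to 2 logits via FC, Softmax to scores."""
    z = z_out.mean(axis=0) @ w_fc + b_fc  # Z = FC(g(Z_out))
    e = np.exp(z - z.max())
    return e / e.sum()                    # prediction scores over the 2 classes

def total_loss(y, y_pred, l_mask, lam=1.0):
    """L_all = L_ce + lambda * L_mask, with binary cross-entropy L_ce."""
    eps = 1e-12  # numerical guard against log(0)
    l_ce = -(y * np.log(y_pred + eps) + (1 - y) * np.log(1 - y_pred + eps))
    return l_ce + lam * l_mask
```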
The system for realizing the identification of deeply forged human faces based on the rPPG multi-scale space-time diagram and the two-stage model by using the method described above is mainly characterized by comprising:
the rPPG multi-scale space-time diagram generation module is used for calculating an rPPG space-time diagram from the face video frame;
the mask-guided local attention module is connected with the rPPG multi-scale space-time diagram generation module and is used for enhancing the learning of local information and extracting the characteristics of a single rPPG space-time diagram;
the transducer module is connected with the local attention module guided by the mask and used for fusing the comprehensive characteristics of a plurality of adjacent rPPG time-space diagrams; and
the classification head module is connected with the transducer module and is used for pooling the integrated features after fusion and carrying out classification recognition processing so as to obtain the identification result of the target image and construct an overall loss function.
The device for realizing the deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model is mainly characterized by comprising the following components:
a processor configured to execute computer-executable instructions;
and a memory storing one or more computer-executable instructions which, when executed by the processor, implement the steps of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model described above.
The processor for realizing the deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model is mainly characterized in that the processor is configured to execute computer executable instructions, and when the computer executable instructions are executed by the processor, the steps of the method for realizing the deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model are realized.
The computer readable storage medium is mainly characterized in that a computer program is stored thereon, and the computer program can be executed by a processor to realize the steps of the method for realizing the identification of deeply forged human faces based on the rPPG multi-scale space-time diagram and the two-stage model described above.
By adopting the method, the system, the device, the processor and the computer readable storage medium for realizing the identification of deeply forged human faces based on the rPPG multi-scale space-time diagram and the two-stage model described above, the multi-scale space-time diagram of the heart-rate signal rPPG is innovatively taken as the model input, and a classical CNN model (such as EfficientNet) and a Transformer are used as the two-stage model. In order to enhance the model's perception of local position information, the invention also innovatively introduces a mask-guided local attention module, guiding the model to further distinguish the different patterns of real and fake space-time diagrams through the indication of the pixel-level space-time-diagram mask label. The Transformer module fuses the features of a plurality of neighboring rPPG space-time diagrams through a self-attention mechanism. The technical scheme has been experimentally verified on the FaceForensics++ data set and, compared with a baseline model, achieves a more prominent classification and identification effect.
Drawings
Fig. 1 is a schematic flow chart of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to the present invention.
Fig. 2 is a schematic flow chart of generating the multi-scale rPPG space-time diagram in the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to the present invention.
Fig. 3 is a schematic diagram of the frame structure of the system for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to the present invention.
Fig. 4 is a schematic diagram of the Transformer module according to the present invention.
Detailed Description
In order to more clearly describe the technical contents of the present invention, a further description will be made below in connection with specific embodiments.
Before describing in detail embodiments that are in accordance with the present invention, it should be observed that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to Fig. 1, the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model includes the following steps:
the method for realizing the identification of the deeply forged human face based on the rPPG multi-scale space-time diagram and the two-stage model is mainly characterized by comprising the following steps of:
(1) Collecting a deep fake face video data set, and preprocessing the video data to obtain a set of cropped face video frames;
(2) Generating an rPPG multi-scale space-time diagram according to the face video frame obtained after cutting;
(3) Constructing a mask-guided local attention module, performing first-stage training, and extracting the features of a single rPPG space-time diagram;
(4) Constructing a Transformer module, performing second-stage training, and fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams;
(5) Constructing a classification head, pooling the fused high-dimensional features, and performing classification and recognition on them to obtain the identification result of the target image and construct an overall loss function.
In practical application, the step (1) specifically includes:
downloading the FaceForensics++ data set from the official data set website to obtain the original videos, extracting images from the original videos, and obtaining cropped face images by using a face extractor;
in practical applications, as a preferred embodiment of the present invention, the step (2) specifically includes the following steps:
(2.1) dividing a complete video into a plurality of 64-frame video segments with a step of 16 frames;
(2.2) for each frame, carrying out face alignment and extracting face key points;
(2.3) selecting 6 heartbeat-signal information areas according to the face key points to form the ROI set R_t = {R_1t, R_2t, …, R_6t};
(2.4) for the ROI set R_t, computing, for each of its non-empty subsets, the mean of all pixels contained in the subset, yielding 2^6 − 1 = 63 pixel means over the three RGB channels;
(2.5) for each video clip, applying the operations of steps (2.2)-(2.4) to the 64 frames contained therein, obtaining a multi-scale space-time diagram of dimension 64 × 63 × 3, wherein 64 is the time length, 63 is the number of combinations of different information areas, and 3 is the number of RGB channels.
As a preferred embodiment of the present invention, the step (3) includes the steps of:
(3.1) constructing EfficientNet as the backbone network f(·); for an input rPPG space-time diagram X ∈ R^(3×64×63), extracting features through the backbone network and obtaining the middle-layer feature map F_m = f_mid(X) ∈ R^(C×H×W), wherein C, H and W respectively denote the number of channels, rows and columns of the feature map;
(3.2) building a mask-guided local attention module that takes the middle-layer feature map F_m as input and obtains an attention mask A_mask:
A_mask = Sigmoid(Conv(F_m))
Wherein Conv (·) represents a convolution operation;
(3.3) performing point-wise multiplication between the attention mask and the middle-layer feature map F_m to obtain the position-weighted feature map F′ = A_mask · F_m, and taking F′ as input for feature extraction by the subsequent network layers;
(3.4) calculating the pixel-level mask label A_gt of the rPPG space-time diagram: for an rPPG space-time diagram generated from a fake video, finding its corresponding real rPPG space-time diagram, taking the pixel-by-pixel difference to obtain a residual space-time diagram, graying the residual space-time diagram, normalizing it to the range 0 to 1, resizing it to the same size as the attention mask A_mask, and binarizing it with 0.1 as the threshold to obtain the corresponding pixel-level mask label A_gt;
(3.5) calculating the L1 distance between the attention mask A_mask and the corresponding pixel-level mask label A_gt as the mask loss function L_mask, according to the following formula:
L_mask = ||A_mask − A_gt||_1
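The label construction of step (3.4) might be sketched as below; nearest-neighbour indexing stands in for the unspecified resize operation, and the function and parameter names are assumptions:

```python
import numpy as np

def mask_label(fake_map, real_map, out_hw, thresh=0.1):
    """Pixel-level mask label A_gt from a fake/real space-time-map pair.

    fake_map, real_map: H x W x 3 rPPG space-time maps
    out_hw:             (h, w) target size matching the attention mask
    """
    residual = np.abs(fake_map.astype(float) - real_map.astype(float))
    gray = residual.mean(axis=-1)                      # graying
    rng = gray.max() - gray.min()
    gray = (gray - gray.min()) / rng if rng > 0 else gray * 0.0  # normalize to [0, 1]
    h, w = out_hw
    ri = np.arange(h) * gray.shape[0] // h             # nearest-neighbour resize
    ci = np.arange(w) * gray.shape[1] // w
    return (gray[np.ix_(ri, ci)] > thresh).astype(np.float32)  # binarize at 0.1
```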
As a preferred embodiment of the present invention, the step (4) specifically includes the following steps:
(4.1) constructing the input sequence Z_in of the Transformer:
K temporally adjacent rPPG space-time diagrams are respectively input into the backbone network trained in the first stage to obtain K global high-dimensional features F_h; global average pooling is then performed, and a classification encoding and a one-dimensional learnable position encoding are superimposed to form the input sequence Z_in of the Transformer;
(4.2): constructing a two-stage model transducer to obtain the comprehensive characteristics of fusion of a plurality of adjacent rPPG time-space diagrams:
input sequence Z in Generating a query matrix through a linear mapping layerKey matrix And a Value matrix +.>Then, three matrices are transferred into the multi-head self-attention mechanism MSA, as shown in the following formula:
wherein T is matrix transposition operation, and d is normalization constant. Then the characteristic fusion output Z after the conversion process is obtained through the FFN processing of a feedforward network layer formed by a multi-layer perceptron out 。
As a preferred embodiment of the present invention, the step (5) specifically includes:
global average pooling is carried out on the fused features, and then a fully connected network FC is used to map the dimension number to the category number 2 so as to obtainCalculating a final prediction score y' according to Z, and calculating a two-category cross entropy loss L according to the label y ce Finally, constructing an overall loss function L all As shown in the following formula:
Z = FC(g(Z_out))
L_ce = −[y log y′ + (1 − y) log(1 − y′)]
L_all = L_ce + λL_mask
where λ is the hyper-parameter used to balance cross entropy loss and mask loss.
Referring to Fig. 3, the system for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model by using the method described above includes:
the rPPG multi-scale space-time diagram generation module is used for calculating an rPPG space-time diagram from the face video frame;
the mask-guided local attention module is connected with the rPPG multi-scale space-time diagram generation module and is used for enhancing the learning of local information and extracting the characteristics of a single rPPG space-time diagram;
the transducer module is connected with the local attention module guided by the mask and used for fusing the comprehensive characteristics of a plurality of adjacent rPPG time-space diagrams; and
the classification head module is connected with the transducer module and is used for pooling the integrated features after fusion and carrying out classification recognition processing so as to obtain the identification result of the target image and construct an overall loss function.
In a specific embodiment of the present invention, the classification and identification method using the present technical solution is tested as follows:
(1) Experimental data set
The invention uses the deep face forgery data set FaceForensics++ (FF++) for experimental verification. The FF++ data set includes 1000 original videos, of which 720 are used for training and 280 for testing and validation. Each video was forged by four different facial manipulation methods, namely Deepfakes (DF), Face2Face (F2F), FaceSwap (FS) and NeuralTextures (NT). Two of these methods replace the full face (DF and FS), while the other two only manipulate localized areas around the mouth or eyes (F2F and NT).
(2) Training process
The initial learning rate is set to 1e-2, an SGD optimizer is used, the batch size is set to 32, and training runs for 30 epochs.
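The stated schedule (SGD, initial learning rate 1e-2, batch size 32, 30 epochs) can be illustrated on a stand-in logistic-regression classifier; the actual model is the two-stage EfficientNet + Transformer network, so everything below except the hyper-parameter values is a placeholder:

```python
import numpy as np

def sgd_train(x, y, lr=1e-2, batch=32, epochs=30, seed=0):
    """Mini-batch SGD with the schedule stated above, on a logistic model."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(x))                  # reshuffle each epoch
        for start in range(0, len(x), batch):
            b = idx[start:start + batch]
            p = 1.0 / (1.0 + np.exp(-(x[b] @ w)))      # predicted scores
            w -= lr * x[b].T @ (p - y[b]) / len(b)     # BCE gradient step
    return w
```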
(3) Test results
In this embodiment, training and testing are performed on the four sub-data sets of FF++ respectively to evaluate the real/fake binary classification capability of the method; multi-class training and testing over the five classes are then performed on the data set, and Accuracy (Acc.) is selected as the algorithm evaluation index. The experimental results are shown in Table 1.
Table 1. Performance of the model on the FF++ data set (%)
As can be seen from Table 1, the present embodiment performs excellently on the FF++ data set, in both the real/fake binary classification and the five-class multi-classification settings, demonstrating the effectiveness of the algorithm.
The device for realizing the identification of the deeply forged human face based on the rPPG multi-scale space-time diagram and the two-stage model comprises:
a processor configured to execute computer-executable instructions;
and a memory storing one or more computer-executable instructions which, when executed by the processor, implement the steps of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model described above.
The processor is configured to execute computer executable instructions, which when executed by the processor, implement the steps of the method for implementing the deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
The computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model described above.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
In this specification, the invention has been described with reference to specific embodiments thereof. It will be apparent, however, that various modifications and changes may be made without departing from the spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims (10)
1. The method for realizing the identification of deeply forged human faces based on the rPPG multi-scale space-time diagram and the two-stage model is characterized by comprising the following steps:
(1) Collecting a deep fake face video data set, and preprocessing the video data to obtain a set of cropped face video frames;
(2) Generating an rPPG multi-scale space-time diagram according to the face video frame obtained after cutting;
(3) Constructing a mask-guided local attention module, performing first-stage training, and extracting the features of a single rPPG space-time diagram;
(4) Constructing a Transformer module, performing second-stage training, and fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams;
(5) Constructing a classification head, pooling the fused high-dimensional features, and performing classification and recognition on them to obtain the identification result of the target image and construct an overall loss function.
2. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to claim 1, wherein the step (2) includes the following steps:
(2.1) dividing a complete video into a plurality of T-frame video segments with a step of ω frames;
(2.2) for each frame, carrying out face alignment and extracting face key points;
(2.3) selecting n heartbeat-signal information areas according to the face key points to form an ROI set R_t = {R_1t, R_2t, …, R_nt};
(2.4) for the ROI set R_t, computing, for each of its non-empty subsets, the mean of all pixels contained in the subset, yielding 2^n − 1 pixel means over the three RGB channels;
(2.5) for each video clip, applying the operations of steps (2.2)-(2.4) to the T frames contained therein, resulting in a multi-scale space-time diagram of dimension T × (2^n − 1) × 3, wherein T is the time length, 2^n − 1 is the number of combinations of different information areas, and 3 is the number of RGB channels.
3. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to claim 1, wherein the step (3) specifically comprises the following steps:
(3.1) constructing EfficientNet as the backbone network f(·); for an input rPPG space-time diagram X, extracting features through the backbone network and obtaining the middle-layer feature map F_m = f_mid(X) ∈ R^(C×H×W), wherein C, H and W respectively denote the number of channels, rows and columns of the feature map;
(3.2) building a mask-guided local attention module that takes the middle-layer feature map F_m as input and outputs an attention mask A_mask:
A_mask = Sigmoid(Conv(F_m))
Wherein Conv (·) represents a convolution operation;
(3.3) performing element-wise multiplication of the attention mask with the middle-layer feature map F_m to obtain the position-weighted feature map F′ = A_mask · F_m, and taking F′ as input for feature extraction in the subsequent network layers;
(3.4) calculating the pixel-level mask label A_gt of the rPPG space-time diagram: for an rPPG space-time diagram generated from a fake video, find its corresponding real rPPG space-time diagram and take the pixel-wise difference to obtain a residual space-time diagram; convert the residual diagram to grayscale, normalize it to [0, 1], resize it to the same size as the attention mask A_mask, and binarize it with 0.1 as the threshold to obtain the corresponding pixel-level mask label A_gt;
(3.5) calculating the L1 distance between the attention mask A_mask and the corresponding pixel-level mask label A_gt as the mask loss function L_mask, according to the following formula:
L_mask = |A_mask − A_gt|_1.
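A minimal sketch (not part of the claims) of the mask-label construction in steps (3.4)-(3.5), assuming the maps are already resized to the same H × W grid so the resize step can be omitted; function names and the nested-list representation are illustrative.

```python
def mask_label(fake_map, real_map, threshold=0.1):
    """Pixel-level mask label A_gt from a fake/real rPPG space-time map pair.

    fake_map, real_map: H x W grids of (r, g, b) values.
    Residual -> grayscale -> normalize to [0, 1] -> binarize at `threshold`.
    """
    H, W = len(fake_map), len(fake_map[0])
    # Pixel-wise residual, converted to grayscale by channel averaging
    gray = [[sum(abs(fake_map[i][j][c] - real_map[i][j][c]) for c in range(3)) / 3
             for j in range(W)] for i in range(H)]
    lo = min(min(r) for r in gray)
    hi = max(max(r) for r in gray)
    scale = (hi - lo) or 1.0  # guard against identical maps
    return [[1.0 if (g - lo) / scale > threshold else 0.0 for g in row]
            for row in gray]

def mask_loss(a_mask, a_gt):
    """L1 distance between the attention mask and its pixel-level label."""
    return sum(abs(m - g)
               for rm, rg in zip(a_mask, a_gt)
               for m, g in zip(rm, rg))
```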
4. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model according to claim 1, wherein the step (4) specifically comprises the following steps:
(4.1) inputting the K adjacent rPPG space-time diagrams respectively into the backbone network trained in the first stage to obtain K global high-dimensional features F_h; then performing global average pooling, and appending a classification token and one-dimensional learnable position encodings to form the input sequence Z_in of the Transformer;
(4.2) constructing a Transformer-based feature fusion module for the multiple rPPG space-time diagrams: the input sequence Z_in undergoes the multi-head self-attention operation MSA followed by the feed-forward network FFN; after each operation, layer normalization LN and residual connection are applied to adjust the output, yielding the Transformer output result Z_out.
5. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model according to claim 4, wherein the step (4.2) specifically comprises the following steps:
(4.2.1) the input sequence Z_in is passed through linear mapping layers to generate the Query matrix Q, the Key matrix K and the Value matrix V; the three matrices are then fed into the multi-head self-attention mechanism MSA, as shown in the following equation:
Attention(Q, K, V) = Softmax(QK^T / √d)V
where d is a normalization constant and T denotes the matrix transpose operation;
(4.2.2) obtaining the fused feature output Z_out after the Transformer processing, through the feed-forward network layer FFN consisting of a multi-layer perceptron.
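For illustration (not part of the claims), step (4.2.1) reduces per head to scaled dot-product attention. The sketch below implements a single head in plain Python; the multi-head case concatenates several such heads, and the helper names are assumptions.

```python
import math

def matmul(A, B):
    """Naive matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def attention(Q, K, V):
    """Single-head scaled dot-product attention: Softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    Kt = [list(col) for col in zip(*K)]               # K^T
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, Kt)]
    weights = [softmax(row) for row in scores]        # attention weights
    return matmul(weights, V)
```

Each output row is a convex combination of the Value rows, weighted by how strongly the corresponding query matches each key.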
6. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model according to claim 4, wherein the step (5) specifically comprises the following steps:
(5.1) performing global average pooling g(·) on the fused comprehensive feature Z_out obtained from the second-stage training output, and then using a fully connected network FC to map the dimension to the number of classes, 2, obtaining the vector Z ∈ R^2, as shown in the following formula:
Z = FC(g(Z_out))
(5.2) applying Softmax to the vector Z to obtain the final prediction score y′, and calculating the binary cross-entropy loss L_ce against the label y, as shown in the following formula:
L_ce = −[y log y′ + (1 − y) log(1 − y′)]
(5.3) constructing the overall loss function L_all as shown in the following formula:
L_all = L_ce + λL_mask
where λ is the hyper-parameter used to balance cross entropy loss and mask loss.
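A compact sketch of steps (5.1)-(5.3), for illustration only: pooling, a linear head, softmax, binary cross-entropy, and the combined loss. The function name, the explicit weight/bias arguments, and the default λ = 0.1 are assumptions not stated in the claims.

```python
import math

def predict_and_loss(z_out, w, b, y, l_mask, lam=0.1):
    """Classification head and overall loss L_all = L_ce + lambda * L_mask.

    z_out: list of K fused feature vectors (length C each);
    w: 2 x C FC weights; b: 2 FC biases; y: ground-truth label (0 or 1);
    l_mask: mask loss from the first stage.
    """
    C = len(z_out[0])
    # g(Z_out): global average pooling over the K feature vectors
    pooled = [sum(f[c] for f in z_out) / len(z_out) for c in range(C)]
    # FC layer mapping to the 2 class logits
    logits = [sum(wi * x for wi, x in zip(w[k], pooled)) + b[k] for k in range(2)]
    # Softmax -> prediction score y' (probability of the "fake" class)
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    p = [x / sum(e) for x in e]
    y_hat = p[1]
    # Binary cross-entropy against the label y
    l_ce = -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return y_hat, l_ce + lam * l_mask
```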
7. A system for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model using the method of any one of claims 1 to 6, characterized in that the system comprises:
the rPPG multi-scale space-time diagram generation module, which is used for calculating rPPG space-time diagrams from the face video frames;
the mask-guided local attention module, which is connected with the rPPG multi-scale space-time diagram generation module and is used for enhancing the learning of local information and extracting the features of a single rPPG space-time diagram;
the Transformer module, which is connected with the mask-guided local attention module and is used for fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams; and
the classification head module, which is connected with the Transformer module and is used for pooling the fused comprehensive features and performing classification and recognition, so as to obtain the identification result of the target image and construct an overall loss function.
8. An apparatus for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model, the apparatus comprising:
a processor configured to execute computer-executable instructions;
a memory storing one or more computer-executable instructions which, when executed by the processor, perform the steps of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model of any one of claims 1 to 6.
9. A processor for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model, characterized in that the processor is configured to execute computer-executable instructions which, when executed by the processor, implement the respective steps of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model according to any one of claims 1 to 6.
10. A computer-readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method of any one of claims 1 to 6 for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and two-stage model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310093773.8A CN116012958A (en) | 2023-02-10 | 2023-02-10 | Method, system, device, processor and computer readable storage medium for implementing deep fake face identification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116012958A (en) | 2023-04-25 |
Family
ID=86037336
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310093773.8A Pending CN116012958A (en) | 2023-02-10 | 2023-02-10 | Method, system, device, processor and computer readable storage medium for implementing deep fake face identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116012958A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116258914A (en) * | 2023-05-15 | 2023-06-13 | 齐鲁工业大学(山东省科学院) | Remote sensing image classification method based on machine learning and local and global feature fusion |
CN116258914B (en) * | 2023-05-15 | 2023-08-25 | 齐鲁工业大学(山东省科学院) | Remote Sensing Image Classification Method Based on Machine Learning and Local and Global Feature Fusion |
CN116311482A (en) * | 2023-05-23 | 2023-06-23 | 中国科学技术大学 | Face fake detection method, system, equipment and storage medium |
CN116311482B (en) * | 2023-05-23 | 2023-08-29 | 中国科学技术大学 | Face fake detection method, system, equipment and storage medium |
CN116385468A (en) * | 2023-06-06 | 2023-07-04 | 浙江大学 | System based on zebra fish heart parameter image analysis software generation |
CN116385468B (en) * | 2023-06-06 | 2023-09-01 | 浙江大学 | System based on zebra fish heart parameter image analysis software generation |
CN116486464A (en) * | 2023-06-20 | 2023-07-25 | 齐鲁工业大学(山东省科学院) | Attention mechanism-based face counterfeiting detection method for convolution countermeasure network |
CN116486464B (en) * | 2023-06-20 | 2023-09-01 | 齐鲁工业大学(山东省科学院) | Attention mechanism-based face counterfeiting detection method for convolution countermeasure network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Van Quang et al. | CapsuleNet for micro-expression recognition | |
CN116012958A (en) | Method, system, device, processor and computer readable storage medium for implementing deep fake face identification | |
Liu et al. | Deep learning face attributes in the wild | |
CN110837570B (en) | Method for unbiased classification of image data | |
Narayanan et al. | Hybrid machine learning architecture for automated detection and grading of retinal images for diabetic retinopathy | |
CN111968124B (en) | Shoulder musculoskeletal ultrasonic structure segmentation method based on semi-supervised semantic segmentation | |
Chen et al. | A pornographic images recognition model based on deep one-class classification with visual attention mechanism | |
Lajevardi et al. | Facial expression recognition from image sequences using optimized feature selection | |
Littlewort et al. | Fully automatic coding of basic expressions from video | |
Bhattacharyya et al. | Recognizing gender from human facial regions using genetic algorithm | |
CN114937298A (en) | Micro-expression recognition method based on feature decoupling | |
Dadwhal et al. | Data-driven skin detection in cluttered search and rescue environments | |
Bachay et al. | Hybrid Deep Learning Model Based on Autoencoder and CNN for Palmprint Authentication. | |
Aktürk et al. | Classification of eye images by personal details with transfer learning algorithms | |
Takalkar et al. | Improving micro-expression recognition accuracy using twofold feature extraction | |
Yatbaz et al. | Deep learning based stress prediction from offline signatures | |
George et al. | Multi-channel face presentation attack detection using deep learning | |
Iniyan et al. | Wavelet transformation and vertical stacking based image classification applying machine learning | |
Neagoe et al. | Subject independent drunkenness detection using pulse-coupled neural network segmentation of thermal infrared facial imagery | |
Kumar et al. | Siamese based Neural Network for Offline Writer Identification on word level data | |
Iffath et al. | A Novel Three Stage Framework for Person Identification From Audio Aesthetic | |
CN113343770A (en) | Face anti-counterfeiting method based on feature screening | |
Wang et al. | Audiovisual emotion recognition via cross-modal association in kernel space | |
Lu et al. | Joint Subspace and Low‐Rank Coding Method for Makeup Face Recognition | |
Filisbino et al. | Multi-class nonlinear discriminant feature analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||