CN116012958A - Method, system, device, processor and computer readable storage medium for implementing deep fake face identification - Google Patents

Method, system, device, processor and computer readable storage medium for implementing deep fake face identification

Info

Publication number
CN116012958A
CN116012958A
Authority
CN
China
Prior art keywords
space
mask
rppg
time
time diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310093773.8A
Other languages
Chinese (zh)
Inventor
朱煜
吴嘉辉
汪楠
李航宇
叶炯耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN202310093773.8A
Publication of CN116012958A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a method for realizing deep fake face identification based on an rPPG multi-scale space-time diagram and a two-stage model, wherein the method comprises the following steps: (1) collecting a deepfake face video dataset and preprocessing the videos; (2) generating an rPPG multi-scale space-time diagram; (3) constructing a mask-guided local attention module, performing first-stage training, and extracting the features of a single rPPG space-time diagram; (4) constructing a Transformer-based temporal aggregation module, performing second-stage training, and fusing the comprehensive features of a plurality of adjacent space-time diagrams; (5) constructing a classification head for classification and recognition, and constructing a loss function. The invention also relates to a corresponding system, device, processor and storage medium. With the method, system, device, processor and storage medium, the comprehensive features of the plurality of space-time diagrams representing one video are extracted through the two-stage model, achieving better interpretability and a better fake face identification effect than the baseline model.

Description

Method, system, device, processor and computer readable storage medium for implementing deep fake face identification
Technical Field
The invention relates to the technical field of digital images, in particular to the field of computer vision, and specifically to a method, a system, a device, a processor and a computer readable storage medium for realizing deep fake face identification based on an rPPG multi-scale space-time diagram and a two-stage model.
Background
With the development of generative deep models, the technical threshold of deep face forgery has been lowered, and people can easily create vivid forged face content through publicly available models or tools. Deep forgery may also be misused by malicious users to create false political information or spread pornography. As a defense mechanism, face forgery identification techniques have been developed and used to mitigate the risks associated with deep forgery. Remote photoplethysmography (rPPG) extracts the heartbeat signal from recorded video by examining subtle changes in skin color caused by cardiac activity. Because the face forgery process inevitably destroys the periodic variation of facial color, rPPG has proven to be a biological indicator that can effectively identify forged faces.
However, most existing rPPG-based deepfake face identification methods still have some drawbacks. For example, the invention patent application CN202210572034.2 extracts heart rate signals from 32 small square patches on each face frame, but these ROI regions overlap with each other and use only a single scale; it uses only a one-stage encoder to extract the features of a single rPPG space-time diagram, without considering the feature fusion of a plurality of adjacent rPPG space-time diagrams; and it uses only a binary cross-entropy loss, without considering pixel-level attention weights at local positions, so the detection performance is limited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method, a system, a device, a processor and a computer readable storage medium for realizing deep fake face identification based on an rPPG multi-scale space-time diagram and a two-stage model, which can effectively exploit the comprehensive features of a plurality of adjacent video clips.
To achieve the above object, the method, system, device, processor and computer readable storage medium for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model of the present invention are as follows:
the method for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model is mainly characterized by comprising the following steps:
(1) collecting a deepfake face video dataset, and preprocessing the video data to obtain a set of cropped face video frames;
(2) generating an rPPG multi-scale space-time diagram from the cropped face video frames;
(3) constructing a mask-guided local attention module, performing first-stage training, and extracting the features of a single rPPG space-time diagram;
(4) constructing a Transformer module, performing second-stage training, and fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams;
(5) constructing a classification head, pooling the fused high-dimensional features and classifying them to obtain the identification result of the target image, and constructing an overall loss function.
Preferably, the step (2) specifically includes the following steps:
(2.1) dividing a complete video into a plurality of T-frame video clips with a stride of ω frames;
(2.2) for each frame, performing face alignment and extracting face key points;
(2.3) selecting n heartbeat-signal information areas according to the face key points to form an ROI set R_t = {R_1t, R_2t, …, R_nt};
(2.4) for all non-empty subsets of the ROI set R_t, computing the mean of all pixels contained in each non-empty subset, obtaining 2^n − 1 pixel means over the three RGB channels;
(2.5) for each video clip, applying steps (2.2)–(2.4) to the T frames contained therein, resulting in a multi-scale space-time diagram of dimensions T × (2^n − 1) × 3, where T is the time length, 2^n − 1 is the number of combinations of the different information areas, and 3 is the number of RGB channels.
Preferably, the n information areas in (2.3) are the forehead, the chin, the upper left and right cheeks, and the lower left and right cheeks; the specific areas are shown in fig. 2. A sketch of the space-time diagram computation is given below.
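The following is a minimal NumPy sketch of steps (2.3)–(2.5), assuming the per-frame ROI pixels have already been extracted via face alignment; the function and argument names are illustrative and not part of the patent.

```python
import itertools
import numpy as np

def multiscale_spacetime_map(roi_pixels_per_frame):
    """Build a T x (2^n - 1) x 3 multi-scale rPPG space-time diagram.

    roi_pixels_per_frame: list of length T; each element is a list of n
    arrays of shape (num_pixels_i, 3) holding the RGB pixels of one ROI.
    """
    rows = []
    for rois in roi_pixels_per_frame:                  # one video frame
        n = len(rois)
        means = []
        # every non-empty subset of the n ROIs yields one "scale"
        for size in range(1, n + 1):
            for subset in itertools.combinations(range(n), size):
                pixels = np.concatenate([rois[i] for i in subset], axis=0)
                means.append(pixels.mean(axis=0))      # mean RGB of the subset
        rows.append(np.stack(means))                   # (2^n - 1, 3)
    return np.stack(rows)                              # (T, 2^n - 1, 3)
```

With n = 6 ROIs this yields the 2⁶ − 1 = 63 subset combinations used in the embodiment described later.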
Preferably, the step (3) specifically includes the following steps:
(3.1) constructing EfficientNet as the backbone network f(·); for an input rPPG space-time diagram X ∈ R^(3×T×(2^n−1)), extracting features through the backbone network and taking the intermediate feature map F_m = f_mid(X) ∈ R^(C×H×W), where C, H and W denote the number of channels, the height and the width of the feature map, respectively;
(3.2) building a mask-guided local attention module which takes the intermediate feature map F_m as input and produces an attention mask A_mask:
A_mask = Sigmoid(Conv(F_m))
where Conv(·) denotes a convolution operation;
(3.3) multiplying the attention mask with the intermediate feature map F_m point by point to obtain a position-weighted feature map F′ = A_mask · F_m, and taking F′ as the input for feature extraction in the subsequent network layers;
(3.4) computing the pixel-level mask label A_gt of the rPPG space-time diagram: for an rPPG space-time diagram generated from a fake video, find its corresponding real rPPG space-time diagram, take the pixel-by-pixel difference to obtain a residual space-time diagram, convert the residual space-time diagram to grayscale, normalize it to the range 0 to 1, resize it to the same size as the attention mask A_mask, and binarize it with 0.1 as the threshold to obtain the corresponding pixel-level mask label A_gt;
(3.5) computing the L1 distance between the attention mask A_mask and the corresponding pixel-level mask label A_gt as the mask loss function L_mask, according to the following formula:
L_mask = ‖A_mask − A_gt‖₁
An illustrative sketch of this module is given below.
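Below is a minimal PyTorch sketch of the mask-guided local attention of steps (3.2)–(3.5). The 1×1 convolution is an assumption, since the patent only specifies a convolution, and the class and function names are hypothetical.

```python
import torch
import torch.nn as nn

class MaskGuidedLocalAttention(nn.Module):
    """Sketch of steps (3.2)-(3.3): mask-guided local attention."""

    def __init__(self, channels: int):
        super().__init__()
        # collapse the C channels into a single-channel attention mask
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, f_mid: torch.Tensor):
        a_mask = torch.sigmoid(self.conv(f_mid))  # A_mask = Sigmoid(Conv(F_m))
        f_weighted = a_mask * f_mid               # F' = A_mask · F_m
        return f_weighted, a_mask

def mask_loss(a_mask: torch.Tensor, a_gt: torch.Tensor) -> torch.Tensor:
    # step (3.5): L_mask = ||A_mask - A_gt||_1, averaged over pixels
    return torch.mean(torch.abs(a_mask - a_gt))
```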
More preferably, the step (4) specifically includes the following steps:
(4.1) inputting K adjacent rPPG space-time diagrams into the backbone network trained in the first stage to obtain K global high-dimensional features F_h, then performing global average pooling, and prepending a classification token and adding one-dimensional learnable position encodings to form the input sequence Z_in of the Transformer;
(4.2) constructing a Transformer-based feature fusion module for the plurality of rPPG space-time diagrams: applying a multi-head self-attention operation MSA to the input sequence Z_in, then passing it through a feed-forward network FFN, and after each operation further adjusting the output with layer normalization LN and a residual connection, to obtain the output result Z_out of the Transformer.
More preferably, the step (4.2) specifically includes the following steps:
(4.2.1) passing the input sequence Z_in through linear mapping layers to generate a Query matrix Q, a Key matrix K and a Value matrix V; the three matrices are then fed into the multi-head self-attention mechanism MSA, as shown in the following formula:
Attention(Q, K, V) = Softmax(QKᵀ / √d) · V
where d is a normalization constant and ᵀ denotes the matrix transposition operation;
(4.2.2) obtaining the fused feature output Z_out after the Transformer processing through the feed-forward network layer FFN, which consists of a multi-layer perceptron. An illustrative sketch of such a Transformer layer is given below.
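A minimal PyTorch sketch of one such Transformer layer follows; the number of heads, the post-norm ordering and the FFN expansion factor are assumptions not fixed by the patent.

```python
import torch
import torch.nn as nn

class TemporalFusionBlock(nn.Module):
    """Sketch of step (4.2): MSA + FFN with residual connections and LN."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.msa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, z_in: torch.Tensor) -> torch.Tensor:
        # multi-head self-attention, then residual connection + layer norm
        attn_out, _ = self.msa(z_in, z_in, z_in)
        z = self.norm1(z_in + attn_out)
        # feed-forward network, again with residual connection + layer norm
        return self.norm2(z + self.ffn(z))
```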
More preferably, the step (5) specifically includes the following steps:
(5.1) performing global average pooling g(·) on the fused comprehensive feature Z_out output by the second-stage training, and then using a fully connected network FC to map the dimension to the number of categories, 2, obtaining a vector Z ∈ R², as shown in the following formula:
Z = FC(g(Z_out))
(5.2) computing the Softmax of Z to obtain the final prediction score y′, and computing the binary cross-entropy loss L_ce from the label y, as shown in the following formula:
L_ce = −[y log y′ + (1 − y) log(1 − y′)]
(5.3) constructing the overall loss function L_all, as shown in the following formula:
L_all = L_ce + λ·L_mask
where λ is a hyper-parameter used to balance the cross-entropy loss and the mask loss. An illustrative sketch of the classification head and loss is given below.
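As a sketch under the same assumptions, the classification head and overall loss of step (5) might look as follows in PyTorch; the default λ value is illustrative only, since the patent leaves it as a hyper-parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationHead(nn.Module):
    """Sketch of step (5.1): pool the fused sequence and map to 2 classes."""

    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Linear(dim, 2)      # FC mapping to the 2 categories

    def forward(self, z_out: torch.Tensor) -> torch.Tensor:
        z = z_out.mean(dim=1)            # global average pooling g(.)
        return self.fc(z)                # Z = FC(g(Z_out))

def total_loss(logits, labels, l_mask, lam: float = 0.1):
    # L_all = L_ce + lambda * L_mask; cross_entropy applies Softmax internally
    return F.cross_entropy(logits, labels) + lam * l_mask
```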
The system for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model using the above method is mainly characterized by comprising:
the rPPG multi-scale space-time diagram generation module, used for computing an rPPG space-time diagram from the face video frames;
the mask-guided local attention module, connected with the rPPG multi-scale space-time diagram generation module and used for enhancing the learning of local information and extracting the features of a single rPPG space-time diagram;
the Transformer module, connected with the mask-guided local attention module and used for fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams; and
the classification head module, connected with the Transformer module and used for pooling the fused comprehensive features and performing classification recognition, so as to obtain the identification result of the target image and construct the overall loss function.
The device for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model is mainly characterized by comprising:
a processor configured to execute computer-executable instructions;
and a memory storing one or more computer-executable instructions which, when executed by the processor, implement the steps of the above method for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
The processor for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model is mainly characterized in that the processor is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the above method for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
The computer readable storage medium is mainly characterized in that a computer program is stored thereon, and the computer program can be executed by a processor to implement the steps of the above method for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
With the method, system, device, processor and computer readable storage medium for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model, the multi-scale space-time diagram of the heart-rate signal rPPG is innovatively taken as the model input, and a classical CNN model (such as EfficientNet) and a Transformer are used as the two-stage model. To enhance the model's perception of local position information, the invention also innovatively introduces a mask-guided local attention module, guiding the model to further distinguish the different patterns of real and fake space-time diagrams through the supervision of pixel-level space-time diagram mask labels. The Transformer module fuses the features of multiple neighboring rPPG space-time diagrams through a self-attention mechanism. The technical scheme has been experimentally verified on the FaceForensics++ dataset and, compared with the baseline model, achieves a markedly better classification and identification effect.
Drawings
Fig. 1 is a schematic diagram of the overall flow of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
Fig. 2 is a schematic flow chart of generating the multi-scale rPPG space-time diagram in the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
Fig. 3 is a schematic diagram of the frame structure of the system for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to the present invention.
Fig. 4 is a schematic diagram of the Transformer module according to the present invention.
Detailed Description
In order to more clearly describe the technical contents of the present invention, a further description will be made below in connection with specific embodiments.
Before describing in detail embodiments in accordance with the present invention, it should be observed that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model includes the following steps:
(1) collecting a deepfake face video dataset, and preprocessing the video data to obtain a set of cropped face video frames;
(2) generating an rPPG multi-scale space-time diagram from the cropped face video frames;
(3) constructing a mask-guided local attention module, performing first-stage training, and extracting the features of a single rPPG space-time diagram;
(4) constructing a Transformer module, performing second-stage training, and fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams;
(5) constructing a classification head, pooling the fused high-dimensional features and classifying them to obtain the identification result of the target image, and constructing an overall loss function.
In practical application, the step (1) specifically includes:
downloading the FaceForensics++ dataset from the official dataset website to obtain the original videos, extracting frames from the original videos, and obtaining cropped face images using a face extractor.
in practical applications, as a preferred embodiment of the present invention, the step (2) specifically includes the following steps:
(2.1) dividing a complete video into a plurality of 64 frame video segments in steps 16;
(2.2) for each frame, carrying out face alignment and extracting face key points;
(2.3) selecting 6 heartbeat signal information areas according to the key points of the human face to form an ROI set R t ={R 1t ,R 2t ,…,R nt };
(2.4) for the ROI set R t All non-empty subsets in the list are calculated, and the average value of all pixels contained in each non-empty subset is obtained to obtain 2 6 -1, the pixel mean of 63 RGB three channels;
(2.5) for each video clip, the same operations (2.2) - (2.4) are performed on 64 frames contained in each video clip, so as to obtain a multi-scale space-time diagram with dimensions of 64×63×3, wherein 64 is a time length, 63 is the number of combination modes of different information areas, and 3 is the number of RGB channels.
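A one-function sketch of the sliding-window segmentation of step (2.1), with illustrative names:

```python
def split_into_clips(frames, clip_len=64, stride=16):
    """Slide a 64-frame window with a stride of 16 frames over the video."""
    return [frames[s:s + clip_len]
            for s in range(0, len(frames) - clip_len + 1, stride)]
```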
As a preferred embodiment of the present invention, the step (3) includes the following steps:
(3.1) constructing EfficientNet as the backbone network f(·); for an input rPPG space-time diagram X ∈ R^(3×64×63), extracting features through the backbone network and taking the intermediate feature map F_m = f_mid(X) ∈ R^(C×H×W), where C, H and W denote the number of channels, the height and the width of the feature map, respectively;
(3.2) building a mask-guided local attention module which takes the intermediate feature map F_m as input and produces an attention mask A_mask:
A_mask = Sigmoid(Conv(F_m))
where Conv(·) denotes a convolution operation;
(3.3) multiplying the attention mask with the intermediate feature map F_m point by point to obtain a position-weighted feature map F′ = A_mask · F_m, and taking F′ as the input for feature extraction in the subsequent network layers;
(3.4) computing the pixel-level mask label A_gt of the rPPG space-time diagram: for an rPPG space-time diagram generated from a fake video, find its corresponding real rPPG space-time diagram, take the pixel-by-pixel difference to obtain a residual space-time diagram, convert the residual space-time diagram to grayscale, normalize it to the range 0 to 1, resize it to the same size as the attention mask A_mask, and binarize it with 0.1 as the threshold to obtain the corresponding pixel-level mask label A_gt;
(3.5) computing the L1 distance between the attention mask A_mask and the corresponding pixel-level mask label A_gt as the mask loss function L_mask, according to the following formula:
L_mask = ‖A_mask − A_gt‖₁
A sketch of the mask-label construction in step (3.4) is given below.
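The mask-label construction of step (3.4) might be sketched as follows, assuming NumPy/OpenCV are available and taking the channel mean as the grayscale conversion; all names are illustrative.

```python
import numpy as np
import cv2

def pixel_mask_label(fake_map, real_map, out_hw, threshold=0.1):
    """Sketch of step (3.4): derive the pixel-level mask label A_gt.

    fake_map / real_map: (T, S, 3) space-time diagrams of a fake clip and
    its corresponding real clip; out_hw: (H, W) size of the attention mask.
    """
    residual = np.abs(fake_map.astype(np.float32) - real_map.astype(np.float32))
    gray = residual.mean(axis=-1)                                  # grayscale
    gray = (gray - gray.min()) / (gray.max() - gray.min() + 1e-8)  # 0..1
    resized = cv2.resize(gray, (out_hw[1], out_hw[0]))   # match A_mask size
    return (resized > threshold).astype(np.float32)      # binarize at 0.1
```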
As a preferred embodiment of the present invention, the step (4) specifically includes the following steps:
(4.1) constructing the input sequence Z_in of the Transformer:
inputting K temporally adjacent rPPG space-time diagrams into the backbone network trained in the first stage to obtain K global high-dimensional features F_h, then performing global average pooling, and prepending a classification token and adding one-dimensional learnable position encodings to form the input sequence Z_in of the Transformer;
(4.2) constructing the second-stage Transformer model to obtain the fused comprehensive features of the plurality of adjacent rPPG space-time diagrams:
passing the input sequence Z_in through linear mapping layers to generate a Query matrix Q, a Key matrix K and a Value matrix V; the three matrices are then fed into the multi-head self-attention mechanism MSA, as shown in the following formula:
Attention(Q, K, V) = Softmax(QKᵀ / √d) · V
where ᵀ denotes the matrix transposition operation and d is a normalization constant. The fused feature output Z_out after the Transformer processing is then obtained through the feed-forward network layer FFN, which consists of a multi-layer perceptron. A sketch of the sequence construction in step (4.1) is given below.
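The sequence construction of step (4.1) could be sketched as below; the zero initialization of the classification token and position encodings is an assumption, and the class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class SequenceBuilder(nn.Module):
    """Sketch of step (4.1): build the Transformer input sequence Z_in."""

    def __init__(self, dim: int, k: int):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # one-dimensional learnable position encoding for the K tokens + [CLS]
        self.pos_embed = nn.Parameter(torch.zeros(1, k + 1, dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, K, C, H, W) global features from the stage-one backbone
        pooled = feats.mean(dim=(-2, -1))              # global average pooling
        cls = self.cls_token.expand(pooled.size(0), -1, -1)
        return torch.cat([cls, pooled], dim=1) + self.pos_embed
```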
As a preferred embodiment of the present invention, the step (5) specifically includes:
performing global average pooling g(·) on the fused features, and then using a fully connected network FC to map the dimension to the number of categories, 2, obtaining a vector Z ∈ R²; computing the Softmax of Z to obtain the final prediction score y′, computing the binary cross-entropy loss L_ce from the label y, and finally constructing the overall loss function L_all, as shown in the following formulas:
Z = FC(g(Z_out))
L_ce = −[y log y′ + (1 − y) log(1 − y′)]
L_all = L_ce + λ·L_mask
where λ is a hyper-parameter used to balance the cross-entropy loss and the mask loss.
Referring to fig. 3, the system for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model using the above method includes:
the rPPG multi-scale space-time diagram generation module, used for computing an rPPG space-time diagram from the face video frames;
the mask-guided local attention module, connected with the rPPG multi-scale space-time diagram generation module and used for enhancing the learning of local information and extracting the features of a single rPPG space-time diagram;
the Transformer module, connected with the mask-guided local attention module and used for fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams; and
the classification head module, connected with the Transformer module and used for pooling the fused comprehensive features and performing classification recognition, so as to obtain the identification result of the target image and construct the overall loss function.
In a specific embodiment of the present invention, the classification and identification method using the present technical solution is tested as follows:
(1) Experimental data set
The invention uses the deep face forgery dataset FaceForensics++ (FF++) for experimental verification. The FF++ dataset includes 1000 original videos, 720 of which are used for training and 280 for testing and validation. Each video is forged by four different facial manipulation methods, namely Deepfakes (DF), Face2Face (F2F), FaceSwap (FS) and NeuralTextures (NT). Two of these methods replace the full face (DF and FS), while the other two only manipulate localized areas around the mouth or eyes (F2F and NT).
(2) Training process
The initial learning rate was set to 1e-2, an SGD optimizer was used for learning, the batch size was set to 32, and training was performed for 30 epochs. A sketch of such a training loop is given below.
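A hedged sketch of this training configuration in PyTorch; the momentum value and the exact loss composition per stage are assumptions not specified above, and `model` / `train_loader` stand in for the stage being trained.

```python
import torch
import torch.nn.functional as F

def train(model, train_loader, epochs=30, lr=1e-2):
    """Train one stage with SGD at lr 1e-2 for 30 epochs (sketch)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for spacetime_maps, labels in train_loader:   # batch size 32
            optimizer.zero_grad()
            logits = model(spacetime_maps)
            # stage one would add lambda * L_mask to this loss
            loss = F.cross_entropy(logits, labels)
            loss.backward()
            optimizer.step()
```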
(3) Test results
In this embodiment, training and testing are performed separately on the four sub-datasets of FF++ to evaluate the real/fake binary classification capability of the method; then five-class multi-classification training and testing are performed on the dataset, with Accuracy (Acc.) selected as the evaluation metric. The experimental results are shown in table 1.
Table 1. Performance of the model on the FF++ dataset (%); the table is reproduced as an image in the original publication.
As can be seen from table 1, this embodiment performs excellently regardless of which FF++ sub-dataset provides the training samples, and for both real/fake binary classification and five-class multi-classification, demonstrating the effectiveness of the algorithm.
The device for realizing the identification of the deeply forged human face based on the rPPG multi-scale space-time diagram and the two-stage model comprises:
a processor configured to execute computer-executable instructions;
and a memory storing one or more computer-executable instructions which, when executed by the processor, implement the steps of the above method for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
The processor is configured to execute computer executable instructions, which when executed by the processor, implement the steps of the method for implementing the deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
The computer readable storage medium has stored thereon a computer program executable by a processor to perform the steps of the above method for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. Further implementations, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, are also included within the scope of the preferred embodiments of the present invention, as would be understood by those reasonably skilled in the art.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
With the method, system, device, processor and computer readable storage medium for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model, the multi-scale space-time diagram of the heart-rate signal rPPG is innovatively taken as the model input, and a classical CNN model (such as EfficientNet) and a Transformer are used as the two-stage model. To enhance the model's perception of local position information, the invention also innovatively introduces a mask-guided local attention module, guiding the model to further distinguish the different patterns of real and fake space-time diagrams through the supervision of pixel-level space-time diagram mask labels. The Transformer module fuses the features of multiple neighboring rPPG space-time diagrams through a self-attention mechanism. The technical scheme has been experimentally verified on the FaceForensics++ dataset and, compared with the baseline model, achieves a markedly better classification and identification effect.
In this specification, the invention has been described with reference to specific embodiments thereof. It will be apparent, however, that various modifications and changes may be made without departing from the spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (10)

1. The method for realizing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model is characterized by comprising the following steps:
(1) collecting a deepfake face video dataset, and preprocessing the video data to obtain a set of cropped face video frames;
(2) generating an rPPG multi-scale space-time diagram from the cropped face video frames;
(3) constructing a mask-guided local attention module, performing first-stage training, and extracting the features of a single rPPG space-time diagram;
(4) constructing a Transformer module, performing second-stage training, and fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams;
(5) constructing a classification head, pooling the fused high-dimensional features and classifying them to obtain the identification result of the target image, and constructing an overall loss function.
2. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to claim 1, wherein the step (2) includes the following steps:
(2.1) dividing a complete video into a plurality of T-frame video clips with a stride of ω frames;
(2.2) for each frame, performing face alignment and extracting face key points;
(2.3) selecting n heartbeat-signal information areas according to the face key points to form an ROI set R_t = {R_1t, R_2t, …, R_nt};
(2.4) for all non-empty subsets of the ROI set R_t, computing the mean of all pixels contained in each non-empty subset, obtaining 2^n − 1 pixel means over the three RGB channels;
(2.5) for each video clip, applying steps (2.2)–(2.4) to the T frames contained therein, resulting in a multi-scale space-time diagram of dimensions T × (2^n − 1) × 3, where T is the time length, 2^n − 1 is the number of combinations of the different information areas, and 3 is the number of RGB channels.
3. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to claim 1, wherein the step (3) specifically comprises the following steps:
(3.1) constructing EfficientNet as the backbone network f(·); for an input rPPG space-time diagram X ∈ R^(3×T×(2^n−1)), extracting features through the backbone network and taking the intermediate feature map F_m = f_mid(X) ∈ R^(C×H×W), where C, H and W denote the number of channels, the height and the width of the feature map, respectively;
(3.2) building a mask-guided local attention module which takes the intermediate feature map F_m as input and produces an attention mask A_mask:
A_mask = Sigmoid(Conv(F_m))
where Conv(·) denotes a convolution operation;
(3.3) multiplying the attention mask with the intermediate feature map F_m point by point to obtain a position-weighted feature map F′ = A_mask · F_m, and taking F′ as the input for feature extraction in the subsequent network layers;
(3.4) computing the pixel-level mask label A_gt of the rPPG space-time diagram: for an rPPG space-time diagram generated from a fake video, find its corresponding real rPPG space-time diagram, take the pixel-by-pixel difference to obtain a residual space-time diagram, convert the residual space-time diagram to grayscale, normalize it to the range 0 to 1, resize it to the same size as the attention mask A_mask, and binarize it with 0.1 as the threshold to obtain the corresponding pixel-level mask label A_gt;
(3.5) computing the L1 distance between the attention mask A_mask and the corresponding pixel-level mask label A_gt as the mask loss function L_mask, according to the following formula:
L_mask = ‖A_mask − A_gt‖₁
4. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to claim 1, wherein the step (4) specifically includes the following steps:
(4.1) inputting K adjacent rPPG space-time diagrams into the backbone network trained in the first stage to obtain K global high-dimensional features F_h, then performing global average pooling, and prepending a classification token and adding one-dimensional learnable position encodings to form the input sequence Z_in of the Transformer;
(4.2) constructing a Transformer-based feature fusion module for the plurality of rPPG space-time diagrams: applying a multi-head self-attention operation MSA to the input sequence Z_in, then passing it through a feed-forward network FFN, and after each operation further adjusting the output with layer normalization LN and a residual connection, to obtain the output result Z_out of the Transformer.
5. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to claim 4, wherein the step (4.2) specifically comprises the following steps:
(4.2.1) passing the input sequence Z_in through linear mapping layers to generate a Query matrix Q, a Key matrix K and a Value matrix V; the three matrices are then fed into the multi-head self-attention mechanism MSA, as shown in the following formula:
Attention(Q, K, V) = Softmax(QKᵀ / √d) · V
where d is a normalization constant and ᵀ denotes the matrix transposition operation;
(4.2.2) obtaining the fused feature output Z_out after the Transformer processing through the feed-forward network layer FFN, which consists of a multi-layer perceptron.
6. The method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to claim 4, wherein the step (5) specifically comprises the following steps:
(5.1) performing global average pooling g(·) on the fused comprehensive feature Z_out output by the second-stage training, and then using a fully connected network FC to map the dimension to the number of categories, 2, obtaining a vector Z ∈ R², as shown in the following formula:
Z = FC(g(Z_out))
(5.2) computing the Softmax of the vector Z to obtain the final prediction score y′, and computing the binary cross-entropy loss L_ce from the label y, as shown in the following formula:
L_ce = −[y log y′ + (1 − y) log(1 − y′)]
(5.3) constructing the overall loss function L_all, as shown in the following formula:
L_all = L_ce + λ·L_mask
where λ is a hyper-parameter used to balance the cross-entropy loss and the mask loss.
7. A system for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model using the method of any one of claims 1 to 6, characterized in that the system comprises:
the rPPG multi-scale space-time diagram generation module, used for computing an rPPG space-time diagram from the face video frames;
the mask-guided local attention module, connected with the rPPG multi-scale space-time diagram generation module and used for enhancing the learning of local information and extracting the features of a single rPPG space-time diagram;
the Transformer module, connected with the mask-guided local attention module and used for fusing the comprehensive features of a plurality of adjacent rPPG space-time diagrams; and
the classification head module, connected with the Transformer module and used for pooling the fused comprehensive features and performing classification recognition, so as to obtain the identification result of the target image and construct the overall loss function.
8. An apparatus for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model, the apparatus comprising:
a processor configured to execute computer-executable instructions;
a memory storing one or more computer-executable instructions which, when executed by the processor, perform the steps of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model of any one of claims 1 to 6.
9. A processor for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model, characterized in that the processor is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the method for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model according to any one of claims 1 to 6.
10. A computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method of any one of claims 1 to 6 for implementing deep fake face identification based on the rPPG multi-scale space-time diagram and the two-stage model.
CN202310093773.8A 2023-02-10 2023-02-10 Method, system, device, processor and computer readable storage medium for implementing deep fake face identification Pending CN116012958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310093773.8A CN116012958A (en) 2023-02-10 2023-02-10 Method, system, device, processor and computer readable storage medium for implementing deep fake face identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310093773.8A CN116012958A (en) 2023-02-10 2023-02-10 Method, system, device, processor and computer readable storage medium for implementing deep fake face identification

Publications (1)

Publication Number Publication Date
CN116012958A (en) 2023-04-25

Family

ID=86037336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310093773.8A Pending CN116012958A (en) 2023-02-10 2023-02-10 Method, system, device, processor and computer readable storage medium for implementing deep fake face identification

Country Status (1)

Country Link
CN (1) CN116012958A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116258914A (en) * 2023-05-15 2023-06-13 齐鲁工业大学(山东省科学院) Remote sensing image classification method based on machine learning and local and global feature fusion
CN116258914B (en) * 2023-05-15 2023-08-25 齐鲁工业大学(山东省科学院) Remote Sensing Image Classification Method Based on Machine Learning and Local and Global Feature Fusion
CN116311482A (en) * 2023-05-23 2023-06-23 中国科学技术大学 Face fake detection method, system, equipment and storage medium
CN116311482B (en) * 2023-05-23 2023-08-29 中国科学技术大学 Face fake detection method, system, equipment and storage medium
CN116385468A (en) * 2023-06-06 2023-07-04 浙江大学 System based on zebra fish heart parameter image analysis software generation
CN116385468B (en) * 2023-06-06 2023-09-01 浙江大学 System based on zebra fish heart parameter image analysis software generation
CN116486464A (en) * 2023-06-20 2023-07-25 齐鲁工业大学(山东省科学院) Attention mechanism-based face counterfeiting detection method for convolution countermeasure network
CN116486464B (en) * 2023-06-20 2023-09-01 齐鲁工业大学(山东省科学院) Attention mechanism-based face counterfeiting detection method for convolution countermeasure network

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination