CN117690061B - Deepfake video detection method, apparatus, device and storage medium - Google Patents

Deepfake video detection method, apparatus, device and storage medium

Info

Publication number
CN117690061B
CN117690061B CN202311827980.8A
Authority
CN
China
Prior art keywords
video
data
model
detected
preprocessed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311827980.8A
Other languages
Chinese (zh)
Other versions
CN117690061A (en)
Inventor
朱威
黎健和
陈盛福
温世欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Post Consumer Finance Co ltd
Original Assignee
China Post Consumer Finance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Post Consumer Finance Co ltd filed Critical China Post Consumer Finance Co ltd
Priority to CN202311827980.8A priority Critical patent/CN117690061B/en
Publication of CN117690061A publication Critical patent/CN117690061A/en
Application granted granted Critical
Publication of CN117690061B publication Critical patent/CN117690061B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention relates to the technical field of video detection and discloses a deepfake video detection method, apparatus, device, and storage medium. The method comprises: performing face authentication on a video to be detected to obtain face actions; segmenting, from the video to be detected, the device monitoring data corresponding to each face action; preprocessing the device monitoring data to obtain preprocessed device monitoring data; and detecting the preprocessed device monitoring data with a preset detection model and judging, based on the detection result, whether the video to be detected is a deepfake video, the preset detection model being obtained by combining and tuning a deep neural network model and a generalized linear model. Because the video to be detected is checked against the device monitoring data segmented from it, using a detection model built by jointly tuning a deep neural network model and a generalized linear model, the method can accurately judge whether the video to be detected is a deepfake video.

Description

Deepfake video detection method, apparatus, device and storage medium
Technical Field
The present invention relates to the field of video detection technologies, and in particular to a deepfake video detection method, apparatus, device, and storage medium.
Background
With the rapid development of the internet, the financial industry has gradually moved online, and people can open bank accounts online to meet their financial needs. This shift, however, is accompanied by growing network security problems; in particular, fraudsters use various means during financial transactions. For example, when a financial service is handled online, a fraudster can deceive the face authentication system by hijacking the camera of the device, so that the system does not receive frames from the real camera but instead receives an injected deepfake video.
Current deepfake video detection methods rely mainly on analyzing visual differences. As deepfake technology keeps improving, the visual difference between forged video and real video becomes ever smaller, making deepfake video difficult to detect accurately. The industry therefore needs a method that can detect deepfake video accurately.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The main purpose of the present invention is to provide a deepfake video detection method, apparatus, device, and storage medium, aiming to solve the technical problem that the prior art has difficulty detecting deepfake video accurately.
To achieve the above purpose, the present invention provides a deepfake video detection method comprising the following steps:
performing face authentication on a video to be detected to obtain face actions, wherein the face actions include a mouth-opening action, a blinking action, a head-shaking action, a nodding action, and a static action;
segmenting, from the video to be detected, the device monitoring data corresponding to the face action, wherein the device monitoring data corresponds to the video segment in which a complete face action is performed;
preprocessing the device monitoring data to obtain preprocessed device monitoring data; and
detecting the preprocessed device monitoring data with a preset detection model and judging, based on the detection result, whether the video to be detected is a deepfake video, wherein the preset detection model is obtained by combining and tuning a deep neural network model and a generalized linear model.
Optionally, the step of performing face authentication on the video to be detected to obtain face actions includes:
obtaining the mouth aspect ratio of the face in the video to be detected, and judging, based on the mouth aspect ratio, whether a mouth-opening action exists in the video to be detected;
obtaining the eye aspect ratio of the face in the video to be detected, and judging, based on the eye aspect ratio, whether a blinking action exists in the video to be detected; and
obtaining the change in cheek width and the change in nose-to-chin distance of the face in the video to be detected, and judging, based on these changes, whether a head-shaking action and/or a nodding action exists in the video to be detected.
Optionally, the step of performing face authentication on the video to be detected to obtain face actions further includes:
if none of the mouth-opening, blinking, head-shaking, and nodding actions exists in the video to be detected, and the motion amplitude of the face in the video to be detected is smaller than a preset amplitude threshold, judging that a static action exists in the video to be detected.
Optionally, the preprocessed device monitoring data includes preprocessed time-series data and preprocessed behavior data, and the step of preprocessing the device monitoring data to obtain preprocessed device monitoring data includes:
extracting frames from the time-series data in the device monitoring data at a fixed frequency and in a fixed number, and applying max-min normalization to the frame-extracted time-series data to obtain the preprocessed time-series data; and
computing the mean and standard deviation of the behavior data in the device monitoring data, and applying standard normalization to the behavior data based on the mean and standard deviation to obtain the preprocessed behavior data.
Optionally, after the step of preprocessing the device monitoring data to obtain preprocessed device monitoring data, the method further includes:
selecting positive samples and negative samples from the device monitoring data, wherein a positive sample is device monitoring data containing the face action and a negative sample is device monitoring data not containing the face action; and
calculating a loss function based on the positive and negative samples, and tuning an initial model based on the loss function to obtain the preset detection model, wherein the initial model is formed by combining a deep neural network model and a generalized linear model.
Optionally, the step of detecting the preprocessed device monitoring data with the preset detection model and judging, based on the detection result, whether the video to be detected is a deepfake video includes:
inputting the preprocessed time-series data and the preprocessed behavior data into the preset detection model to obtain a model output result; and
judging, based on the model output result, whether the video to be detected is a deepfake video.
Optionally, the preset detection model includes a one-dimensional convolution module, a deep neural network model, and a generalized linear model, and the step of inputting the preprocessed time-series data and the preprocessed behavior data into the preset detection model to obtain a model output result includes:
inputting the preprocessed time-series data into the one-dimensional convolution module to output flattened data;
inputting the flattened data into the deep neural network model to obtain a first model output value;
inputting the preprocessed behavior data into the deep neural network model to obtain a second model output value;
concatenating the flattened data with the preprocessed behavior data to obtain spliced data, and inputting the spliced data into the generalized linear model to obtain a third model output value; and
adding the first model output value, the second model output value, and the third model output value to obtain the model output result.
In addition, to achieve the above purpose, the present invention further provides a deepfake video detection apparatus, comprising:
a face authentication module, configured to perform face authentication on a video to be detected to obtain face actions, wherein the face actions include a mouth-opening action, a blinking action, a head-shaking action, a nodding action, and a static action;
a data segmentation module, configured to segment, from the video to be detected, the device monitoring data corresponding to the face action, wherein the device monitoring data corresponds to the video segment in which a complete face action is performed;
a data processing module, configured to preprocess the device monitoring data to obtain preprocessed device monitoring data; and
a data detection module, configured to detect the preprocessed device monitoring data with a preset detection model and judge, based on the detection result, whether the video to be detected is a deepfake video, wherein the preset detection model is obtained by combining and tuning a deep neural network model and a generalized linear model.
In addition, to achieve the above purpose, the present invention further provides a deepfake video detection device, comprising: a memory, a processor, and a deepfake video detection program stored in the memory and executable on the processor, the deepfake video detection program being configured to implement the steps of the deepfake video detection method described above.
In addition, to achieve the above purpose, the present invention further provides a storage medium storing a deepfake video detection program which, when executed by a processor, implements the steps of the deepfake video detection method described above.
The method performs face authentication on the video to be detected to obtain face actions, including mouth-opening, blinking, head-shaking, nodding, and static actions; segments, from the video to be detected, the device monitoring data corresponding to each face action, i.e., the segment covering a complete face action; preprocesses the device monitoring data; and detects the preprocessed device monitoring data with a preset detection model obtained by combining and tuning a deep neural network model and a generalized linear model, judging from the detection result whether the video to be detected is a deepfake video. Unlike conventional deepfake detection methods that rely mainly on visual differences, the invention segments the device monitoring data corresponding to the face actions from the video to be detected and detects the video with the preset detection model built from a deep neural network model and a generalized linear model, thereby avoiding the over-reliance on visual-difference analysis in the prior art and accurately judging whether the video to be detected is a deepfake video.
Drawings
Fig. 1 is a schematic structural diagram of a deepfake video detection device in a hardware operating environment according to an embodiment of the present invention;
Fig. 2 is a flowchart of a first embodiment of the deepfake video detection method of the present invention;
Fig. 3 is a flowchart of a second embodiment of the deepfake video detection method of the present invention;
Fig. 4 is a flowchart of a third embodiment of the deepfake video detection method of the present invention;
Fig. 5 is a schematic diagram of a first flow for obtaining the model output result in the deepfake video detection method of the present invention;
Fig. 6 is a schematic diagram of a second flow for obtaining the model output result in the deepfake video detection method of the present invention;
Fig. 7 is a block diagram of a first embodiment of the deepfake video detection apparatus of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a deepfake video detection device in a hardware operating environment according to an embodiment of the present invention.
As shown in Fig. 1, the deepfake video detection device may include a processor 1001 such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) or a stable nonvolatile memory (NVM) such as disk storage. Optionally, the memory 1005 may also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in Fig. 1 does not limit the deepfake video detection device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as one type of storage medium, may include an operating system, a network communication module, a user interface module, and a deepfake video detection program.
In the deepfake video detection device shown in Fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 invokes the deepfake video detection program stored in the memory 1005 and executes the deepfake video detection method provided by the embodiments of the present invention.
An embodiment of the present invention provides a deepfake video detection method; referring to Fig. 2, Fig. 2 is a flowchart of a first embodiment of the deepfake video detection method of the present invention.
In this embodiment, the deepfake video detection method includes the following steps:
Step S10: performing face authentication on the video to be detected to obtain face actions, wherein the face actions include a mouth-opening action, a blinking action, a head-shaking action, a nodding action, and a static action.
It should be noted that the execution body of the method of this embodiment may be a terminal device with face authentication, data processing, and program execution functions, such as a smartphone or a smartwatch, or an electronic device with the same or similar functions, such as the above-mentioned deepfake video detection device. This embodiment and the following embodiments are described using the deepfake video detection device (hereinafter, the detection device) as an example.
It can be understood that the video to be detected may be a video recorded by the current user holding a video recording device and containing face actions, where a face action may be a mouth-opening action, a blinking action, a head-shaking action, a nodding action, a static action, or any other action that reflects the state of the face.
In a specific implementation, face authentication may be performed on the video to be detected with a conventional face authentication model, for example an Eigenfaces model or an LBPH (Local Binary Patterns Histogram) model, which is not limited in this embodiment.
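As an illustration only (not part of the patent text), the following Python sketch shows how a conventional LBPH recognizer from OpenCV could back this face-authentication step; the enrollment image paths, the single-user label, and the distance threshold are assumptions.

```python
# Sketch only: conventional face authentication with OpenCV's LBPH recognizer.
# Requires opencv-contrib-python; image paths and the threshold are assumptions.
import cv2
import numpy as np

recognizer = cv2.face.LBPHFaceRecognizer_create()

# Enroll grayscale face crops of the legitimate user, all labeled 0.
enroll_faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in ["user_0.png", "user_1.png"]]
recognizer.train(enroll_faces, np.array([0, 0]))

def authenticate(face_gray, max_distance=60.0):
    """Return True if the probe face crop matches the enrolled user."""
    label, distance = recognizer.predict(face_gray)  # smaller distance = closer match
    return label == 0 and distance < max_distance
```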
Step S20: and cutting out equipment monitoring data corresponding to the face action from the video to be detected, wherein the equipment monitoring data is a video segment for making a complete face action in the video to be detected.
It should be noted that, the device monitoring data and the face actions are in one-to-one correspondence. Illustratively, the face action a corresponds to the device monitoring data a, the face action B corresponds to the device monitoring data B, and the face action C corresponds to the device monitoring data C.
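For illustration, a minimal Python sketch of this one-to-one slicing, assuming the sensor stream is sampled at a known rate and the action's start and end times come from the face-authentication step (all names and values below are assumptions):

```python
# Sketch only: cut the device-monitoring segment covering one complete face action.
import numpy as np

def slice_monitoring_data(sensor_stream, sample_rate_hz, action_start_s, action_end_s):
    """sensor_stream: (num_samples, num_channels) array, e.g. accelerometer and
    gyroscope channels recorded while the video was captured."""
    start = int(action_start_s * sample_rate_hz)
    end = int(action_end_s * sample_rate_hz)
    return sensor_stream[start:end]

# Usage: a blink detected between 3.2 s and 4.0 s of the video (dummy 100 Hz stream).
stream = np.random.randn(6000, 6)
blink_segment = slice_monitoring_data(stream, 100, 3.2, 4.0)   # shape (80, 6)
```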
Step S30: and preprocessing the equipment monitoring data to obtain preprocessed equipment monitoring data.
In a specific implementation, because the sensor data in the device monitoring data has a time sequence relationship, the data volume of the device monitoring data can be huge and the required model reasoning time is short. Therefore, the frame extraction frequency can be fixed through the preprocessing, so that the equipment monitoring data is compressed, and a large amount of time and required memory for model training reasoning are saved. Meanwhile, the data enhancement can be performed through the preprocessing, namely, different initial frames are selected for multiple times to start to sequentially extract frames and noise data is added, so that the generalization capability of the model is enhanced. In addition, the pre-processing can also be used for carrying out standardized processing on behavior data (registration and login modes, buried point data and the like) and compressed sensor data in the equipment monitoring data.
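A minimal Python sketch of the fixed-frequency frame extraction and the augmentation described above (the offset choices, noise level, and array shapes are assumptions, not values from the patent):

```python
# Sketch only: fixed-frequency frame extraction and noise-based augmentation.
import numpy as np

def extract_frames(series, step, target_len, start=0):
    """Keep every `step`-th sample starting at `start`, up to a fixed number,
    compressing the time-series part of the device monitoring data."""
    return series[start::step][:target_len]

def augment(series, step, target_len, n_views=4, noise_std=0.01, seed=0):
    """Start the sequential frame extraction at several different initial frames
    and add Gaussian noise, as described above, to strengthen generalization."""
    rng = np.random.default_rng(seed)
    views = []
    for offset in rng.integers(0, step, size=n_views):
        view = extract_frames(series, step, target_len, start=int(offset))
        views.append(view + rng.normal(0.0, noise_std, size=view.shape))
    return np.stack(views)

segments = augment(np.random.randn(1000, 6), step=10, target_len=80)  # (4, 80, 6)
```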
Step S40: and detecting the preprocessed equipment monitoring data through a preset detection model, judging whether the video to be detected is a depth fake video or not based on a detection result, and combining and adjusting a depth neural network model and a generalized linear model to obtain the preset detection model.
It should be noted that the generalized linear model may be used to capture interactions between features, and the deep neural network model may be used to learn more complex feature representations, for modeling and predictive classification of device monitoring data.
In a specific implementation, multiple face actions in one video and whether a plurality of models make corresponding face actions or not can be integrated, and whether the video to be detected is a deep fake video injected by hijacking a mobile phone camera or not can be analyzed.
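As a hedged sketch of how the per-action judgments might be combined (the action names, scores, and voting rule below are assumptions):

```python
# Sketch only: combine per-action model scores into one verdict for the video.
def video_is_deepfake(action_scores, score_threshold=0.5, min_confirmed=None):
    """action_scores maps each prompted face action to the model's probability,
    predicted from its device-monitoring segment, that the action was really
    performed on this device; too few confirmed actions flags an injected video."""
    if min_confirmed is None:
        min_confirmed = len(action_scores)
    confirmed = sum(score >= score_threshold for score in action_scores.values())
    return confirmed < min_confirmed  # True -> suspected deepfake injection

print(video_is_deepfake({"open_mouth": 0.91, "blink": 0.87, "shake_head": 0.12}))
```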
This embodiment performs face authentication on the video to be detected to obtain face actions, including mouth-opening, blinking, head-shaking, nodding, and static actions; segments, from the video to be detected, the device monitoring data corresponding to each face action, i.e., the segment covering a complete face action; preprocesses the device monitoring data; and detects the preprocessed device monitoring data with a preset detection model obtained by combining and tuning a deep neural network model and a generalized linear model, judging from the detection result whether the video to be detected is a deepfake video. Unlike conventional deepfake detection methods that rely mainly on visual differences, this embodiment segments the device monitoring data corresponding to the face actions from the video to be detected and detects the video with the preset detection model, thereby avoiding the over-reliance on visual-difference analysis in the prior art and accurately judging whether the video to be detected is a deepfake video.
Referring to Fig. 3, Fig. 3 is a flowchart of a second embodiment of the deepfake video detection method of the present invention.
Based on the first embodiment, in this embodiment, in order to acquire the face actions in the video to be detected more accurately, step S10 may include:
Step S101: obtaining the mouth aspect ratio of the face in the video to be detected, and judging, based on the mouth aspect ratio, whether a mouth-opening action exists in the video to be detected.
Step S102: obtaining the eye aspect ratio of the face in the video to be detected, and judging, based on the eye aspect ratio, whether a blinking action exists in the video to be detected.
Step S103: obtaining the change in cheek width and the change in nose-to-chin distance of the face in the video to be detected, and judging, based on these changes, whether a head-shaking action and/or a nodding action exists in the video to be detected.
In a specific implementation, whether a mouth-opening action exists in the video to be detected can be judged by calculating the mouth aspect ratio (MAR, Mouth Aspect Ratio): when the MAR exceeds a set threshold, the mouth is considered open, so monitoring the MAR in real time reveals whether the mouth has been opened. Whether a blinking action exists can be determined by calculating the eye aspect ratio (EAR, Eye Aspect Ratio): while the eye is open the EAR fluctuates around a certain value, and when the eye closes the EAR drops rapidly and theoretically approaches zero, so monitoring the change of the EAR reveals whether the eyes blink. Head-shaking and nodding are detected by calculating the change in the width of the left and right cheeks and the change in the nose-to-chin distance: when the cheek widths change greatly and the nose-to-chin distance also changes obviously, the corresponding head action is judged to have occurred. Monitoring these facial-feature changes allows the face action to be judged in real time.
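For illustration, a Python sketch of the MAR/EAR computations described above, assuming 2-D facial landmarks in a dlib-style point ordering; the thresholds are illustrative, not values specified in the patent:

```python
# Sketch only: EAR/MAR from 2-D landmarks (dlib-style point ordering assumed).
import numpy as np

def _aspect_ratio(points, vertical_pairs, horizontal_pair):
    """Mean vertical distance divided by horizontal distance."""
    heights = [np.linalg.norm(points[a] - points[b]) for a, b in vertical_pairs]
    width = np.linalg.norm(points[horizontal_pair[0]] - points[horizontal_pair[1]])
    return float(np.mean(heights) / width)

def eye_aspect_ratio(eye):      # eye: (6, 2) array of landmarks
    return _aspect_ratio(eye, [(1, 5), (2, 4)], (0, 3))

def mouth_aspect_ratio(mouth):  # mouth: (8, 2) array of inner-lip landmarks
    return _aspect_ratio(mouth, [(2, 6), (3, 5)], (0, 4))

# Illustrative thresholds; the patent does not fix their values.
def is_blinking(ear_per_frame, ear_threshold=0.2):
    return min(ear_per_frame) < ear_threshold   # EAR collapses when the eye closes

def is_mouth_open(mar_per_frame, mar_threshold=0.6):
    return max(mar_per_frame) > mar_threshold   # MAR rises when the mouth opens
```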
Step S104: if none of the mouth-opening, blinking, head-shaking, and nodding actions exists in the video to be detected, and the motion amplitude of the face in the video to be detected is smaller than a preset amplitude threshold, judging that a static action exists in the video to be detected.
Further, based on the first embodiment, the preprocessed device monitoring data includes preprocessed time-series data and preprocessed behavior data, and step S30 may include:
Step S301: extracting frames from the time-series data in the device monitoring data at a fixed frequency and in a fixed number, and applying max-min normalization to the frame-extracted time-series data to obtain the preprocessed time-series data.
In a specific implementation, the maximum and minimum normalization processing can be performed on the time sequence data after frame extraction based on the following formula:
Wherein x is the time sequence data after frame extraction, x min is the minimum value of the time sequence data after frame extraction in the time dimension, x max is the maximum value of the time sequence data after frame extraction in the time dimension, and x' is the time sequence data after preprocessing.
Step S302: and counting the average value and standard deviation of the behavior data in the equipment monitoring data, and carrying out standard normalization processing on the behavior data based on the average value and the standard deviation to obtain the preprocessed behavior data.
In a specific implementation, the behavior data can be normalized based on the following formula:
x* = (x - x_mean) / x_std
where x is the behavior data in the device monitoring data, x_mean is its mean, x_std is its standard deviation, and x* is the preprocessed behavior data.
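Both normalizations can be written compactly; the following Python sketch (array shapes and the small epsilon are assumptions) mirrors the two formulas above:

```python
# Sketch only: the two normalizations applied during preprocessing.
import numpy as np

def min_max_normalize(time_series):
    """x' = (x - x_min) / (x_max - x_min), computed per channel along the
    time dimension of the frame-extracted time-series data (shape (T, C))."""
    x_min = time_series.min(axis=0, keepdims=True)
    x_max = time_series.max(axis=0, keepdims=True)
    return (time_series - x_min) / (x_max - x_min + 1e-8)   # epsilon guards constant channels

def standard_normalize(behavior, mean=None, std=None):
    """x* = (x - x_mean) / x_std, using the mean and standard deviation counted
    over the behavior data (shape (N, D))."""
    mean = behavior.mean(axis=0) if mean is None else mean
    std = behavior.std(axis=0) if std is None else std
    return (behavior - mean) / (std + 1e-8)
```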
In this embodiment, the mouth aspect ratio of the face in the video to be detected is obtained, and whether a mouth-opening action exists is judged based on it; the eye aspect ratio is obtained, and whether a blinking action exists is judged based on it; the change in cheek width and the change in nose-to-chin distance are obtained, and whether a head-shaking and/or nodding action exists is judged based on them; if none of these actions exists and the motion amplitude of the face is smaller than the preset amplitude threshold, a static action is judged to exist; the time-series data in the device monitoring data is frame-extracted at a fixed frequency and in a fixed number and max-min normalized to obtain the preprocessed time-series data; and the mean and standard deviation of the behavior data are computed and used for standard normalization to obtain the preprocessed behavior data. Compared with conventional deepfake video detection methods, this embodiment recognizes the face actions in the video to be detected from the mouth aspect ratio, eye aspect ratio, cheek width change, nose-to-chin distance change, and face motion amplitude, which improves the accuracy of face action recognition.
Referring to Fig. 4, Fig. 4 is a flowchart of a third embodiment of the deepfake video detection method of the present invention.
Based on the above embodiments, in this embodiment, after step S30, the method may further include:
Step S31: selecting positive samples and negative samples from the device monitoring data, wherein a positive sample is device monitoring data containing the face action and a negative sample is device monitoring data not containing the face action.
Step S32: calculating a loss function based on the positive and negative samples, and tuning an initial model based on the loss function to obtain the preset detection model, wherein the initial model is formed by combining a deep neural network model and a generalized linear model.
In a specific implementation, the loss function can be calculated based on the following formulas:
L_i = -[ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]
L = (1/N) * Σ_i L_i
where y_i denotes the label of sample i (1 for a positive sample, 0 for a negative sample), p_i denotes the probability that the model predicts sample i to be a positive sample, 1 - p_i denotes the probability that the model predicts sample i to be a negative sample, N denotes the number of samples, L_i denotes the loss value of the model on sample i, and L denotes the overall loss value of the model.
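This is the standard binary cross-entropy; a minimal Python sketch under that reading (the clipping epsilon and the example values are assumptions):

```python
# Sketch only: binary cross-entropy over positive/negative monitoring samples.
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """L = (1/N) * sum_i -[ y_i*log(p_i) + (1 - y_i)*log(1 - p_i) ],
    with y_i = 1 for a positive sample (action present) and 0 for a negative."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return per_sample.mean()

print(binary_cross_entropy(np.array([1.0, 0.0, 1.0]), np.array([0.9, 0.2, 0.4])))
```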
Further, in this embodiment, the preset detection model includes a one-dimensional convolution module, a deep neural network model, and a generalized linear model, and step S40 may include:
Step S401: inputting the preprocessed time-series data and the preprocessed behavior data into the preset detection model to obtain a model output result.
Step S402: judging, based on the model output result, whether the video to be detected is a deepfake video.
In a specific implementation, the video to be detected may be scored based on the model output result, and when the score exceeds a preset score the video to be detected is judged to be a deepfake video.
Further, in this embodiment, step S401 may include:
Step S4021: inputting the preprocessed time-series data into the one-dimensional convolution module to output flattened data.
Step S4022: inputting the flattened data into the deep neural network model to obtain a first model output value.
Step S4023: inputting the preprocessed behavior data into the deep neural network model to obtain a second model output value.
Step S4024: concatenating the flattened data with the preprocessed behavior data to obtain spliced data, and inputting the spliced data into the generalized linear model to obtain a third model output value.
Step S4025: adding the first model output value, the second model output value, and the third model output value to obtain the model output result.
In a specific implementation, reference may be made to Fig. 5, which is a schematic diagram of the first flow for obtaining the model output result in the deepfake video detection method of the present invention. In Fig. 5, the preprocessed device monitoring data is divided into time-series data and behavior data. The time-series data is input into the Encoder (i.e., the one-dimensional convolution module) to obtain flattened data, which is then concatenated with the behavior data to obtain spliced data. The flattened data and the behavior data are each input into a Predictor model (i.e., the deep neural network model), yielding the first and second model output values respectively, while the spliced data is input into another Predictor model (i.e., the generalized linear model), yielding the third model output value. Finally, the first, second, and third model output values are added to obtain the model output result.
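A PyTorch sketch of this fusion, under illustrative layer sizes and input shapes; it is one reading of Fig. 5, not the patented implementation itself:

```python
# Sketch only: 1-D conv encoder + two deep predictors + a generalized-linear predictor.
import torch
import torch.nn as nn

class FusionDetector(nn.Module):
    """All layer sizes and the sigmoid readout are illustrative assumptions."""
    def __init__(self, ts_channels=6, ts_len=80, behavior_dim=16, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(                      # one-dimensional convolution module
            nn.Conv1d(ts_channels, ts_channels, kernel_size=5, stride=2),
            nn.ReLU(),
            nn.Flatten(),                                  # flattened data
        )
        flat_dim = ts_channels * ((ts_len - 5) // 2 + 1)
        self.dnn_ts = nn.Sequential(nn.Linear(flat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.dnn_behavior = nn.Sequential(nn.Linear(behavior_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.glm = nn.Linear(flat_dim + behavior_dim, 1)   # generalized linear part on the spliced data

    def forward(self, ts, behavior):
        flat = self.encoder(ts)                            # (B, flat_dim)
        out1 = self.dnn_ts(flat)                           # first model output value
        out2 = self.dnn_behavior(behavior)                 # second model output value
        out3 = self.glm(torch.cat([flat, behavior], dim=1))  # third model output value
        return torch.sigmoid(out1 + out2 + out3)           # model output result as a score

# Usage with dummy tensors.
model = FusionDetector()
score = model(torch.randn(2, 6, 80), torch.randn(2, 16))
```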
In this embodiment, positive and negative samples are selected from the device monitoring data, a positive sample being device monitoring data containing the face action and a negative sample being device monitoring data not containing it; a loss function is calculated from these samples and used to tune an initial model formed by combining a deep neural network model and a generalized linear model, yielding the preset detection model; the preprocessed time-series data and preprocessed behavior data are input into the preset detection model; the preprocessed time-series data passes through the one-dimensional convolution module to output flattened data; the flattened data is input into the deep neural network model to obtain a first model output value; the preprocessed behavior data is input into the deep neural network model to obtain a second model output value; the flattened data and the preprocessed behavior data are concatenated and input into the generalized linear model to obtain a third model output value; and the three output values are added to obtain the model output result. Compared with conventional deepfake video detection methods, this embodiment feeds the preprocessed time-series data and behavior data into a preset detection model comprising a one-dimensional convolution module, a deep neural network model, and a generalized linear model, so a more comprehensive model output result is obtained and the detection accuracy for deepfake video is further improved.
Furthermore, in another embodiment, deepfake video detection may also be implemented as follows.
Part 1: segmenting the device monitoring data corresponding to each action. During face authentication in a loan application, the existing face authentication model recognizes the applicant performing various face actions such as opening the mouth, blinking, shaking the head, and staying still, and indirectly recognizes hand actions of moving the device forward, backward, up, and down while the face remains centered in the device camera. The device monitoring data corresponding to each action is segmented out; it includes phone sensor data and behavior data from operating the device. The label of each segment is whether the corresponding face action or hand movement was actually made (one action corresponds to one detection model). Forward and backward hand movement is detected from the change in the ratio of the face area to the whole frame: a ratio that keeps growing indicates forward movement, and a ratio that keeps shrinking indicates backward movement. Up and down hand movement is detected from the distance between the nose and the top edge of the frame: a distance that keeps growing indicates downward movement, and a distance that keeps shrinking indicates upward movement, as sketched below.
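A Python sketch of the two hand-movement rules above (the trend threshold and the per-frame measurements are assumptions):

```python
# Sketch only: forward/backward and up/down device movement from face geometry.
import numpy as np

def _trend(values, min_relative_change=0.15):
    """'increasing', 'decreasing' or 'stable' for a per-frame measurement."""
    values = np.asarray(values, dtype=float)
    change = (values[-1] - values[0]) / (abs(values[0]) + 1e-8)
    if change > min_relative_change:
        return "increasing"
    if change < -min_relative_change:
        return "decreasing"
    return "stable"

def hand_movements(face_area_ratio, nose_to_top_distance):
    """face_area_ratio: per-frame face-box area over frame area;
    nose_to_top_distance: per-frame distance from the nose landmark to the top
    edge of the frame. The mappings follow the description above."""
    moves = []
    fb = _trend(face_area_ratio)
    if fb == "increasing":
        moves.append("forward")
    elif fb == "decreasing":
        moves.append("backward")
    ud = _trend(nose_to_top_distance)
    if ud == "increasing":
        moves.append("down")
    elif ud == "decreasing":
        moves.append("up")
    return moves

print(hand_movements([0.10, 0.14, 0.19], [120, 121, 119]))   # ['forward']
```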
Part 2: data preprocessing. The sensor data in the device monitoring data is an embedding over time; its volume is huge while the allowed model inference time is short, so the device monitoring data is compressed by fixing the frame-extraction frequency, which saves a large amount of training and inference time and memory. Data augmentation is performed by selecting different initial frames several times to start sequential frame extraction and by adding noise, which helps strengthen the generalization ability of the model. In addition, the behavior data (registration method, event-tracking data, etc.) and the compressed sensor data in the device monitoring data are normalized.
Part 3: model construction. The model of this embodiment consists mainly of a time-series data model and a behavior data model. The time-series data model is an improved Wide & Deep model: the Deep layer of Wide & Deep is replaced by a recurrent neural network suited to time-series data, and self-attention is added on top of DeepFM to strengthen the interactions captured between features. The Deep layer first applies a one-dimensional convolution module (Encoder) that slides a window over the device monitoring data along the time dimension, extracting local temporal features that feed the recurrent neural network, so the deep features of the time-series data are fully mined. The Wide layer extracts statistical features of the time-series data, and its transformed result is added to the Deep result. The behavior data model adds self-attention on top of DeepFM to better capture the interactions between features. The whole model combines the results of the time-series data model and the behavior data model to classify the device monitoring data. Referring to Fig. 6, Fig. 6 is a schematic diagram of the second flow for obtaining the model output result in the deepfake video detection method of the present invention. In Fig. 6, model 1 uses the one-dimensional convolution module (Encoder) to slide a window over the time-series data along the time dimension; the convolution kernel length equals the number of frames in 2.5 seconds, the stride equals the number of frames in 1 second, and the number of output channels equals the number of input channels. The resulting 2-dimensional data is the input of the recurrent neural network (LSTM) in the time-series data model. The statistical features of the time-series data are the mean, standard deviation, maximum, minimum, range, quartiles, mean absolute deviation, and median absolute deviation of each feature attribute, flattened into 1-dimensional data; this is the statistical-feature extraction of model 1. The input of model 2 is one-dimensional sparse behavior data, which is compressed after the DeepFM module; the self-attention mechanism and a linear transformation of the self-attention output then yield the prediction for the behavior data. The prediction of the whole model is the normalized result of model 1 plus the normalized result of model 2.
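A PyTorch sketch of model 1 as described above: a Conv1d encoder whose kernel covers 2.5 seconds of samples with a 1-second stride and channel count equal to the input, an LSTM deep path, and a wide path over per-channel statistics. The sampling rate, hidden size, and the exact statistics implementation are assumptions, and the DeepFM-with-self-attention behavior model (model 2) is omitted.

```python
# Sketch only: improved Wide & Deep path for the time-series data (model 1).
import torch
import torch.nn as nn

class TimeSeriesWideDeep(nn.Module):
    def __init__(self, channels=6, sample_rate_hz=50, hidden=64, n_stats=8):
        super().__init__()
        k = int(2.5 * sample_rate_hz)                      # kernel = frames in 2.5 s
        s = int(1.0 * sample_rate_hz)                      # stride = frames in 1 s
        self.encoder = nn.Conv1d(channels, channels, kernel_size=k, stride=s)
        self.lstm = nn.LSTM(input_size=channels, hidden_size=hidden, batch_first=True)
        self.deep_head = nn.Linear(hidden, 1)
        self.wide = nn.Linear(channels * n_stats, 1)       # statistics flattened to 1-D

    @staticmethod
    def statistics(x):
        """mean, std, max, min, range, interquartile range, mean abs deviation,
        median abs deviation per channel, flattened (x: (B, C, T))."""
        q75, q25 = torch.quantile(x, 0.75, dim=2), torch.quantile(x, 0.25, dim=2)
        med = x.median(dim=2).values
        stats = [x.mean(2), x.std(2), x.amax(2), x.amin(2), x.amax(2) - x.amin(2),
                 q75 - q25, (x - x.mean(2, keepdim=True)).abs().mean(2),
                 (x - med.unsqueeze(2)).abs().median(dim=2).values]
        return torch.cat(stats, dim=1)

    def forward(self, x):                                  # x: (B, C, T)
        windows = self.encoder(x).transpose(1, 2)          # (B, T', C) for the LSTM
        _, (h, _) = self.lstm(windows)
        deep_out = self.deep_head(h[-1])                   # deep result
        wide_out = self.wide(self.statistics(x))           # wide result on statistics
        return deep_out + wide_out                         # wide and deep results added

model = TimeSeriesWideDeep()
out = model(torch.randn(2, 6, 400))                        # 8 s of 6-channel data at 50 Hz
```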
Part 4: result analysis. The judgments of the multiple models as to whether the corresponding face and hand actions were made, according to the device monitoring data, are combined across the several face actions and hand actions in one video to analyze whether the video is a deepfake video injected by hijacking the phone camera.
In addition, an embodiment of the present invention further provides a storage medium storing a deepfake video detection program which, when executed by a processor, implements the steps of the deepfake video detection method described above.
Referring to Fig. 7, Fig. 7 is a block diagram of a first embodiment of the deepfake video detection apparatus of the present invention.
As shown in Fig. 7, the deepfake video detection apparatus according to the embodiment of the present invention includes:
a face authentication module 701, configured to perform face authentication on a video to be detected to obtain face actions, wherein the face actions include a mouth-opening action, a blinking action, a head-shaking action, a nodding action, and a static action;
a data segmentation module 702, configured to segment, from the video to be detected, the device monitoring data corresponding to the face action, wherein the device monitoring data corresponds to the video segment in which a complete face action is performed;
a data processing module 703, configured to preprocess the device monitoring data to obtain preprocessed device monitoring data; and
a data detection module 704, configured to detect the preprocessed device monitoring data with a preset detection model and judge, based on the detection result, whether the video to be detected is a deepfake video, wherein the preset detection model is obtained by combining and tuning a deep neural network model and a generalized linear model.
This embodiment performs face authentication on the video to be detected to obtain face actions, including mouth-opening, blinking, head-shaking, nodding, and static actions; segments, from the video to be detected, the device monitoring data corresponding to each face action, i.e., the segment covering a complete face action; preprocesses the device monitoring data; and detects the preprocessed device monitoring data with a preset detection model obtained by combining and tuning a deep neural network model and a generalized linear model, judging from the detection result whether the video to be detected is a deepfake video. Unlike conventional deepfake detection methods that rely mainly on visual differences, this embodiment segments the device monitoring data corresponding to the face actions from the video to be detected and detects the video with the preset detection model, thereby avoiding the over-reliance on visual-difference analysis in the prior art and accurately judging whether the video to be detected is a deepfake video.
Based on the first embodiment of the deepfake video detection apparatus of the present invention, a second embodiment of the deepfake video detection apparatus is proposed.
In this embodiment, the face authentication module 701 is further configured to obtain the mouth aspect ratio of the face in the video to be detected and judge, based on it, whether a mouth-opening action exists; to obtain the eye aspect ratio of the face and judge, based on it, whether a blinking action exists; and to obtain the change in cheek width and the change in nose-to-chin distance of the face and judge, based on them, whether a head-shaking action and/or a nodding action exists in the video to be detected.
Further, the face authentication module 701 is further configured to judge that a static action exists in the video to be detected if none of the mouth-opening, blinking, head-shaking, and nodding actions exists and the motion amplitude of the face is smaller than the preset amplitude threshold.
Further, the preprocessed device monitoring data includes preprocessed time-series data and preprocessed behavior data, and the data processing module 703 is further configured to extract frames from the time-series data in the device monitoring data at a fixed frequency and in a fixed number, apply max-min normalization to the frame-extracted time-series data to obtain the preprocessed time-series data, compute the mean and standard deviation of the behavior data in the device monitoring data, and apply standard normalization to the behavior data based on them to obtain the preprocessed behavior data.
Further, the data processing module 703 is further configured to select positive and negative samples from the device monitoring data, a positive sample being device monitoring data containing the face action and a negative sample being device monitoring data not containing it, calculate a loss function based on these samples, and tune an initial model formed by combining a deep neural network model and a generalized linear model based on the loss function to obtain the preset detection model.
Further, the data detection module 704 is further configured to input the preprocessed time-series data and the preprocessed behavior data into the preset detection model to obtain a model output result, and to judge, based on the model output result, whether the video to be detected is a deepfake video.
Further, the preset detection model includes a one-dimensional convolution module, a deep neural network model, and a generalized linear model, and the data detection module 704 is further configured to input the preprocessed time-series data into the one-dimensional convolution module to output flattened data, input the flattened data into the deep neural network model to obtain a first model output value, input the preprocessed behavior data into the deep neural network model to obtain a second model output value, concatenate the flattened data with the preprocessed behavior data and input the spliced data into the generalized linear model to obtain a third model output value, and add the first, second, and third model output values to obtain the model output result.
Other embodiments or specific implementations of the deepfake video detection apparatus of the present invention may refer to the above method embodiments and are not repeated here.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or system that comprises that element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as read-only memory/random-access memory, a magnetic disk, or an optical disc) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods of the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention and does not limit the scope of the patent; any equivalent structure or process transformation made using the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (7)

1. A deepfake video detection method, comprising the following steps:
performing face authentication on a video to be detected to obtain face actions, wherein the face actions comprise a mouth-opening action, a blinking action, a head-shaking action, a nodding action, and a static action;
segmenting, from the video to be detected, device monitoring data corresponding to the face action, wherein the device monitoring data corresponds to the video segment in which a complete face action is performed;
preprocessing the device monitoring data to obtain preprocessed device monitoring data; and
detecting the preprocessed device monitoring data with a preset detection model and judging, based on the detection result, whether the video to be detected is a deepfake video, wherein the preset detection model is obtained by combining and tuning a deep neural network model and a generalized linear model;
wherein the step of preprocessing the device monitoring data to obtain preprocessed device monitoring data comprises:
extracting frames from time-series data in the device monitoring data at a fixed frequency and in a fixed number, and applying max-min normalization to the frame-extracted time-series data to obtain preprocessed time-series data; and
computing the mean and standard deviation of behavior data in the device monitoring data, and applying standard normalization to the behavior data based on the mean and standard deviation to obtain preprocessed behavior data;
wherein the step of detecting the preprocessed device monitoring data with the preset detection model and judging, based on the detection result, whether the video to be detected is a deepfake video comprises:
inputting the preprocessed time-series data and the preprocessed behavior data into the preset detection model to obtain a model output result; and
judging, based on the model output result, whether the video to be detected is a deepfake video;
wherein the step of inputting the preprocessed time-series data and the preprocessed behavior data into the preset detection model to obtain a model output result comprises:
inputting the preprocessed time-series data into a one-dimensional convolution module to output flattened data;
inputting the flattened data into the deep neural network model to obtain a first model output value;
inputting the preprocessed behavior data into the deep neural network model to obtain a second model output value;
concatenating the flattened data with the preprocessed behavior data to obtain spliced data, and inputting the spliced data into the generalized linear model to obtain a third model output value; and
adding the first model output value, the second model output value, and the third model output value to obtain the model output result.
2. The deepfake video detection method according to claim 1, wherein the step of performing face authentication on the video to be detected to obtain face actions comprises:
obtaining the mouth aspect ratio of the face in the video to be detected, and judging, based on the mouth aspect ratio, whether a mouth-opening action exists in the video to be detected;
obtaining the eye aspect ratio of the face in the video to be detected, and judging, based on the eye aspect ratio, whether a blinking action exists in the video to be detected; and
obtaining the change in cheek width and the change in nose-to-chin distance of the face in the video to be detected, and judging, based on these changes, whether a head-shaking action and/or a nodding action exists in the video to be detected.
3. The deepfake video detection method according to claim 2, wherein the step of performing face authentication on the video to be detected to obtain face actions further comprises:
if none of the mouth-opening, blinking, head-shaking, and nodding actions exists in the video to be detected, and the motion amplitude of the face in the video to be detected is smaller than a preset amplitude threshold, judging that a static action exists in the video to be detected.
4. The deepfake video detection method according to claim 1, wherein after the step of preprocessing the device monitoring data to obtain preprocessed device monitoring data, the method further comprises:
selecting positive samples and negative samples from the device monitoring data, wherein a positive sample is device monitoring data containing the face action and a negative sample is device monitoring data not containing the face action; and
calculating a loss function based on the positive and negative samples, and tuning an initial model based on the loss function to obtain the preset detection model, wherein the initial model is formed by combining a deep neural network model and a generalized linear model.
5. A deep fake video detection device, the device comprising:
a face authentication module, configured to perform face authentication on a video to be detected to obtain a face action, wherein the face action comprises a mouth-opening action, a blinking action, a head-shaking action, a nodding action and a static action;
a data segmentation module, configured to segment, from the video to be detected, device monitoring data corresponding to the face action, wherein the device monitoring data is the video segment of the video to be detected in which a complete face action is performed;
a data processing module, configured to preprocess the device monitoring data to obtain preprocessed device monitoring data; and
a data detection module, configured to detect the preprocessed device monitoring data through a preset detection model and to judge, based on a detection result, whether the video to be detected is a deep fake video, wherein the preset detection model is obtained by combining and adjusting a deep neural network model and a generalized linear model;
wherein the step of preprocessing the device monitoring data to obtain preprocessed device monitoring data comprises:
performing frame extraction on time-series data in the device monitoring data at a fixed frequency and with a fixed number of frames, and performing max-min normalization on the extracted time-series data to obtain preprocessed time-series data;
calculating the mean and standard deviation of behavior data in the device monitoring data, and performing standard normalization on the behavior data based on the mean and the standard deviation to obtain preprocessed behavior data;
the step of detecting the preprocessed device monitoring data through the preset detection model and judging, based on the detection result, whether the video to be detected is a deep fake video comprises:
inputting the preprocessed time-series data and the preprocessed behavior data into the preset detection model to obtain a model output result;
judging, based on the model output result, whether the video to be detected is a deep fake video;
the step of inputting the preprocessed time-series data and the preprocessed behavior data into the preset detection model to obtain a model output result comprises:
inputting the preprocessed time-series data into the one-dimensional convolution module and outputting flattened data;
inputting the flattened data into the deep neural network model to obtain a first model output value;
inputting the preprocessed behavior data into the deep neural network model to obtain a second model output value;
concatenating the flattened data and the preprocessed behavior data to obtain concatenated data, and inputting the concatenated data into the generalized linear model to obtain a third model output value; and
adding the first model output value, the second model output value and the third model output value to obtain the model output result.
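To make the preprocessing step recited in claim 5 concrete, the following NumPy sketch performs fixed-frequency, fixed-count frame extraction with max-min normalization for the time-series data and mean/standard-deviation standardization for the behavior data. The sampling step, frame count, array shapes and padding strategy are illustrative assumptions.

# Sketch of the preprocessing step: fixed-frequency/fixed-count frame extraction
# with max-min normalization for time-series data, and mean/std standardization
# for behavior data. Shapes and parameters are illustrative assumptions.
import numpy as np

def preprocess_time_series(ts: np.ndarray, step: int = 5, n_frames: int = 64) -> np.ndarray:
    """ts: (T, C) per-frame time-series signals from the device monitoring data."""
    sampled = ts[::step][:n_frames]                   # fixed frequency, fixed count
    if len(sampled) < n_frames:                       # pad short clips by repeating the last frame
        pad = np.repeat(sampled[-1:], n_frames - len(sampled), axis=0)
        sampled = np.concatenate([sampled, pad], axis=0)
    mn, mx = sampled.min(axis=0), sampled.max(axis=0)
    return (sampled - mn) / (mx - mn + 1e-8)          # max-min normalization to [0, 1]

def preprocess_behavior(beh: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """beh: (D,) behavior features; mean/std are statistics of the behavior data."""
    return (beh - mean) / (std + 1e-8)                # standard (z-score) normalization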
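The data flow through the preset detection model described in claim 5 (one-dimensional convolution, flattening, two deep branches, a generalized linear part over the concatenated features, and summation of the three output values) could be realised along the lines of the PyTorch sketch below. Layer widths, kernel size, the use of two separate deep branches and the final sigmoid are assumptions; the claims fix only the overall structure.

# Sketch of the preset detection model: a 1-D convolution module feeding a deep
# branch, a second deep branch for behavior data, and a generalized linear part
# over the concatenated features; the three output values are summed.
import torch
from torch import nn

class HybridDetector(nn.Module):
    def __init__(self, ts_channels: int, ts_len: int, beh_dim: int):
        super().__init__()
        # One-dimensional convolution module; its output is flattened.
        self.conv = nn.Sequential(
            nn.Conv1d(ts_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        flat_dim = 16 * ts_len
        # Deep branch over the flattened convolution output -> first output value.
        self.deep_ts = nn.Sequential(nn.Linear(flat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        # Deep branch over the behavior data -> second output value.
        self.deep_beh = nn.Sequential(nn.Linear(beh_dim, 32), nn.ReLU(), nn.Linear(32, 1))
        # Generalized linear model over the concatenated features -> third output value.
        self.glm = nn.Linear(flat_dim + beh_dim, 1)

    def forward(self, ts: torch.Tensor, beh: torch.Tensor) -> torch.Tensor:
        # ts: (B, ts_channels, ts_len); beh: (B, beh_dim)
        flat = self.conv(ts)
        out1 = self.deep_ts(flat)
        out2 = self.deep_beh(beh)
        out3 = self.glm(torch.cat([flat, beh], dim=1))
        return out1 + out2 + out3      # summed model output (a logit)

# Illustrative usage: a clip is judged fake or genuine by thresholding the output.
# model = HybridDetector(ts_channels=6, ts_len=64, beh_dim=12)
# score = torch.sigmoid(model(ts_batch, beh_batch))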
6. A deep fake video detection equipment, the equipment comprising: a memory, a processor, and a deep fake video detection program stored on the memory and executable on the processor, wherein the deep fake video detection program is configured to implement the steps of the deep fake video detection method according to any one of claims 1 to 4.
7. A storage medium, on which a deep fake video detection program is stored, wherein the deep fake video detection program, when executed by a processor, implements the steps of the deep fake video detection method according to any one of claims 1 to 4.
CN202311827980.8A 2023-12-27 2023-12-27 Depth fake video detection method, device, equipment and storage medium Active CN117690061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311827980.8A CN117690061B (en) 2023-12-27 2023-12-27 Depth fake video detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117690061A (en) 2024-03-12
CN117690061B (en) 2024-05-17

Family

ID=90131929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311827980.8A Active CN117690061B (en) 2023-12-27 2023-12-27 Depth fake video detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117690061B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818915A (en) * 2021-02-25 2021-05-18 华南理工大学 Depth counterfeit video detection method and system based on 3DMM soft biological characteristics
CN113627256A (en) * 2021-07-09 2021-11-09 武汉大学 Method and system for detecting counterfeit video based on blink synchronization and binocular movement detection
CN114078119A (en) * 2021-11-18 2022-02-22 厦门市美亚柏科信息股份有限公司 Depth-forged video detection method and system based on optical flow method
CN114550268A (en) * 2022-03-01 2022-05-27 北京赛思信安技术股份有限公司 Depth-forged video detection method utilizing space-time characteristics
CN114627412A (en) * 2022-03-07 2022-06-14 公安部第三研究所 Method, device and processor for realizing unsupervised depth forgery video detection processing based on error reconstruction and computer storage medium thereof
CN114898269A (en) * 2022-05-20 2022-08-12 公安部第三研究所 System, method, device, processor and storage medium for realizing deep forgery fusion detection based on eye features and face features
CN115273186A (en) * 2022-07-18 2022-11-01 中国人民警察大学 Depth-forged face video detection method and system based on image feature fusion
CN116994175A (en) * 2023-07-14 2023-11-03 中国科学院软件研究所 Space-time combination detection method, device and equipment for depth fake video
CN117197857A (en) * 2023-05-04 2023-12-08 支付宝(杭州)信息技术有限公司 Face counterfeiting attack detection and face recognition method, device and equipment
CN117275064A (en) * 2023-09-19 2023-12-22 中国科学院计算技术研究所 Face video depth forging detection method and device based on face time sequence information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A high-image-quality virtual viewpoint rendering method and GPU acceleration; Chen Luyao et al.; 《小型微型计算机系统》 (Journal of Chinese Computer Systems); 2020-10-30; Vol. 41, No. 10, pp. 2212-2218 *
Unsupervised face forgery video detection based on reconstruction error; Xu Zhe et al.; 《计算机应用》 (Journal of Computer Applications); 2023-05-10; Vol. 43, No. 5, pp. 1571-1577 *
Forged face video detection method fusing global temporal and local spatial features; Chen Peng et al.; 《信息安全学报》 (Journal of Cyber Security); 2020-03-31; Vol. 5, No. 2, pp. 73-83 *

Also Published As

Publication number Publication date
CN117690061A (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN108717663B (en) Facial tag fraud judging method, device, equipment and medium based on micro expression
CN109858375B (en) Living body face detection method, terminal and computer readable storage medium
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
TW202004637A (en) Risk prediction method and apparatus, storage medium, and server
JP7454105B2 (en) Facial image quality evaluation method and device, computer equipment and computer program
CN109815797B (en) Living body detection method and apparatus
JP7412496B2 (en) Living body (liveness) detection verification method, living body detection verification system, recording medium, and training method for living body detection verification system
WO2021088640A1 (en) Facial recognition technology based on heuristic gaussian cloud transformation
CN110472693B (en) Image processing and classifying method and system
US9378406B2 (en) System for estimating gender from fingerprints
CN115186303B (en) Financial signature safety management method and system based on big data cloud platform
CN110795714A (en) Identity authentication method and device, computer equipment and storage medium
CN115376559A (en) Emotion recognition method, device and equipment based on audio and video
CN112307937A (en) Deep learning-based identity card quality inspection method and system
Garhawal et al. A study on handwritten signature verification approaches
CN117690061B (en) Depth fake video detection method, device, equipment and storage medium
CN113591603A (en) Certificate verification method and device, electronic equipment and storage medium
CN114373213A (en) Juvenile identity recognition method and device based on face recognition
CN113657498A (en) Biological feature extraction method, training method, authentication method, device and equipment
Jalal et al. Facial Mole Detection Approach for Suspect Face Identification using ResNeXt-50
Majidpour et al. Unreadable offline handwriting signature verification based on generative adversarial network using lightweight deep learning architectures
CN114760484B (en) Live video identification method, live video identification device, computer equipment and storage medium
CN111738012B (en) Method, device, computer equipment and storage medium for extracting semantic alignment features
CN115953819A (en) Training method, device and equipment of face recognition model and storage medium
Chetty et al. Multimodal feature fusion for video forgery detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant