CN116645710A - Depth fake video detection method, device and storage medium - Google Patents

Depth fake video detection method, device and storage medium

Info

Publication number
CN116645710A
CN116645710A (application number CN202211206017.3A)
Authority
CN
China
Prior art keywords
video
detected
detection
target
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211206017.3A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd filed Critical Beijing Real AI Technology Co Ltd
Priority to CN202211206017.3A
Publication of CN116645710A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The embodiments of the application relate to the technical field of image processing and provide a method, a device and a storage medium for detecting deepfake video. The method comprises the following steps: acquiring a first video to be detected from at least one video source party; identifying a first facial action of a target face in at least one first face image in the first video to be detected; determining a first detection mode from a plurality of preset detection modes based on the first facial action; and invoking the first detection mode to identify whether the target face in the video to be detected is genuine or forged, and outputting a first detection result. With this scheme, when forged videos produced by different forgery manners are detected, the detection can be switched at will to the matched forgery-detection mode; in particular, when network videos whose forgery manners vary are detected, the scheme can dynamically switch to the authenticity detection matched to the current video to be detected, and can detect at least one network video produced by at least one forgery manner. Detection is highly flexible, applicable to diversified videos and wide in coverage, which effectively avoids missed detections.

Description

Depth fake video detection method, device and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a method and a device for detecting depth fake video and a storage medium.
Background
Facial prostheses can easily be manufactured with underground ("black industry") tools, and a face recognition system can be bypassed by wearing such a prosthesis. In addition, many network videos are reconstructed N times, typically synthesized by means such as PS (image editing) and deepfake techniques, for example AI face swapping. At present, the following problems mainly exist:
1. For different network videos, or within a single network video, video segments bearing forgery traces can currently only be detected with one fixed detection mode applied from the first frame to the last. The usage scenario is therefore limited, and the approach cannot flexibly cope with diverse sources or detection service demands. When detection demand is large and video sources are diverse, video segments or network videos bearing forgery traces cannot be detected efficiently and accurately, which makes batch marking, auditing and filtering inconvenient.
2. In addition, each tool for detecting the authenticity of network video can only detect one fixed kind of forgery trace. For example, a PS detection tool can only detect traces in network videos synthesized with PS techniques, and a deepfake detection tool can only detect traces in network videos synthesized with deepfake-related techniques; this, too, easily leads to missed detections.
Therefore, when network videos are detected with a single detection mode while forgery manners keep changing, one detection tool cannot detect network videos produced by all forgery manners.
Disclosure of Invention
The embodiments of the application provide a method, a device and a storage medium for detecting deepfake video, which can switch at will to the adapted forgery-detection mode when detecting forged videos produced by different forgery manners. In particular, when network videos whose forgery manners vary are detected, the scheme can dynamically switch to the authenticity detection matched to the current video to be detected, and can detect at least one network video produced by at least one forgery manner. Detection is highly flexible, applicable to diversified videos and wide in coverage, and missed detection of videos produced by a particular forgery manner can be effectively avoided.
In a first aspect, an embodiment of the present application provides a method for detecting a depth counterfeit video, including:
acquiring a first video to be detected from at least one video source party, the first video to be detected comprising a plurality of first face images of at least one target face;
identifying a first face action of a target face in at least one first face image in the first video to be detected;
Determining a first detection mode from a plurality of preset detection modes based on the first facial action;
and invoking the first detection mode to identify whether the target face in the video to be detected is genuine or forged, and outputting a first detection result.
In a possible implementation manner, before the act of identifying the face of the target face in at least one of the first face images in the first video to be detected, the method further includes:
detecting at least one video frame in the video to be detected;
if it is detected that the target face in a target video frame of the video to be detected meets the preset forgery-trace condition, ending detection of the remaining video frames in the first video to be detected, and generating a first detection result based on the video frames of the first video to be detected that meet the preset forgery-trace condition; the target video frame is any video frame in the video to be detected, and the first detection result indicates that the video to be detected is a forged video.
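The early-termination strategy above can be sketched as follows. This is an illustrative sketch only: the frame representation and the predicate `meets_fake_trace_condition` are hypothetical stand-ins for the patent's "preset fake trace condition", not part of the original disclosure.

```python
def scan_until_fake_trace(frames, meets_fake_trace_condition):
    """Stop at the first frame that satisfies the fake-trace condition.

    Returns (is_fake, index_of_flagged_frame), where the index is None
    when no frame triggers the condition.
    """
    for idx, frame in enumerate(frames):
        if meets_fake_trace_condition(frame):
            # The remaining frames are skipped; the first detection result
            # is generated from this flagged frame alone.
            return True, idx
    return False, None
```

For example, `scan_until_fake_trace(list(range(10)), lambda f: f == 3)` stops at the fourth frame instead of scanning all ten, which is the efficiency gain the implementation claims.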
In a possible implementation manner, before determining the target detection mode from a plurality of preset detection modes based on the facial action, the method further includes:
analyzing historical detection results by forgery-trace type to obtain the forgery manner corresponding to each historically processed video;
and performing a proportion analysis, according to the historical detection results, of the forgery manners of the target video frames that meet the preset forgery-trace condition, and setting a preset mark on the video source party of the historically processed videos according to the proportions of the forgery manners.
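The proportion analysis above can be sketched as below. The label values ("PS", "deepfake") and the 0.5 dominance threshold are assumptions for illustration; the patent specifies only that the mark is set according to the proportions of the forgery manners.

```python
from collections import Counter

def forgery_proportions(history_labels):
    """Share of each forgery manner among a source's historically flagged
    frames. `history_labels` is a hypothetical list of forgery-manner
    labels extracted from frames meeting the preset fake-trace condition."""
    counts = Counter(history_labels)
    total = sum(counts.values())
    return {manner: count / total for manner, count in counts.items()}

def preset_mark(history_labels, threshold=0.5):
    """Mark the video source with its dominant forgery manner when that
    manner's proportion reaches the (assumed) threshold; else no mark."""
    props = forgery_proportions(history_labels)
    manner, share = max(props.items(), key=lambda kv: kv[1])
    return manner if share >= threshold else None
```

A source whose history is mostly PS-edited videos would thus be marked "PS", steering later detections toward the PS tool first.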
In a possible implementation manner, after the preset mark is set on the video source party of the historically processed videos according to the proportions of the forgery manners, the method includes:
acquiring a first video to be detected from a first video source party;
determining a second detection mode according to a preset correspondence and the channel identifier of the first video source party, wherein the preset correspondence comprises correspondences among preset marks, default detection modes and channel identifiers;
and detecting the fake trace of the first video to be detected according to the second detection mode to obtain a second detection result.
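The preset-correspondence lookup above can be sketched as a table mapping (preset mark, channel identifier) pairs to detection modes, with a default fallback. All names here (channel identifiers, detector names) are hypothetical illustrations, not values from the patent.

```python
# Hypothetical preset correspondence: (preset mark, channel id) -> mode.
PRESET_CORRESPONDENCE = {
    ("deepfake", "channel_A"): "deepfake_detector",
    ("PS", "channel_B"): "ps_detector",
}
DEFAULT_MODE = "general_detector"

def second_detection_mode(preset_mark, channel_id):
    """Pick the detection mode for a video source; fall back to the
    default mode when no specific correspondence is recorded."""
    return PRESET_CORRESPONDENCE.get((preset_mark, channel_id), DEFAULT_MODE)
```

The design keeps per-source routing as a data table rather than code, so newly marked sources only add entries instead of new branches.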
In a possible implementation manner, after the second detection result is obtained, the method further includes:
acquiring first analysis data of the first video to be detected and second analysis data of the second detection result;
if it is determined, from the first analysis data and the second analysis data, that at least one of an abnormality, a misjudgment or an omission exists in the second analysis data, identifying a second facial action of the target face in at least one first face image in the first video to be detected;
determining a third detection mode from a plurality of preset detection modes based on the second facial action;
and calling the third detection mode to identify the true or false of the target face in the video to be detected, and outputting a third detection result.
In a possible implementation manner, after the determining, based on the second facial action, a third detection mode from a plurality of preset detection modes, the method further includes:
and updating the third detection mode into the preset corresponding relation.
In a possible implementation manner, the method further includes:
determining a target segment in the first video to be detected that meets a preset forgery condition, wherein the target segment comprises at least one video frame whose playing times are consecutive or spaced;
analyzing the playing content corresponding to the target segment;
if the matching degree between the playing content and the video description information is lower than a first threshold, or if the playing content matches the video description information but the proportion of the target segment is lower than a preset proportion, determining that the target segment is a non-key segment and setting a first mark for the first video to be detected, the mark indicating that the first video to be detected is classified as a normal video.
In a possible implementation manner, the method further includes:
determining a target segment in the first video to be detected that meets a preset forgery condition, wherein the target segment comprises at least one video frame whose playing times are consecutive or spaced;
and if the proportion of the target segment in the first video to be detected is smaller than a second threshold and the content weight of at least one video frame in the target segment is lower than a preset weight, determining that the forged object does not belong to the target object, and classifying the first video to be detected as a normal video.
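The classification rule above can be sketched as follows. The patent requires only a "second threshold" and a "preset weight"; the concrete values (0.1 and 0.5) and the frame/weight representation are assumptions for illustration.

```python
def classify_video(total_frames, target_segment_frames, content_weights,
                   proportion_threshold=0.1, weight_threshold=0.5):
    """Classify a video as "normal" when the flagged target segment is
    both short relative to the whole video and low in content weight.

    `content_weights` is a hypothetical per-frame importance score for
    the frames in the target segment.
    """
    proportion = target_segment_frames / total_frames
    low_weight = all(w < weight_threshold for w in content_weights)
    if proportion < proportion_threshold and low_weight:
        return "normal"
    return "suspect"
```

Under these assumed thresholds, a 50-frame low-weight segment in a 1000-frame video is treated as normal, while a long or high-weight segment keeps the video flagged.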
In a possible implementation manner, the identifying the first face action of the target face in at least one of the first face images in the first video to be detected; determining a first detection mode from a plurality of preset detection modes based on the first facial action, wherein the method comprises the following steps:
screening a first set from the first video to be detected, wherein the first set comprises a plurality of face images that match a preset facial action;
and performing forgery-trace detection on each face image in the first set.
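The screen-then-detect step above can be sketched as below. Both predicates are hypothetical stand-ins for trained models (facial-action recognition and trace detection); the sketch only shows the control flow of restricting detection to the screened set.

```python
def detect_on_screened_set(face_images, matches_preset_action, detect_trace):
    """Screen the first set of face images exhibiting the preset facial
    action, then run fake-trace detection only on that set.

    Returns (original_index, trace_detected) pairs so results can be
    mapped back to playing positions in the video.
    """
    first_set = [(i, img) for i, img in enumerate(face_images)
                 if matches_preset_action(img)]
    return [(i, detect_trace(img)) for i, img in first_set]
```

Screening first means the (presumably expensive) trace detector runs on a fraction of the frames, which matches the efficiency claim of this implementation.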
In a possible implementation manner, after the preset mark is set on the video source party of the historically processed videos according to the proportions of the forgery manners, the method further includes:
Acquiring a third video to be detected from a first video source side;
acquiring historical detection data of the first video source party in a historical period according to the channel identifier of the first video source party to determine a fourth detection mode, wherein the fourth detection mode comprises a default detection mode or a priority detection mode;
and detecting the fake trace of the third video to be detected according to the fourth detection mode to obtain a fourth detection result.
In a second aspect, an embodiment of the present application provides a video detection apparatus having a function of implementing a depth-counterfeit video detection method corresponding to the first aspect. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware.
In some embodiments, the video detection apparatus includes:
the input-output module is used for acquiring a first video to be detected from at least one video source party, wherein the first video to be detected comprises a plurality of first face images of at least one target face;
the processing module is used for identifying a first face action of a target face in at least one first face image in the first video to be detected, which is acquired by the input/output module; determining a first detection mode from a plurality of preset detection modes based on the first facial action; and calling the first detection mode to identify the true or false of the target face in the video to be detected, and outputting a first detection result through the input and output module.
In a third aspect, embodiments of the present application provide a computer device comprising at least one connected processor, a memory and a transceiver, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program in the memory to perform the method provided in the various possible designs of the first aspect and the first aspect.
A further aspect of the embodiments of the application provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method provided in the first aspect and its various possible designs.
In yet another aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The computer instructions are read from a computer-readable storage medium by a processor of a computer device, which executes the computer instructions, causing the computer device to perform the methods provided in the various possible designs of the first aspect described above.
Compared with the prior art, in the scheme provided by the embodiments of the application, a plurality of face recognition models and at least two forged-video detection modes are deployed in the face recognition system. On the one hand, each forged-video detection mode can detect, in a targeted manner, the forged videos produced by the corresponding forgery means, so that when forged videos produced by different forgery manners are detected, the scheme can switch at will to the adapted forged-video detection mode; in particular, when network videos whose forgery manners vary are detected, the scheme can dynamically switch to the authenticity detection matched to the current video to be detected, and can detect at least one network video produced by at least one forgery manner. On the other hand, detection is highly flexible, applicable to diversified videos and wide in coverage, and missed detection of videos produced by a particular forgery manner can be effectively avoided. In other words, under this detection strategy the scheme can both accurately judge whether forged segments exist in the first video to be detected and improve detection efficiency.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
FIG. 1a is a schematic view of an application environment for implementing a method for detecting a deep forgery video according to an embodiment of the present application;
fig. 1b is a schematic diagram of an application scenario of a depth counterfeit video detection method according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for detecting a depth counterfeit video in an embodiment of the application;
FIG. 3 is a schematic illustration of in vivo detection in an embodiment of the present application;
FIG. 4 is a flow chart of a method for detecting a depth counterfeit video in an embodiment of the application;
FIG. 5 is a schematic diagram of a visual display of a detection result according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a visual display of a detection result according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a visual display of a detection result according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a video detection apparatus for implementing a depth forgery video detection method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a computer device implementing a method for detecting depth forgery video according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a mobile phone for implementing a method for detecting a deep forgery video according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a server for implementing a depth forgery video detection method according to an embodiment of the present application.
Detailed Description
The terms "first", "second", and the like in the description, the claims and the foregoing drawings of the embodiments of the application are used to distinguish similar objects (for example, the first face image and the second face image respectively denote face images corresponding to different identities), and are not necessarily used to describe a particular order or sequence. It is to be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises", "comprising" and any variations thereof are intended to cover a non-exclusive inclusion: a process, method, system, article or apparatus comprising a list of steps or modules is not necessarily limited to those explicitly listed, but may include other steps or modules not expressly listed or inherent to it. The partitioning of modules in the embodiments of the application is only one logical partitioning; other partitionings are possible in practice, for example several modules may be combined or integrated into another system, or some features may be omitted or not implemented. The couplings, direct couplings or communication connections between modules shown or discussed may be through interfaces, and indirect couplings or communication connections between modules may be electrical or of other similar forms; none of these are limited in the embodiments of the application. The modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, and may be distributed over a plurality of circuit modules; some or all of them may be selected according to actual needs to achieve the purposes of the embodiments of the application.
The embodiment of the application provides a method, a device and a storage medium for detecting depth counterfeit video, which can be applied to video detection scenes, wherein the scheme can be executed by a server or a terminal. In some embodiments, when the scheme is applied to an application scenario as shown in fig. 1a, the application scenario may include a server and a plurality of data sources. When the above-described depth forgery video detection method is implemented based on the application environment as shown in fig. 1a, a specific flow may refer to fig. 1b. The server may perform fake trace detection on video from the data source, specifically:
the data source party is a party storing video, such as a short video platform, a social platform, a network database, etc., and in the embodiment of the present application, the data source party may be a server or a terminal, which is not limited in the embodiment of the present application.
After the server acquires the video to be detected from at least one data source party, it runs a pre-deployed trained image processing model, recognizes the facial action of a target face in at least one face image of the video to be detected, pre-judges the matched detection mode according to the state of the recognized facial action, and performs forgery-trace detection on the video to be detected based on that detection mode. At least two types of detection tools, such as a PS detection tool and a deepfake detection tool, may be deployed in the server, the deepfake detection tool comprising n deepfake detection algorithms (all labeled as models in fig. 1b and not further distinguished). The PS detection tool and the deepfake detection tool may be deployed separately or together; the embodiments of the application are not limited in this respect. As shown in fig. 1b, the server may, by pre-judgment, select a target model from the deepfake detection tool as the detection method of the video to be detected; when the target model outputs a detection result indicating that a forgery trace exists, the result may include, for example, the playing time point of the forgery trace and the video frame in which the forgery trace exists.
It should be noted that the server (for example, the video detection device) according to the embodiments of the application may be an independent physical server, a server cluster or distributed system formed of a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data and artificial intelligence platforms. The video detection device may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch or a personal digital assistant. The video detection device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the application.
The scheme of the embodiment of the application can be realized based on artificial intelligence (Artificial Intelligence, AI), natural language processing (Nature Language Processing, NLP), machine Learning (ML) and other technologies, and is specifically described by the following embodiments:
AI is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason and make decisions.
AI technology is a comprehensive discipline, and relates to a wide range of technologies, both hardware and software. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
NLP is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics; research in this field involves natural language, i.e. the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques, and the like.
Aiming at the video detection problem in the artificial intelligence field, the embodiment of the application mainly adopts the following technical scheme: a plurality of face recognition models and PS detection strategies are deployed in a face recognition system, face actions of a target face in a video to be detected are recognized, an adaptive target detection mode is determined according to the states of the face actions, and true and false detection is carried out on the target face according to the target detection mode.
The following describes the technical scheme of the embodiment of the present application in detail with reference to fig. 2 to 7.
Referring to fig. 2, a flow chart of a method for detecting deep forgery video according to an embodiment of the present application is shown, where the method may be performed by a service server, and the service server may be various video detection platforms. The embodiment of the application mainly includes steps 101 to 104 illustrated in fig. 2, and is described as follows:
101. a first video to be detected from at least one video source Fang Huoqu.
The first video to be detected comprises a plurality of first face images of at least one target face. The target face is a face that exhibits a facial action in the first video to be detected; for example, the head-and-shoulders image of a girl "Xiaomei" appears in n consecutive frames, in which "Xiaomei" goes from expressionless, to widening her pupils, to opening her mouth, to smiling. It will be appreciated that the target face may correspond to the same user identity or to at least two user identities, which is not limited in the embodiments of the application.
102. And identifying a first face action of a target face in at least one first face image in the first video to be detected.
The facial action may include smiling, opening the mouth, blinking, turning the head, lowering the head, raising the head, frowning, and the like. For example, fig. 3 shows a schematic of the interface changes in a liveness detection task: after clicking start, the user is prompted to look at the mobile phone screen and keep still, then prompted to turn the head, and liveness detection of the user then begins.
103. And determining a first detection mode from a plurality of preset detection modes based on the first facial action.
It can be understood that the video detection device in the embodiment of the application deploys a plurality of preset detection modes in advance so as to meet the detection requirements of videos in various fake modes.
Specifically, the adapted target detection mode may be determined according to the state (for example, static or dynamic) of the facial action, where the target detection mode corresponds to at least one target model. A target video frame corresponding to any playing time may be selected at random, and forgery-trace detection started from the facial action in the selected target video frame. Two situations arise:
(1) If the face is static, for example the mouth is closed, the face is frontal and the eyes are normally open, a general detection mode (which may be a PS detection mode or a machine learning model) is used to detect whether the video is forged by PS.
For example, with credential photos: the face in a credential photo is PS-ed onto a complex background in the video to obtain a forged video.
The reason is that the black industry can usually obtain an identity-card photo, and a credential photo has a regular format, so a face transplanted by PS from the credential photo onto a complex background looks strange or unnatural, for example against a living-room background, which is relatively complex compared with a credential background.
Because PS leaves no obvious deep-forgery traces, a face recognition system cannot recognize such forgery traces through machine vision; a detection tool dedicated to detecting PS traces is therefore typically employed to detect the authenticity of the target object in a static video.
(2) If dynamics such as motion are present, it can be determined that deepfake traces may exist, and the deepfake detection model is then invoked to identify authenticity.
The actions may include nodding, opening the mouth, shaking the head, closing the eyes, opening the eyes, etc., wherein nodding and shaking the head may be characterized by a face pose angle, such as a nodding pitch angle; yaw angle for shaking head relates to side face angle.
Specifically, whether the face is dynamic can be determined according to face states such as the head rotation angle, the mouth-opening width, and the nodding amplitude.
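As an illustration, the static/dynamic decision above can be sketched as follows. This is a minimal sketch under assumed threshold values and field names (`pitch`, `yaw`, `mouth_open`); the patent does not specify concrete thresholds or a data structure.

```python
# Hypothetical thresholds (assumptions, not from the patent):
PITCH_THRESH = 10.0   # degrees of nodding considered "dynamic"
YAW_THRESH = 15.0     # degrees of head shaking considered "dynamic"
MOUTH_THRESH = 0.2    # normalized mouth-opening width considered "dynamic"

def select_detection_mode(face_state: dict) -> str:
    """Return 'ps_detection' for a static face, 'deepfake_model' for a dynamic one."""
    dynamic = (
        abs(face_state.get("pitch", 0.0)) > PITCH_THRESH      # nodding
        or abs(face_state.get("yaw", 0.0)) > YAW_THRESH       # head shaking
        or face_state.get("mouth_open", 0.0) > MOUTH_THRESH   # mouth action
    )
    return "deepfake_model" if dynamic else "ps_detection"
```

A missing measurement defaults to zero, so an empty face state is treated as static.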
104. Call the first detection mode to identify the authenticity of the target face in the video to be detected, and output a first detection result.
Meanwhile, to facilitate subsequent analysis, a first mark may be set for the video to be detected according to the detection result, and the first mark may include at least one of an authenticity mark, a forgery type, a forged-frame amount, and the like.
Compared with the prior art, in the embodiment of the application, a plurality of face recognition models and at least two forged-video detection modes are deployed in the face recognition system. On one hand, each forged-video detection mode can detect, in a targeted manner, the forged videos manufactured by the corresponding forgery means, so the method can switch to the adapted forged-video detection mode when detecting forged videos of different forgery modes. In particular, when detecting network videos involving various forgery modes, the method can dynamically switch the detection mode for the current video to be detected, and can detect at least one network video adopting at least one forgery mode. In addition, the detection is highly flexible, suitable for diversified videos, and wide in test coverage, which can effectively avoid missing videos manufactured in a certain forgery mode. In other words, under this detection strategy, the embodiment of the application can accurately judge whether forged fragments exist in the first video to be detected and can also improve detection efficiency.
In some embodiments of the present application, considering that different video sources use different deep forgery techniques, follow certain rules, or produce forged images with different hues, a target detection mode adapted to the current video to be detected may be selected in order to better fit each video. In particular, at least one deep forgery detection algorithm may be deployed in the detection tool, each deep forgery detection algorithm corresponding to one deep forgery scene.
In some embodiments of the present application, in order to improve the detection efficiency for forged videos, the embodiments mainly proceed from the following first-type and second-type detection strategies:
First type of detection strategy: detecting fragments (each including at least one video frame) in the video to be detected, mainly via the following strategies 1 to 3:
Strategy 1: Coarse inspection
Coarse inspection of the video to be detected may be performed. Specifically, before identifying the facial action of the target face in at least one first face image in the first video to be detected, the method further includes:
detecting at least one video frame in the video to be detected;
if it is detected that the target face in a target video frame of the video to be detected meets the preset forgery trace condition, ending detection of the remaining video frames in the first video to be detected, and generating a first detection result based on the video frames that meet the preset forgery trace condition in the first video to be detected; the target video frame is any video frame in the video to be detected, and the first detection result indicates that the video to be detected is a forged video.
As long as the face in at least one frame is detected to meet the preset forgery trace condition, detection of the remaining video frames in the video to be detected can be stopped, and the video can be preliminarily determined, or directly judged, to contain frames meeting the preset forgery trace condition. The detection result can then be output directly to indicate that the video to be detected is a forged video. This detection mechanism can effectively improve detection efficiency.
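The early-stopping coarse inspection described above can be sketched as follows; the frame representation and the forgery-trace predicate are assumptions for illustration.

```python
def coarse_inspect(frames, has_forgery_trace):
    """Scan frames in order; stop at the first frame whose face meets the
    preset forgery-trace condition and report the video as forged."""
    for idx, frame in enumerate(frames):
        if has_forgery_trace(frame):
            # remaining frames are skipped entirely
            return {"forged": True, "trigger_frame": idx}
    return {"forged": False, "trigger_frame": None}
```

The early return is what yields the efficiency gain: frames after the first hit are never examined.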
Strategy 2: hierarchical detection
In some embodiments, considering detection accuracy and targeted analysis of the video to be detected, a hierarchical detection strategy may be adopted: on the basis of the above coarse inspection, the forged video determined after coarse inspection is further subjected to fine inspection (i.e., the detection mode is determined based on the facial action). Specifically, after obtaining the second detection result, the method further includes:
acquiring first analysis data of the first video to be detected and second analysis data of the second detection result;
if it is determined, according to the first analysis data and the second analysis data, that at least one of an abnormality, a misjudgment, or an omission exists in the second analysis data, recognizing a second facial action of the target face in at least one first face image in the first video to be detected;
Determining a third detection mode from a plurality of preset detection modes based on the second facial action;
and calling the third detection mode to identify the authenticity of the target face in the video to be detected, and outputting a third detection result.
It can be seen that, on the basis of the coarse inspection, the forged video determined after coarse inspection is further subjected to fine inspection. For example, action judgment and forgery trace analysis are performed on the remaining video frames in the video to be detected, and an authenticity detection result corresponding to each video frame is obtained. Therefore, the hierarchical detection strategy can effectively improve both detection efficiency and accuracy.
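A minimal sketch of this hierarchical (coarse-then-fine) flow, with both checks passed in as assumed callables:

```python
def hierarchical_detect(frames, coarse_check, fine_check):
    """Strategy 2 sketch: only videos flagged by the coarse pass receive the
    per-frame fine pass (action judgment plus forgery trace analysis)."""
    if not any(coarse_check(f) for f in frames):
        # coarse pass found nothing; no fine inspection is spent on this video
        return {"forged": False, "fine_results": None}
    # coarse pass flagged the video: run fine inspection on every frame
    return {"forged": True, "fine_results": [fine_check(f) for f in frames]}
```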
Strategy 3: decision detection mode based on preset corresponding relation
In other embodiments, ratio analysis may be performed on the forgery means in the videos to be detected, and the video source side may be marked. After a new video to be detected is obtained from the corresponding video source side, the default detection mode can be directly called according to the pre-maintained preset correspondence among the first mark, the default detection mode, and the ID of the video source side (Table 1 below), so as to perform authenticity detection on the new video to be detected from that video source side. Specifically, before determining the target detection mode from the plurality of preset detection modes based on the facial action, the method further includes:
analyzing the historical detection results according to the type of forgery trace to obtain the forgery mode corresponding to each historically processed video;
and performing ratio analysis on the forgery modes of the target video frames that meet the preset forgery trace condition according to the historical detection results, and setting a preset mark on the video source side of the historically processed video according to the ratio of each forgery mode.
Correspondingly, after setting the preset mark on the video source side of the historically processed video according to the forgery-mode ratio, the method includes:
acquiring a first video to be detected from the first video source party, where there may be a plurality of first videos to be detected;
determining a second detection mode according to a preset correspondence and the channel identifier of the first video source party, wherein the preset correspondence includes the correspondence among the preset mark, the default detection mode, and the channel identifier;
and detecting forgery traces in the first video to be detected according to the second detection mode to obtain a second detection result.
                   Preset first mark   Default detection mode        Video source side ID
Correspondence 1   a0                  Deep forgery detection mode   10223
Correspondence 2   b0                  PS detection mode             20384
Others
TABLE 1
Therefore, on one hand, when facing a first video to be detected, the decision time for selecting the second detection mode can be saved; on the other hand, even if a small part of new videos with forgery traces (made by other forgery means) go undetected, the overall detection efficiency is improved to a certain extent, and the effect is especially remarkable when detecting a large number of videos.
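Table 1 can be illustrated as a simple lookup, with assumed channel IDs and mode names; unknown sources here fall back to the facial-action-based decision of the earlier embodiment.

```python
# Hypothetical encoding of Table 1: the channel ID of the video source party
# maps to its preset first mark and default detection mode.
PRESET_CORRESPONDENCE = {
    "10223": {"mark": "a0", "default_mode": "deepfake_detection"},
    "20384": {"mark": "b0", "default_mode": "ps_detection"},
}

def pick_default_mode(channel_id, fallback="facial_action_decision"):
    """Return the default detection mode for a known source party, otherwise
    fall back to deciding the mode from the facial action."""
    entry = PRESET_CORRESPONDENCE.get(channel_id)
    return entry["default_mode"] if entry else fallback
```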
In other embodiments, if analysis of a large number of video detection results at a later stage finds that, for most videos, no forgery trace can be detected after the default detection mode is selected according to the preset correspondence, the detection mode determined based on the facial action in the above embodiment needs to be re-adopted, and the default detection mode in the preset correspondence shown in Table 1 above is then updated. Alternatively, the final detection mode is determined according to the historical detection data. These two cases are described respectively below:
(1) Determining a detection mode based on the facial action, and then updating a preset corresponding relation
As shown in fig. 4, after obtaining the second detection result, the method further includes:
acquiring first analysis data of the first video to be detected and second analysis data of the second detection result;
if it is determined, according to the first analysis data and the second analysis data, that at least one of an abnormality, a misjudgment, or an omission exists in the second analysis data, recognizing a second facial action of the target face in at least one first face image in the first video to be detected;
determining a third detection mode from a plurality of preset detection modes based on the second facial action;
and calling the third detection mode to identify the authenticity of the target face in the video to be detected, and outputting a third detection result.
Accordingly, after the third detection mode is determined from the multiple preset detection modes based on the second facial action, the third detection mode may also be written into the preset correspondence, for example, by updating Table 1 above.
It can be seen that, on the basis of determining the default detection mode from the preset correspondence shown in Table 1 above and performing rapid batch preliminary detection with it, this embodiment further applies the refined detection mode determined from the facial action in the above embodiment, so that fragments with forgery traces that were partially or entirely missed can be further detected on the basis of the second detection result. On one hand, the two-step detection can improve detection accuracy; on the other hand, after the third detection result is obtained, the preset correspondence can be updated based on it, so that the continuously updated preset correspondence can provide a target detection mode with a higher matching degree and better detection effect during rapid detection, forming positive feedback and continuously optimizing the preset correspondence.
(2) Determining a final detection mode according to the historical detection data:
for example, after setting a preset mark on the video source side of the historically processed video according to the forgery-mode ratio, the method further includes:
acquiring a third video to be detected from a first video source side;
acquiring historical detection data of the first video source party in a historical period according to the channel identifier of the first video source party to determine a fourth detection mode, wherein the fourth detection mode comprises a default detection mode or a priority detection mode;
and detecting the fake trace of the third video to be detected according to the fourth detection mode to obtain a fourth detection result.
Therefore, the fourth detection mode is determined based on the historical detection data of the first video source party before detection. Since the historical detection data is representative to a certain extent and can characterize the tendency and suitability of the video detection device toward detection modes for videos from the first video source party in the historical period, the fourth detection mode determined from the historical data is well suited for continuing to detect subsequent videos from the first video source party, thereby shortening the decision time for the detection mode while ensuring detection accuracy and efficiency.
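A sketch of choosing the fourth detection mode from historical detection data; the history format, hit-rate criterion, and threshold are illustrative assumptions, since the patent only names the outcome (default mode or priority mode).

```python
def choose_fourth_mode(history, min_hit_rate=0.6):
    """history: list of (mode, trace_found) pairs for one video source party.
    If some mode detected traces often enough in the historical period, prefer
    it as the priority mode; otherwise keep the default mode."""
    hits, totals = {}, {}
    for mode, found in history:
        totals[mode] = totals.get(mode, 0) + 1
        hits[mode] = hits.get(mode, 0) + (1 if found else 0)
    best_mode, best_rate = None, 0.0
    for mode in totals:
        rate = hits[mode] / totals[mode]
        if rate > best_rate:
            best_mode, best_rate = mode, rate
    if best_mode is not None and best_rate >= min_hit_rate:
        return {"mode": best_mode, "kind": "priority"}
    return {"mode": "default", "kind": "default"}
```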
In some embodiments, the ratio of target video slices with forgery traces in the video to be detected can be further analyzed, so that each video source side can be analyzed. For example, a second mark may be set for the source side of any video to be detected whose ratio is larger than a first threshold, the video amount of the source sides carrying the second mark may be counted, and each source side may be classified according to the statistics. The second mark may include a forgery type and a forged-frame ratio. The second mark can be used for video credit rating, public opinion analysis, and the like; the embodiment of the application does not limit the marking manner or content.
In some embodiments, even if a small number of target slices exist (i.e., at least one video frame with a forgery trace), if the target slices do not affect the user's evaluation or the content conveyed after the entire video is viewed, the video to be detected may still be classified as a normal video. Specifically, the embodiment of the application further provides the following two detection strategies a and b:
detection strategy a: content relevance determination based on video clips
(1) Firstly, determining target fragments which accord with preset forging conditions in the first video to be detected, wherein the target fragments comprise at least one video frame with continuous or interval playing time;
(2) If the ratio of the target slice in the first video to be detected is smaller than a second threshold, the content weight of at least one video frame in the target slice is lower than a preset weight, and the forged object does not belong to the target object, classifying the first video to be detected as a normal video.
For example, the duration of the video to be detected is 1h, and a forgery trace exists in the target video frame corresponding to the playing time 36:23:56, but the target video frame does not affect the user's correct understanding of the whole video to be detected. For example, the video to be detected is a 1h speech video of a specific person a, and a face in the target video frame corresponding to the playing time 36:23:56 (which may be the specific person a, another specific person b, an audience/staff member, etc.) is swapped, but the target video frame does not affect the user's correct understanding of the whole speech video or acceptance of the speech behavior of the specific person a bound to the speech video.
Directly marking such a video to be detected as forged would be too severe and unnecessary. Therefore, it can first be determined whether the ratio of the target slice in the video to be detected is smaller than the second threshold and whether the content weight of at least one video frame in the target slice is lower than the preset weight; if so, and the forged object does not belong to the target object, the video to be detected is classified as a normal video and is not filtered out.
Therefore, detection strategy a can accurately screen out the forged videos that truly affect the expression of the video content, reducing cases where the whole video to be detected is judged to be forged merely because a fragment with a deep forgery trace is detected in it; that is, it reduces the influence of ambiguous forged fragments on the overall judgment of the video to be detected.
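Detection strategy a can be sketched as follows, with assumed threshold values; the content-weight representation (one weight per frame of the target slice) is an illustrative assumption.

```python
def classify_by_relevance(slice_ratio, frame_weights, forged_is_target,
                          ratio_thresh=0.05, weight_thresh=0.3):
    """Strategy a sketch: a video whose forged slice is small, carries low
    content weight, and does not forge the target object stays 'normal'."""
    low_weight = any(w < weight_thresh for w in frame_weights)
    if slice_ratio < ratio_thresh and low_weight and not forged_is_target:
        return "normal"
    return "forged"
```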
Detection strategy b: judging the importance of a video slice within the video to be detected
The method specifically comprises the following steps:
(1) Determining a target fragment which accords with a preset forging condition in the first video to be detected, wherein the target fragment comprises at least one video frame with continuous or interval playing time;
(2) Analyzing the playing content corresponding to the target fragment;
(3) If the matching degree between the playing content and the video description information is lower than a first threshold, or if the playing content belongs to the video description information but the ratio of the target slice is lower than a preset ratio, determining that the target slice is a non-key slice, and setting a first mark for the first video to be detected, where the mark indicates that the first video to be detected is classified as a normal video.
Therefore, detection strategy b can also accurately screen out the forged videos that truly affect the expression of the video content, reducing the influence of ambiguous forged fragments on the overall judgment of the video to be detected.
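Detection strategy b can be sketched similarly, again with assumed thresholds for the matching degree and the slice ratio:

```python
def classify_by_importance(match_degree, slice_ratio,
                           match_thresh=0.5, ratio_thresh=0.1):
    """Strategy b sketch: a target slice whose playing content barely matches
    the video description, or matches it but occupies a tiny ratio, is a
    non-key slice, and the video is marked as normal."""
    if match_degree < match_thresh:
        return "normal"   # content unrelated to the video description
    if slice_ratio < ratio_thresh:
        return "normal"   # related, but the slice is too small to matter
    return "forged"
```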
The second type of detection strategy: focusing detection on video frames in which the facial action is dynamic
Determining whether a video frame containing a facial action has a forgery trace yields higher accuracy in judging whether the video to be detected is forged than examining video frames in a silent state. Therefore, to improve detection efficiency while ensuring accuracy, the deep forgery detection method may be preferentially adopted when it is determined that fragments with facial actions exist in the video to be detected.
For example, a first set is first screened out of the first video to be detected, and forgery trace detection is then performed on each face image in the first set. The first set includes a plurality of face images conforming to a preset facial action; frame skipping or frame-by-frame detection may be used, and the specific schematic and manner are not limited.
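The first-set screening might look like the following sketch; the frame-skipping step and the preset-action predicate are assumptions, since the patent leaves the screening manner open.

```python
def screen_first_set(frames, matches_preset_action, skip=2):
    """Sample every `skip`-th frame and keep those whose face matches a
    preset facial action; the kept frames form the first set."""
    return [f for f in frames[::skip] if matches_preset_action(f)]
```

Forgery trace detection then runs only over the returned subset, which is what makes the second-type strategy cheaper than scanning every frame.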
Any technical features mentioned in the embodiments corresponding to any one of fig. 1 to fig. 7 are also applicable to the embodiments corresponding to fig. 8 to fig. 11 in the embodiments of the present application, and the following similar parts will not be repeated.
The above description is given of a depth-counterfeit video detection method in the embodiment of the present application, and a video detection device for performing the depth-counterfeit video detection method is described below.
Referring to fig. 8, a schematic diagram of a video detection apparatus 40 shown in fig. 8 can be used to detect counterfeit traces of network videos from different data sources to detect the presence of at least one modified network video, so as to finally purify the network, avoid unnecessary public opinion fermentation, and ensure the authenticity of the network video. The video detection device 40 according to the embodiment of the present application can implement the steps in the depth-forgery video detection method performed by the video detection device 40 according to the embodiment corresponding to any one of fig. 1 to 6. The functions implemented by the video detection device 40 may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware. The video detection device 40 may include an input-output module 401 and a processing module 402. The functional implementation of the input/output module 401 and the processing module 402 may refer to the operations performed in any of the embodiments corresponding to fig. 1 to fig. 7, which are not described herein.
In some embodiments, the input/output module 401 may be configured to receive or acquire a first video to be detected from at least one video source party, where the first video to be detected includes a plurality of first face images of at least one target face;
The processing module 402 may be configured to identify a first facial action of the target face in at least one first face image in the first video to be detected acquired by the input/output module 401; determine a first detection mode from a plurality of preset detection modes based on the first facial action; and call the first detection mode to identify the authenticity of the target face in the video to be detected, outputting a first detection result through the input/output module.
In some embodiments, before the processing module 402 identifies the facial action of the target face in the at least one first face image in the first video to be detected, the processing module is further configured to:
detecting at least one video frame in the video to be detected;
if it is detected that the target face in a target video frame of the video to be detected meets the preset forgery trace condition, end detection of the remaining video frames in the first video to be detected, and generate a first detection result based on the video frames that meet the preset forgery trace condition; the target video frame is any video frame in the video to be detected, and the first detection result indicates that the video to be detected is a forged video.
In some embodiments, before the processing module 402 determines the target detection mode from the plurality of preset detection modes based on the facial motion, the processing module is further configured to:
analyze the historical detection results according to the type of forgery trace to obtain the forgery mode corresponding to each historically processed video;
and perform ratio analysis on the forgery modes of the target video frames that meet the preset forgery trace condition according to the historical detection results, and set a preset mark on the video source side of the historically processed video according to the ratio of each forgery mode.
In some embodiments, after setting the preset flag on the video source side of the history-processed video according to the duty ratio of the fake method, the processing module 402 is further configured to:
acquire a first video to be detected from a first video source party through the input/output module 401;
determining a second detection mode according to a preset corresponding relation and channel identification of the first video source side, wherein the preset corresponding relation comprises a corresponding relation of a preset mark, a default detection mode and the channel identification;
and detecting the fake trace of the first video to be detected according to the second detection mode to obtain a second detection result.
In some embodiments, after the processing module 402 obtains the second detection result, the processing module is further configured to:
Acquiring first analysis data of the first video to be detected and second analysis data of the second detection result;
if it is determined, according to the first analysis data and the second analysis data, that at least one of an abnormality, a misjudgment, or an omission exists in the second analysis data, recognize a second facial action of the target face in at least one first face image in the first video to be detected;
determining a third detection mode from a plurality of preset detection modes based on the second facial action;
and call the third detection mode to identify the authenticity of the target face in the video to be detected, and output a third detection result.
In some embodiments, the processing module 402 is further configured to, after determining a third detection mode from a plurality of preset detection modes based on the second facial motion:
and updating the third detection mode into the preset corresponding relation.
In some embodiments, the processing module 402 is further configured to:
determining a target fragment which accords with a preset forging condition in the first video to be detected, wherein the target fragment comprises at least one video frame with continuous or interval playing time;
analyzing the playing content corresponding to the target fragment;
if the matching degree between the playing content and the video description information is lower than a first threshold, or if the playing content belongs to the video description information but the ratio of the target slice is lower than a preset ratio, determine that the target slice is a non-key slice, and set a first mark for the first video to be detected, where the mark indicates that the first video to be detected is classified as a normal video.
In some embodiments, the processing module 402 is further configured to:
determining a target fragment which accords with a preset forging condition in the first video to be detected, wherein the target fragment comprises at least one video frame with continuous or interval playing time;
and if the ratio of the target slice in the first video to be detected is smaller than a second threshold, the content weight of at least one video frame in the target slice is lower than a preset weight, and the forged object does not belong to the target object, classify the first video to be detected as a normal video.
In some embodiments, the processing module 402 is specifically configured to:
screening a first set from the first video to be detected, wherein the first set comprises a plurality of face images conforming to a preset face action;
and detecting fake trace of each face image in the first set.
In some embodiments, after setting the preset flag on the video source side of the history-processed video according to the duty ratio of the fake method, the processing module 402 is further configured to:
acquiring a third video to be detected from a first video source side through the input/output module 401;
acquiring historical detection data of the first video source party in a historical period according to the channel identifier of the first video source party to determine a fourth detection mode, wherein the fourth detection mode comprises a default detection mode or a priority detection mode;
and detecting the fake trace of the third video to be detected according to the fourth detection mode to obtain a fourth detection result.
The specific manner in which the respective modules perform the operations in the video detection apparatus in the above-described embodiments has been described in detail in the embodiments related to the method, and will not be described in detail here.
As can be seen from the video detection device 40 illustrated in fig. 8, a plurality of face recognition models and at least two forged-video detection modes are deployed in the face recognition system. On one hand, each forged-video detection mode can detect, in a targeted manner, the forged videos manufactured by the corresponding forgery means, so the device can switch to the adapted forged-video detection mode when detecting forged videos of different forgery modes. In particular, when detecting network videos involving various forgery modes, the device can dynamically switch the detection mode for the current video to be detected, and can detect at least one network video adopting at least one forgery mode. In addition, the detection is highly flexible, suitable for diversified videos, and wide in test coverage, which can effectively avoid missing videos manufactured in a certain forgery mode. In other words, under this detection strategy, the embodiment of the application can accurately judge whether forged fragments exist in the first video to be detected and can also improve detection efficiency.
The video detection device 40 for performing the deep forgery video detection method in the embodiment of the present application has been described above from the viewpoint of modularized functional entities; the same device is described below from the viewpoint of hardware processing. It should be noted that, in the embodiment shown in fig. 8, the physical device corresponding to the input/output module 401 may be an input/output unit, a transceiver, a radio frequency circuit, a communication module, an output interface, etc., and the physical device corresponding to the processing module 402 may be a processor. The video detection device 40 shown in fig. 8 may have the structure shown in fig. 9; in that case, the processor and the transceiver in fig. 9 implement the same or similar functions as the input/output module 401 and the processing module 402 in the foregoing device embodiment, and the memory in fig. 9 stores the computer program called by the processor when performing the deep forgery video detection method.
The embodiment of the present application further provides another video detection device, as shown in fig. 10. For convenience of explanation, only the portions relevant to the embodiment of the present application are shown; for specific technical details not disclosed, please refer to the method portion of the embodiment. The video detection device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sales (POS) terminal, a vehicle-mounted computer, and the like. Taking the mobile phone as an example:
Fig. 10 is a block diagram showing part of the structure of a mobile phone serving as the video detection device according to an embodiment of the present application. Referring to fig. 10, the mobile phone includes: a radio frequency (RF) circuit 710, a memory 720, an input unit 730, a display unit 740, a sensor 750, an audio circuit 760, a wireless fidelity (Wi-Fi) module 770, a processor 780, and a power supply 790. It will be appreciated by those skilled in the art that the handset structure shown in fig. 10 is not limiting; the handset may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The following describes the components of the mobile phone in detail with reference to fig. 10:
the RF circuit 710 may be configured to receive and transmit signals during a message or a call, and specifically, receive downlink information of a base station and process the downlink information with the processor 780; in addition, the data of the design uplink is sent to the base station. Generally, RF circuitry 710 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (English full name: low Noise Amplifier, english short name: LNA), a duplexer, and the like. In addition, the RF circuitry 710 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (english: global System of Mobile communication, english: GSM), general packet radio service (english: general Packet Radio Service, english: GPRS), code division multiple access (english: code Division Multiple Access, CDMA), wideband code division multiple access (english: wideband Code Division Multiple Access, english: WCDMA), long term evolution (english: long Term Evolution, english: LTE), email, short message service (english: short Messaging Service, english: SMS), and the like.
The memory 720 may be used to store software programs and modules, and the processor 780 performs the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 720. The memory 720 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data, a phonebook, etc.). In addition, the memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 730 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. In particular, the input unit 730 may include a touch panel 731 and other input devices 732. The touch panel 731, also referred to as a touch screen, may collect touch operations performed on or near it by a user (e.g., operations performed on or near the touch panel 731 using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program. Optionally, the touch panel 731 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 780; it can also receive commands from the processor 780 and execute them. In addition, the touch panel 731 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 731, the input unit 730 may include other input devices 732. In particular, the other input devices 732 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 740 may be used to display information input by the user or information provided to the user, as well as the various menus of the mobile phone. The display unit 740 may include a display panel 741; optionally, the display panel 741 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like. Further, the touch panel 731 may cover the display panel 741; when the touch panel 731 detects a touch operation on or near it, the touch operation is transferred to the processor 780 to determine the type of touch event, and the processor 780 then provides a corresponding visual output on the display panel 741 according to the type of touch event. Although in fig. 10 the touch panel 731 and the display panel 741 are two separate components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 731 and the display panel 741 may be integrated to implement the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 750, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 741 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 741 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer posture calibration) and for vibration-recognition related functions (such as a pedometer and tapping detection). Other sensors that may also be configured on the mobile phone, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described in detail herein.
The audio circuit 760, speaker 761, and microphone 762 may provide an audio interface between the user and the mobile phone. The audio circuit 760 may transmit the electrical signal converted from the received audio data to the speaker 761, which converts it into a sound signal for output; on the other hand, the microphone 762 converts the collected sound signal into an electrical signal, which is received by the audio circuit 760 and converted into audio data. The audio data is then processed by the processor 780 and sent, for example, to another mobile phone via the RF circuit 710, or output to the memory 720 for further processing.
Wi-Fi is a short-range wireless transmission technology. Through the Wi-Fi module 770, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 10 shows the Wi-Fi module 770, it is understood that it is not an essential component of the mobile phone and may be omitted entirely as required without changing the essence of the application.
The processor 780 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions and processes of the mobile phone by running or executing software programs and/or modules stored in the memory 720 and calling data stored in the memory 720, thereby performing overall monitoring of the mobile phone. Optionally, the processor 780 may include one or more processing units; preferably, the processor 780 may integrate an application processor that primarily processes operating systems, user interfaces, applications, etc., with a modem processor that primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 780.
The handset further includes a power supply 790 (e.g., a battery) for powering the various components, which may be logically connected to the processor 780 through a power management system, thereby performing functions such as managing charging, discharging, and power consumption by the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In the embodiment of the present application, the processor 780 included in the mobile phone further has the function of controlling execution of the above method performed by the video detection apparatus 40. The steps performed by the video detection apparatus in the above embodiments may be based on the mobile phone structure shown in fig. 10. For example, the processor 780 may perform the following operations by invoking instructions in the memory 720:
acquiring a first video to be detected from at least one video source side through the input unit 730, the first video to be detected comprising a plurality of first face images of at least one target face;
identifying a first facial action of a target face in at least one first face image in the first video to be detected acquired through the input unit 730; determining a first detection mode from a plurality of preset detection modes based on the first facial action; and invoking the first detection mode to identify whether the target face in the video to be detected is genuine or forged, and outputting a first detection result through the input unit 730.
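The flow these operations describe — recognize a facial action in the face images, map it to a preset detection mode, and fall back to a default mode — can be sketched as follows. This is a minimal illustration only; all names (`DETECTION_MODES`, `detect_facial_action`, the mode strings) are assumptions, since the patent does not specify an implementation.

```python
# Illustrative sketch of facial-action-based detection mode selection.
# All identifiers and mode names are hypothetical.

DETECTION_MODES = {
    "blink": "eye_region_artifact_check",
    "mouth_open": "mouth_region_artifact_check",
    "head_turn": "boundary_blend_check",
}
DEFAULT_MODE = "full_frame_artifact_check"


def detect_facial_action(face_image):
    """Placeholder facial-action classifier for a single face image.

    A real system would run a landmark or action-unit model here;
    this stub just reads a precomputed label from the sample dict.
    """
    return face_image.get("action", "unknown")


def select_detection_mode(face_images):
    """Pick a detection mode from the first recognizable facial action."""
    for img in face_images:
        action = detect_facial_action(img)
        if action in DETECTION_MODES:
            return DETECTION_MODES[action]
    # No preset facial action recognized: fall back to the default mode.
    return DEFAULT_MODE
```

For example, `select_detection_mode([{"action": "blink"}])` would choose the eye-region mode, while a list with no recognizable action falls back to the default.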
The embodiment of the present application further provides another video detection apparatus for implementing the above deep forgery video detection method. As shown in fig. 11, fig. 11 is a schematic diagram of a server structure provided in an embodiment of the present application. The server 1020 may vary considerably in configuration or performance, and may include one or more central processing units (Central Processing Units, CPU) 1022 (e.g., one or more processors), a memory 1032, and one or more storage media 1030 (e.g., one or more mass storage devices) storing application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may be transitory or persistent storage. The program stored in the storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processing unit 1022 may be configured to communicate with the storage medium 1030 to execute, on the server 1020, the series of instruction operations in the storage medium 1030.
The server 1020 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like.
The steps performed by the service server (e.g., the video detection apparatus 40 shown in fig. 7) in the above embodiments may be based on the server structure shown in fig. 11. For example, the processor 1022 may perform the following operations by invoking instructions in the memory 1032:
acquiring a first video to be detected from at least one video source side via the input/output interface 1058, the first video to be detected comprising a plurality of first face images of at least one target face;
identifying a first facial action of a target face in at least one first face image in the first video to be detected acquired via the input/output interface 1058; determining a first detection mode from a plurality of preset detection modes based on the first facial action; and invoking the first detection mode to identify whether the target face in the video to be detected is genuine or forged, and outputting a first detection result through the input/output interface 1058.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
The technical solutions provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the embodiments of the present application, and the above description of the embodiments is only intended to help understand the methods and core ideas of the embodiments of the present application. Meanwhile, those skilled in the art may, according to the ideas of the embodiments of the present application, make changes to the specific implementations and application scope. In summary, the content of this specification should not be construed as limiting the embodiments of the present application.

Claims (10)

1. A method for detecting depth counterfeit video, said method comprising:
acquiring a first video to be detected from at least one video source side, the first video to be detected comprising a plurality of first face images of at least one target face;
identifying a first facial action of a target face in at least one first face image in the first video to be detected;
determining a first detection mode from a plurality of preset detection modes based on the first facial action;
and invoking the first detection mode to identify whether the target face in the video to be detected is genuine or forged, and outputting a first detection result.
2. The depth counterfeit video detection method of claim 1, wherein before the identifying a first facial action of a target face in at least one first face image in the first video to be detected, the method further comprises:
detecting at least one video frame in the video to be detected;
if it is detected that the target face in a target video frame of the video to be detected meets a preset forgery trace condition, terminating detection of the remaining video frames in the first video to be detected, and generating a first detection result based on the video frames in the first video to be detected that meet the preset forgery trace condition; wherein the target video frame is any video frame in the video to be detected, and the first detection result indicates that the video to be detected is a forged video.
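The early-exit scan described in claim 2 — stop examining frames as soon as one frame's target face meets the forgery trace condition — can be sketched as follows. The helper names (`has_forgery_trace`, the result dict keys) are illustrative assumptions, not the patent's.

```python
# Hypothetical sketch of the early-termination frame scan in claim 2.

def has_forgery_trace(frame):
    """Placeholder forgery-trace test; a real detector model goes here."""
    return frame.get("trace", False)


def scan_frames(frames):
    """Return a detection result as soon as one frame shows a trace.

    Frames after the first hit are not examined, matching the claim's
    'terminate detection of the remaining video frames' behavior.
    """
    for idx, frame in enumerate(frames):
        if has_forgery_trace(frame):
            return {"fake": True, "frame_index": idx}
    return {"fake": False, "frame_index": None}
```

The design choice here is latency: a forged video is flagged on its first suspicious frame rather than after a full pass over the video.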
3. The depth counterfeit video detection method of claim 2, wherein before the determining a target detection mode from a plurality of preset detection modes based on the facial action, the method further comprises:
analyzing a historical detection result according to the type of forgery trace to obtain a forgery mode corresponding to a historically processed video;
and performing, according to the historical detection result, proportion analysis on the forgery modes of the target video frames meeting the preset forgery trace condition, and setting a preset mark for the video source side of the historically processed video according to the proportion of each forgery mode.
4. The depth counterfeit video detection method of claim 3, wherein after the setting a preset mark for the video source side of the historically processed video according to the forgery mode proportion, the method comprises:
acquiring a first video to be detected from a first video source side;
determining a second detection mode according to a preset correspondence and a channel identifier of the first video source side, wherein the preset correspondence comprises the correspondence among the preset mark, a default detection mode and the channel identifier;
and performing forgery trace detection on the first video to be detected according to the second detection mode to obtain a second detection result.
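The source-side lookup in claim 4 can be sketched as a table keyed by channel identifier, mapping a preset mark to a detection mode with a fallback to the default. The table contents and every identifier below are illustrative assumptions; the patent only specifies that such a correspondence exists.

```python
# Illustrative sketch of the preset correspondence in claim 4:
# channel identifier -> (preset mark, detection mode). All entries
# and names are hypothetical.

PRESET_CORRESPONDENCE = {
    "channel_a": ("face_swap_prone", "boundary_blend_check"),
    "channel_b": (None, "default_mode"),  # no preset mark recorded
}


def second_detection_mode(channel_id):
    """Resolve the detection mode for a video source via its channel identifier.

    Sources without a preset mark (or unknown sources) fall back to the
    default detection mode.
    """
    mark, mode = PRESET_CORRESPONDENCE.get(channel_id, (None, "default_mode"))
    return mode if mark else "default_mode"
```

A source previously flagged as prone to a particular forgery mode thus gets a targeted detection mode on its next submission, while unmarked sources are handled by the default path.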
5. The depth counterfeit video detection method of claim 4, wherein after the second detection result is obtained, the method further comprises:
acquiring first analysis data of the first video to be detected and second analysis data of the second detection result;
if it is determined, according to the first analysis data and the second analysis data, that at least one of an abnormality, a misjudgment, or an omission exists in the second analysis data, identifying a second facial action of the target face in at least one first face image in the first video to be detected;
determining a third detection mode from the plurality of preset detection modes based on the second facial action;
and invoking the third detection mode to identify whether the target face in the video to be detected is genuine or forged, and outputting a third detection result.
6. The depth counterfeit video detection method of any one of claims 1 to 5, further comprising:
determining a target segment in the first video to be detected that meets a preset forgery condition, wherein the target segment comprises at least one video frame with continuous or spaced playing times;
analyzing the playing content corresponding to the target segment;
if the matching degree between the playing content and video description information is lower than a first threshold, or if the playing content matches the video description information but the proportion of the target segment is lower than a preset proportion, determining that the target segment is a non-key segment and setting a first mark for the first video to be detected, wherein the mark is used to indicate that the first video to be detected is classified as a normal video.
7. The depth counterfeit video detection method of any one of claims 1 to 5, further comprising:
determining a target segment in the first video to be detected that meets a preset forgery condition, wherein the target segment comprises at least one video frame with continuous or spaced playing times;
and if the proportion of the target segment in the first video to be detected is smaller than a second threshold and the content weight of at least one video frame in the target segment is lower than a preset weight, determining that the forged object does not belong to the target object and classifying the first video to be detected as a normal video.
8. The depth counterfeit video detection method of any one of claims 1 to 5, wherein the identifying a first facial action of a target face in at least one first face image in the first video to be detected, and the determining a first detection mode from a plurality of preset detection modes based on the first facial action, comprise:
screening a first set from the first video to be detected, wherein the first set comprises a plurality of face images conforming to a preset facial action;
and performing forgery trace detection on each face image in the first set.
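The screen-then-detect flow of claim 8 can be sketched as a filter over the face images followed by per-image trace detection. The preset action set and the helper names are hypothetical placeholders.

```python
# Sketch of claim 8: keep only face images matching a preset facial
# action, then run trace detection on just that subset. All names
# are illustrative assumptions.

PRESET_ACTIONS = {"blink", "mouth_open"}


def screen_first_set(face_images):
    """Keep only face images whose action matches a preset facial action."""
    return [img for img in face_images if img.get("action") in PRESET_ACTIONS]


def detect_traces(first_set):
    """Run a placeholder forgery-trace test on each screened image."""
    return [bool(img.get("trace")) for img in first_set]
```

Restricting detection to frames showing a preset facial action narrows the search to the regions and moments where forgery artifacts are most likely to surface.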
9. The depth counterfeit video detection method of claim 3, wherein after the setting a preset mark for the video source side of the historically processed video according to the forgery mode proportion, the method further comprises:
acquiring a third video to be detected from the first video source side;
acquiring historical detection data of the first video source side in a historical period according to the channel identifier of the first video source side, to determine a fourth detection mode, wherein the fourth detection mode comprises a default detection mode or a priority detection mode;
and performing forgery trace detection on the third video to be detected according to the fourth detection mode to obtain a fourth detection result.
10. A computing device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-9 when executing the computer program.
CN202211206017.3A 2022-09-30 2022-09-30 Depth fake video detection method, device and storage medium Pending CN116645710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211206017.3A CN116645710A (en) 2022-09-30 2022-09-30 Depth fake video detection method, device and storage medium


Publications (1)

Publication Number Publication Date
CN116645710A true CN116645710A (en) 2023-08-25

Family

ID=87614073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211206017.3A Pending CN116645710A (en) 2022-09-30 2022-09-30 Depth fake video detection method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116645710A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination