WO2022110806A1 - Video detection method, apparatus, device, and computer-readable storage medium
- Publication number
- WO2022110806A1 (PCT/CN2021/103766)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- frame
- result
- detection
- video stream
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/95—Pattern authentication; Markers therefor; Forgery detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
Definitions
- the present disclosure relates to computer vision technology, and in particular, to a video detection method, apparatus, device, and computer-readable storage medium.
- Embodiments of the present disclosure provide a video detection solution.
- a video detection method includes: acquiring a plurality of first video frames in a video to be processed, and a first video stream corresponding to the video to be processed; obtaining a single-frame detection result of performing authenticity detection on each first video frame; obtaining a video stream detection result of performing authenticity detection on the first video stream; and determining an authenticity discrimination result of the video to be processed according to the single-frame detection results of the plurality of first video frames and the video stream detection result of the first video stream.
- the acquiring a plurality of first video frames in the video to be processed includes: performing frame extraction processing on the video to be processed with a set frame number span to obtain the plurality of first video frames.
- the obtaining a single-frame detection result of performing authenticity detection on each of the first video frames includes: performing authenticity detection on the first video frame through a first authenticity classification network to obtain a single-frame detection result of the first video frame, wherein the single-frame detection result is used to represent the confidence that the first video frame is forged.
- the obtaining a video stream detection result of performing authenticity detection on the first video stream includes: performing authenticity detection on the first video stream through a second authenticity classification network, according to the video frames contained in the first video stream and the inter-frame relationships, to obtain a video stream detection result of the first video stream, wherein the video stream detection result is used to represent the confidence that the first video stream is forged.
- determining the authenticity of the video to be processed according to the single-frame detection results of the plurality of first video frames and the video stream detection result of the first video stream includes: fusing the respective single-frame detection results of the plurality of first video frames to obtain a fusion result; and determining the authenticity discrimination result of the video to be processed according to the fusion result and the video stream detection result.
- the fusing of the respective single-frame detection results of the multiple first video frames to obtain the fusion result includes: grouping the single-frame detection results of the multiple first video frames to obtain a plurality of result groups each including one or more single-frame detection results; obtaining the average detection result of each result group; mapping the average detection result of each result group to a first probability through a first setting function to obtain a plurality of first probabilities, wherein the first setting function is a nonlinear mapping function; and obtaining the fusion result according to the average detection results of the result groups and the plurality of first probabilities.
- obtaining a fusion result according to the average detection result of each of the result groups and the multiple first probabilities includes at least one of the following: in response to the ratio of first upper probabilities greater than a first set threshold among the multiple first probabilities being greater than a first set ratio, obtaining the fusion result according to the average detection results of the result groups corresponding to the first upper probabilities; in response to the ratio of first lower probabilities smaller than a second set threshold among the multiple first probabilities being greater than a second set ratio, obtaining the fusion result according to the average detection results of the result groups corresponding to the first lower probabilities; wherein the first set threshold is greater than the second set threshold.
- the determining the authenticity discrimination result of the video to be processed according to the fusion result and the video stream detection result includes: performing a weighted average of the fusion result and the video stream detection result to obtain a weighted average result; and determining the authenticity discrimination result of the video to be processed according to the weighted average result.
- the first video frame includes a plurality of human faces; and the acquiring a single-frame detection result of performing authenticity detection on each of the first video frames includes: acquiring the face detection frames corresponding to the multiple faces in the first video frame; determining the single-person detection result of the corresponding face according to the image area corresponding to each face detection frame; mapping the single-person detection result of each face to a second probability through a second setting function to obtain a plurality of second probabilities, wherein the second setting function is a nonlinear mapping function; and obtaining the single-frame detection result of the first video frame according to the single-person detection results of the faces and the plurality of second probabilities.
- the obtaining of the single-frame detection result of the first video frame according to the single-person detection results of the faces and the plurality of second probabilities includes at least one of the following: in response to the existence of a second probability greater than a third set threshold among the plurality of second probabilities, acquiring the largest single-person detection result in the first video frame as the single-frame detection result of the first video frame; in response to the plurality of second probabilities all being greater than a fourth set threshold, acquiring the largest single-person detection result in the first video frame as the single-frame detection result of the first video frame; in response to the plurality of second probabilities all being less than a fifth set threshold, acquiring the smallest single-person detection result in the first video frame as the single-frame detection result of the first video frame; wherein the third set threshold is greater than the fourth set threshold, and the fourth set threshold is greater than the fifth set threshold.
- the first authenticity classification network includes authenticity classification networks with various structures; the performing authenticity detection on the first video frame through the first authenticity classification network to obtain the single-frame detection result of the first video frame includes: performing authenticity detection on the first video frame through the authenticity classification networks of various structures to obtain multiple sub-single-frame detection results; mapping the multiple sub-single-frame detection results to third probabilities respectively through a third setting function to obtain multiple third probabilities, wherein the third setting function is a nonlinear mapping function; and determining the single-frame detection result of the first video frame by at least one of the following: in response to the ratio of third upper probabilities greater than a sixth set threshold among the multiple third probabilities being greater than a third set ratio, obtaining the single-frame detection result of the first video frame according to the sub-single-frame detection results corresponding to the third upper probabilities; in response to the ratio of third lower probabilities smaller than a seventh set threshold among the multiple third probabilities being greater than a fourth set ratio, obtaining the single-frame detection result of the first video frame according to the sub-single-frame detection results corresponding to the third lower probabilities; wherein the sixth set threshold is greater than the seventh set threshold.
- the second authenticity classification network includes authenticity classification networks with various structures; the performing authenticity detection on the first video stream through the second authenticity classification network, according to the video frames contained in the first video stream and the inter-frame relationships, to obtain the video stream detection result of the first video stream, includes: performing authenticity detection on the first video stream through the authenticity classification networks of various structures according to the video frames contained in the first video stream and the inter-frame relationships, to obtain multiple sub-video stream detection results; mapping the multiple sub-video stream detection results to fourth probabilities respectively through a fourth setting function to obtain multiple fourth probabilities, wherein the fourth setting function is a nonlinear mapping function; and determining the video stream detection result of the first video stream by at least one of the following: in response to the ratio of fourth upper probabilities greater than an eighth set threshold among the multiple fourth probabilities being greater than a fifth set ratio, obtaining the video stream detection result of the first video stream according to the sub-video stream detection results corresponding to the fourth upper probabilities; in response to the ratio of fourth lower probabilities smaller than a ninth set threshold among the multiple fourth probabilities being greater than a sixth set ratio, obtaining the video stream detection result of the first video stream according to the sub-video stream detection results corresponding to the fourth lower probabilities; wherein the eighth set threshold is greater than the ninth set threshold.
- the single-frame detection result of the first video frame indicates whether the face image in the first video frame is a face-changing image; the video stream detection result of the first video stream Indicates whether the face image in the first video stream is a face-changing image; the authenticity determination result of the video to be processed indicates whether the to-be-processed video is a face-changing video.
- a video detection apparatus includes: a first acquisition unit, configured to acquire a plurality of first video frames in a video to be processed and a first video stream corresponding to the video to be processed; a second acquisition unit, configured to acquire a single-frame detection result of performing authenticity detection on each first video frame; a third acquisition unit, configured to acquire a video stream detection result of performing authenticity detection on the first video stream; and a determining unit, configured to determine the authenticity discrimination result of the video to be processed according to the respective single-frame detection results of the plurality of first video frames and the video stream detection result of the first video stream.
- an electronic device includes a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the video detection method described in any embodiment of the present disclosure when executing the computer instructions.
- a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the video detection method described in any embodiment of the present disclosure.
- a computer program includes computer-readable code; when the computer-readable code is executed in an electronic device, a processor in the electronic device executes the video detection method of any embodiment of the present disclosure.
- the authenticity detection of the multiple first video frames in the video to be processed and of the first video stream corresponding to the video to be processed is performed simultaneously, so as to obtain the single-frame detection results of the first video frames and the video stream detection result of the first video stream respectively; the authenticity discrimination result of the video to be processed is then determined according to the respective single-frame detection results of the multiple first video frames and the video stream detection result of the first video stream, so that forged video frames present in the video to be processed can be detected and the video detection accuracy improved.
- FIG. 1 is a flowchart of a video detection method shown in at least one embodiment of the present disclosure
- FIG. 2 is a schematic diagram of a video detection method shown in at least one embodiment of the present disclosure
- FIG. 3 is a schematic diagram of a video detection apparatus shown in at least one embodiment of the present disclosure.
- FIG. 4 is a schematic structural diagram of an electronic device shown in at least one embodiment of the present disclosure.
- Embodiments of the present disclosure may be applied to computer systems/servers that are operable with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known computing systems, environments and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the foregoing, among others.
- FIG. 1 is a flowchart of a video detection method according to at least one embodiment of the present disclosure. As shown in FIG. 1 , the method includes steps 101 to 104 .
- step 101 a plurality of first video frames in the video to be processed and a first video stream corresponding to the video to be processed are acquired.
- the plurality of first video frames may be video frames corresponding to the original video sequence included in the video to be processed, or may be video frames obtained by performing frame extraction processing on the original video sequence .
- the first video stream corresponding to the video to be processed may be a video stream formed by the original video sequence contained in the video to be processed, or may be a video stream formed by the video frames obtained by performing frame extraction processing on the original video sequence, for example, a video stream formed by the plurality of first video frames.
- step 102 a single-frame detection result of performing authenticity detection on each of the first video frames is acquired.
- the authenticity detection of the first video frame may be performed by using a first authenticity classification network to obtain a single-frame detection result of the first video frame, wherein the single-frame detection result is used to indicate the confidence that the first video frame is fake; for example, the single-frame detection result includes a single-frame confidence score.
- the first authenticity classification network may be a pre-trained authenticity classification network for independent detection of video frames, such as ResNet (Residual Neural Network), DenseNet (Densely Connected Convolutional Networks), EfficientNet, Xception, SENet (Squeeze-and-Excitation Network), and so on.
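As an illustration of step 102, the following sketch runs a frame-level classifier over extracted frames to collect one single-frame detection result (a logit) per frame. `FrameClassifier` is a stand-in stub, not the patent's network: a real implementation would load pre-trained weights for one of the backbones listed above, and the logit range here is an arbitrary assumption.

```python
import random

class FrameClassifier:
    """Stand-in stub for a pre-trained frame-level authenticity network
    (e.g. a ResNet- or EfficientNet-style backbone with a forgery logit
    head); real weights are replaced by a seeded random generator."""
    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def __call__(self, frame):
        # A real network would map pixels to a logit; this stub fakes one.
        return self._rng.uniform(-5.0, 5.0)

net = FrameClassifier()
# One single-frame detection result (logit) per extracted frame.
scores = [net(frame) for frame in ["frame0", "frame1", "frame2"]]
```

The seeded generator only makes the sketch deterministic; everything downstream (grouping, mapping, fusion) consumes these per-frame logits.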
- step 103 a video stream detection result of performing authenticity detection on the first video stream is obtained.
- the second authenticity classification network may be used to perform authenticity detection on the first video stream according to the frame sequence corresponding to the first video stream and the relationship between frames, to obtain the video stream detection result of the first video stream, wherein the video stream detection result is used to represent the confidence that the first video stream is forged; for example, the video stream detection result includes a video stream confidence score.
- the second authenticity classification network may be a pre-trained authenticity classification network that detects video streams while considering the relationship between frames, such as a C3D (3D ConvNets) network, a SlowFast network, an X3D (Extensible 3D) network, and so on.
- step 104 according to the respective single frame detection results of the plurality of first video frames and the video stream detection results of the first video stream, determine the authenticity determination result of the video to be processed.
- in this way, the authenticity discrimination result of the video to be processed is determined according to both the respective single-frame detection results of the plurality of first video frames and the video stream detection result of the first video stream, so that forged video frames present in the video to be processed can be detected and the video detection accuracy improved.
- frame extraction processing may be performed on the to-be-processed video with a set frame number span to obtain the plurality of first video frames.
- the set frame number span may be determined according to the frame number of the video to be processed.
- the set frame number span may be positively correlated with the total number of video frames included in the to-be-processed video, so that the span is set adaptively according to the frame number of the video to be processed and a reasonable number of first video frames can be extracted, improving the effect of video detection.
- for example, the frame extraction may be performed with a frame number span of 2, that is, one frame is extracted every 2 frames.
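The frame extraction described above can be sketched as follows. `extract_frames` and the `adaptive_span` heuristic are illustrative names, and the target count of 30 frames is an assumption, not a value from the disclosure; the heuristic only demonstrates a span positively correlated with the total frame count.

```python
def extract_frames(video_frames, span):
    """Frame extraction with a set frame-number span: keep one frame out
    of every `span` frames."""
    return video_frames[::span]

def adaptive_span(total_frames, target_count=30):
    """Hypothetical heuristic: the span grows with (is positively
    correlated to) the total frame count, so roughly `target_count`
    frames are kept regardless of video length."""
    return max(1, total_frames // target_count)

frames = list(range(10))             # stand-in for decoded video frames
sampled = extract_frames(frames, 2)  # span of 2: one frame every 2 frames
```

With a span of 2 on 10 frames, 5 first video frames are kept; a 300-frame video under the heuristic gets a span of 10.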
- the single-frame detection results of the multiple first video frames may be first fused to obtain a fusion result, and then the video to be processed is determined according to the fusion result and the video stream detection result. Authenticity judgment results.
- the fusion result reflects the influence of each single-frame detection result; determining the authenticity discrimination result of the video to be processed from the fusion result together with the video stream detection result can therefore improve the effect of video detection.
- the fusion result may be obtained by fusing the detection results of the single frames of the first video frames in the following manner.
- the single-frame detection results of the multiple first video frames are grouped to obtain a plurality of result groups respectively including one or more single-frame detection results; the average detection results of each of the result groups are obtained.
- the average detection result for each group may include the average confidence score for multiple frames within the group.
- for example, every M adjacent first video frames may be grouped together, so that the plurality of first video frames are divided into N groups, where M and N are positive integers.
- Those skilled in the art should understand that in the case where the total number of the multiple first video frames is not an integer multiple of M, there may be groups in which the number of the first video frames is not M.
- every 5 adjacent first video frames may be grouped, so that the plurality of first video frames in the video to be processed are divided into 6 groups.
- the average detection result of each of the result groups is mapped to a first probability through a first setting function to obtain a plurality of the first probabilities, wherein the first setting function is a nonlinear mapping function.
- the first setting function may be, for example, a normalized exponential Softmax function, through which the average single-frame confidence score of each group is mapped to the first probability.
- the single-frame detection result of the first video frame is a logit value in the (-∞, +∞) interval.
- the average detection result of each group is mapped to the first probability in the [0,1] interval by the Softmax function, which can reflect the distribution of the average detection result of each group.
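The grouping and nonlinear mapping above can be sketched with a plain Softmax over the per-group averages. The scores below are made-up logits, and Softmax is only the example of a first setting function named in the text, not the only possible choice.

```python
import math

def group_averages(scores, group_size):
    """Group consecutive single-frame scores and average each group;
    the last group may hold fewer than `group_size` scores."""
    groups = [scores[i:i + group_size] for i in range(0, len(scores), group_size)]
    return [sum(g) / len(g) for g in groups]

def softmax(xs):
    """Nonlinearly map the per-group averages to first probabilities in
    [0, 1] that reflect the distribution across groups."""
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up single-frame logits for 6 extracted frames, grouped in pairs.
avgs = group_averages([0.2, 0.4, 3.0, 3.2, -1.0, -1.2], 2)
probs = softmax(avgs)
```

The probabilities sum to 1, and the group with the highest average detection result receives the highest first probability.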
- a fusion result is obtained according to the average detection result of each of the result groups and the plurality of first probabilities.
- the fusion result can be obtained by the following method: in response to the ratio of first upper probabilities greater than the first set threshold among the plurality of first probabilities being greater than the first set ratio, the fusion result is obtained according to the average detection results of the result groups corresponding to the first upper probabilities. That is, when the proportion of first upper probabilities exceeding the first set threshold is greater than the first set ratio, the fusion result is calculated according to the average detection results of the result groups corresponding to these first upper probabilities, for example, by taking the average of these average detection results as the fusion result.
- for example, when the first set threshold is 0.85 and the first set ratio is 0.7, if the ratio of first upper probabilities greater than 0.85 exceeds 0.7, the average of the average detection results of the result groups corresponding to these first upper probabilities is used as the fusion result.
- the few lower group detection results may be the result of misjudgment by the neural network.
- the fusion result may also be obtained by the following method: in response to the ratio of first lower probabilities smaller than the second set threshold among the plurality of first probabilities being greater than the second set ratio, the fusion result is obtained according to the average detection results of the result groups corresponding to the first lower probabilities. That is, when the first lower probabilities exceeding the second set ratio are all smaller than the second set threshold, the fusion result is calculated according to the average detection results of the result groups corresponding to these first lower probabilities, for example, by taking the average of these average detection results as the fusion result.
- the first set threshold is greater than the second set threshold.
- the first set ratio and the second set ratio may be the same or different, which is not limited in this embodiment of the present disclosure.
- for example, when the second set threshold is 0.15 and the second set ratio is 0.7, if the ratio of first lower probabilities less than 0.15 exceeds 0.7, the average of the average detection results of the result groups corresponding to these first lower probabilities is used as the fusion result.
- the few higher group detection results may be the result of misjudgment by the neural network.
- the influence of the misjudgment of the neural network on the video detection result can be reduced.
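One possible reading of the fusion rules above, as a sketch: if enough first probabilities are high (or low), only the matching groups are averaged, discarding the minority of likely network misjudgments. The fallback to a plain average when neither ratio condition holds follows the later description, and all threshold and ratio values here are illustrative defaults, not values fixed by the disclosure.

```python
def fuse_group_results(avgs, probs, hi_thresh=0.85, lo_thresh=0.15,
                       hi_ratio=0.7, lo_ratio=0.7):
    """Fuse per-group average detection results guided by their first
    probabilities: average only the dominant side when one exists,
    otherwise fall back to the plain average of all groups."""
    upper = [a for a, p in zip(avgs, probs) if p > hi_thresh]
    lower = [a for a, p in zip(avgs, probs) if p < lo_thresh]
    if upper and len(upper) / len(probs) > hi_ratio:
        return sum(upper) / len(upper)   # keep only the high-probability groups
    if lower and len(lower) / len(probs) > lo_ratio:
        return sum(lower) / len(lower)   # keep only the low-probability groups
    return sum(avgs) / len(avgs)         # no dominant side: plain average
```

For instance, with probabilities [0.9, 0.95, 0.9, 0.1], three of four exceed 0.85 (ratio 0.75 > 0.7), so only those three group averages are fused.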
- in response to the proportion of first upper probabilities greater than the first set threshold being less than or equal to the first set ratio, and the proportion of first lower probabilities less than the second set threshold being less than or equal to the second set ratio, the fusion result may be obtained according to the respective single-frame detection results of the plurality of first video frames.
- the average value of the respective single-frame detection results of the plurality of first video frames may be used as the fusion result.
- in this case, the fusion result is calculated based on the single-frame detection result of each first video frame, so that each first video frame contributes equally to the final authenticity discrimination result.
- a weighted average of the fusion result and the video stream detection result may be computed to obtain a weighted average result, and the authenticity discrimination result of the video to be processed is determined according to the weighted average result.
- the weighted average result may be compared with a set discrimination threshold; when the weighted average result is less than the set discrimination threshold, it is determined that the video to be processed is real, that is, the video to be processed is not a fake video; when the weighted average result is greater than or equal to the set discrimination threshold, it is determined that the video to be processed is a fake video.
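The weighted-average decision can be sketched as follows. Equal weights and a discrimination threshold of 0.0 (on logit-scale scores) are assumptions, since the disclosure does not fix these values.

```python
def discriminate(fusion_result, stream_result, frame_weight=0.5,
                 discrim_threshold=0.0):
    """Weighted average of the frame-level fusion result and the
    video-stream detection result, then compare with the discrimination
    threshold: below -> real, at or above -> fake."""
    weighted = (frame_weight * fusion_result
                + (1.0 - frame_weight) * stream_result)
    return "fake" if weighted >= discrim_threshold else "real"
```

For example, a fusion result of -2.0 and a stream result of -1.0 average to -1.5, below the threshold, so the video is judged real.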
- in the case where the first video frame contains multiple human faces, the detection results of the multiple human faces may be fused to obtain the single-frame detection result of the corresponding first video frame.
- the face detection frames can be obtained by performing face detection on the first video frame using a face detection network, such as RetinaFace; for the video frames after the first video frame on which face detection has been carried out, the faces can be tracked through a face tracking network, such as a Siamese network, to obtain the face detection frames.
- a corresponding face detection frame can be generated for each face; each face detection frame has a corresponding frame number and can be marked with a corresponding face number to distinguish the multiple faces included in the first video frame. For example, in the case where the first video frame includes 3 faces, face detection frames with frame numbers A, B, and C are generated and marked with face numbers 1, 2, and 3, respectively.
- the face detection frame includes the coordinate information of the four vertices of the face detection frame, or the width and height information of the face detection frame.
- according to the image area corresponding to each face detection frame, the single-person detection result of the corresponding face is determined.
- a single-person detection result of the face corresponding to the face detection frame can be obtained.
- the single-person detection results of faces 1, 2, and 3 can be obtained respectively.
- an input tensor of [face number, frame number, height, width, channel] can be generated, so that the image areas of the multiple faces in the to-be-processed video are concatenated by face number into a video frame set; in this way, each face in the video can be detected individually, and the single-person detection result corresponding to each face number can be obtained.
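As a sketch of the tensor construction described above (the crop size, frame count, and helper name `build_face_tensor` are assumptions for illustration):

```python
import numpy as np

def build_face_tensor(face_crops: dict[int, list[np.ndarray]],
                      ) -> np.ndarray:
    """Stack per-face crop sequences into a
    [face number, frame number, height, width, channel] tensor.

    face_crops maps a face number to that face's cropped image for
    each sampled first video frame (all crops pre-resized to the
    same height/width).
    """
    faces = sorted(face_crops)  # deterministic face-number order
    return np.stack([np.stack(face_crops[f]) for f in faces])

# e.g. 3 tracked faces over 8 sampled frames of 224x224 RGB crops:
crops = {f: [np.zeros((224, 224, 3), np.float32) for _ in range(8)]
         for f in (1, 2, 3)}
tensor = build_face_tensor(crops)
# tensor.shape == (3, 8, 224, 224, 3)
```

Each slice `tensor[i]` is then a per-face "video" that can be fed to the classifier independently, which is what enables the per-face single-person detection results.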
- the single-person detection results of each of the faces are mapped to second probabilities through a second setting function to obtain a plurality of the second probabilities, wherein the second setting function is a nonlinear mapping function.
- the single-person detection result of each face can be mapped to a second probability in the [0, 1] interval through the Softmax function, to reflect the distribution of the single-person detection results of the multiple faces contained in the video to be processed.
- a single-frame detection result of the first video frame is obtained according to the single-person detection results of each of the faces and a plurality of second probabilities.
- in this way, the individual detection of each face in the video to be processed can be realized, the influence of each face's single-person detection result on the authenticity discrimination result can be weighed more accurately, and the accuracy of video detection can be improved.
- the single-frame detection result may be obtained by the following method: in response to the existence, among the multiple second probabilities, of a second probability greater than a third set threshold, or in response to all of the multiple second probabilities being greater than a fourth set threshold, the maximum value among the single-person detection results of the first video frame is acquired as the single-frame detection result of the first video frame.
- the third set threshold is greater than the fourth set threshold.
- the third set threshold is 0.9 and the fourth set threshold is 0.6
- the maximum value among the single-person confidence scores in the first video frame is taken as the single-frame detection result of that frame.
- the single-frame detection result may also be obtained by the following method: in response to all of the multiple second probabilities being less than a fifth set threshold, obtaining the smallest value among the single-person detection results of the first video frame as the single-frame detection result of the first video frame. That is, when the second probabilities corresponding to all faces in the first video frame are smaller than the fifth set threshold, the confidence of each face detection result in the first video frame is low, so the smallest single-person detection result is used as the single-frame detection result, giving the entire first video frame a lower single-frame detection result.
- the fourth set threshold is greater than the fifth set threshold.
- the fifth set threshold is 0.4
- the minimum value among the single-person confidence scores in the first video frame is taken as the single-frame detection result of that frame.
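The fusion branches above (with the example thresholds 0.9 / 0.6 / 0.4) can be sketched as follows; the fallback to the plain average when no branch applies is an assumption, since the disclosure only requires at least one of the branches to be used.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

def fuse_faces(scores, t3=0.9, t4=0.6, t5=0.4):
    """Fuse per-face single-person scores into one single-frame score.

    scores: per-face forgery scores for one first video frame
    (raw network scores; the second probabilities come from Softmax).
    Example thresholds t3 > t4 > t5 follow the values 0.9 / 0.6 / 0.4.
    """
    p = softmax(np.asarray(scores, dtype=float))  # second probabilities
    if (p > t3).any() or (p > t4).all():
        return max(scores)           # confident face(s) -> take maximum
    if (p < t5).all():
        return min(scores)           # all low confidence -> take minimum
    return float(np.mean(scores))    # fallback (assumption): average
```

Because Softmax probabilities sum to 1, the "all greater than the fourth threshold" branch mostly matters when a frame contains few faces.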
- by acquiring the single-person detection results of the multiple faces and fusing them to obtain the single-frame detection result of the first video frame, the authenticity discrimination result of the video takes into account the influence of the detection results of different faces, which improves the video detection effect.
- the first authenticity classification network may include authenticity classification networks with multiple structures; performing authenticity detection on the first video frame through these networks yields multiple sub-single-frame detection results, which is equivalent to detecting the authenticity of the first video frame with multiple methods. The single-frame detection result of the first video frame can then be obtained by fusing the multiple sub-single-frame detection results corresponding to the first video frame.
- the detection results of multiple sub-single frames corresponding to the first video frame may be fused by the following method.
- the plurality of sub-single frame detection results are respectively mapped to third probabilities through a third setting function to obtain a plurality of third probabilities.
- each sub-single-frame detection result can be mapped to a third probability in the [0, 1] interval through the Softmax function, to reflect the distribution of the sub-single-frame detection results obtained by the multiple authenticity classification methods.
- a single-frame detection result is obtained according to the multiple sub-single-frame detection results and the multiple third probabilities.
- in response to the ratio of third upper probabilities (third probabilities greater than the sixth set threshold) among the multiple third probabilities being greater than the third set ratio, the single-frame detection result of the first video frame is obtained according to the sub-single-frame detection results corresponding to these third upper probabilities; for example, the average of these sub-single-frame detection results is taken as the single-frame detection result.
- the sixth set threshold is 0.8 and the third set ratio is 0.7
- the ratio of the third upper probability greater than 0.8 exceeds 0.7
- the average of these sub-single-frame confidence scores is used as the single-frame detection result.
- similarly, in response to the ratio of third lower probabilities (third probabilities smaller than the seventh set threshold) among the multiple third probabilities being greater than the fourth set ratio, the single-frame detection result of the first video frame is obtained according to the sub-single-frame detection results corresponding to these third lower probabilities; for example, the average of these sub-single-frame detection results is taken as the single-frame detection result. The sixth set threshold is greater than the seventh set threshold.
- the third set ratio and the fourth set ratio may be the same or different, which is not limited in this embodiment of the present disclosure.
- the seventh set threshold is 0.2 and the fourth set ratio is 0.7
- the average of the sub-single-frame confidence scores corresponding to the third lower probabilities is used as the single-frame detection result.
- when most of the sub-single-frame detection results are low, the few higher sub-single-frame detection results may be misjudgments by the authenticity classification network of the corresponding structure; by excluding them from the fusion, the influence of such misjudgments on the video detection result can be reduced.
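A minimal sketch of this ratio-based fusion of sub-single-frame results, using the example values 0.8 / 0.2 for the sixth/seventh set thresholds and 0.7 for the set ratios; the plain-mean fallback is an assumption for the case the disclosure leaves open. The same logic applies unchanged to the sub-video-stream fusion described below.

```python
import numpy as np

def fuse_methods(sub_scores, hi=0.8, lo=0.2, ratio=0.7):
    """Fuse sub-single-frame scores from classifiers of different
    structures (assumed example thresholds: 0.8 / 0.2, ratio 0.7).

    If more than `ratio` of the Softmax probabilities are high,
    average only the scores behind them; symmetrically for low
    probabilities; otherwise fall back to the plain mean.
    """
    s = np.asarray(sub_scores, dtype=float)
    p = np.exp(s - s.max())
    p /= p.sum()                       # third probabilities
    if (p > hi).mean() > ratio:
        return float(s[p > hi].mean())  # keep only confident methods
    if (p < lo).mean() > ratio:
        return float(s[p < lo].mean())  # discard outlier high scores
    return float(s.mean())              # fallback (assumption)
```

Note how a single outlier score (a likely misjudgment) is excluded when most probabilities are low, matching the rationale stated above.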
- the second authenticity classification network may include authenticity classification networks with multiple structures; performing authenticity detection on the first video stream through these networks yields multiple sub-video-stream detection results, which is equivalent to detecting the authenticity of the first video stream with multiple methods. The video stream detection result of the first video stream can then be obtained by fusing the multiple sub-video-stream detection results.
- the detection results of multiple sub-video streams corresponding to the first video stream may be fused by the following method.
- the plurality of sub-video stream detection results are respectively mapped to fourth probabilities through a fourth setting function to obtain a plurality of fourth probabilities.
- each sub-video-stream detection result can be mapped to a fourth probability in the [0, 1] interval through the Softmax function, to reflect the distribution of the sub-video-stream detection results obtained by the multiple authenticity classification methods.
- a video stream detection result of the first video stream is obtained according to the multiple sub-video-stream detection results and the multiple fourth probabilities.
- in response to the ratio of fourth upper probabilities (fourth probabilities greater than the eighth set threshold) among the multiple fourth probabilities being greater than the fifth set ratio, the video stream detection result of the first video stream is obtained according to the sub-video-stream detection results corresponding to these fourth upper probabilities; for example, the average of these sub-video-stream detection results is taken as the video stream detection result of the first video stream.
- the eighth set threshold is 0.8 and the fifth set ratio is 0.7
- the ratio of the fourth upper probability greater than 0.8 exceeds 0.7
- the average value of the video stream confidence scores is used as the video stream detection result of the first video stream.
- similarly, in response to the ratio of fourth lower probabilities (fourth probabilities smaller than the ninth set threshold) among the multiple fourth probabilities being greater than the sixth set ratio, the video stream detection result of the first video stream is obtained according to the sub-video-stream detection results corresponding to these fourth lower probabilities; for example, the average of these sub-video-stream detection results is taken as the video stream detection result of the first video stream.
- the eighth set threshold is greater than the ninth set threshold.
- the fifth set ratio and the sixth set ratio may be the same or different, which is not limited in this embodiment of the present disclosure.
- the ninth set threshold is 0.2 and the sixth set ratio is 0.7
- the ratio of the fourth lower probability less than 0.2 exceeds 0.7
- the average value of the video stream confidence scores is used as the video stream detection result of the first video stream.
- when most of the sub-video-stream detection results are low, the few higher sub-video-stream detection results may be misjudgments by the authenticity classification network of the corresponding structure; by excluding them from the fusion, the influence of such misjudgments on the video detection result can be reduced.
- each set threshold and each set ratio may be determined according to the accuracy requirements of the video detection result, which are not limited herein.
- for each first video frame in the video to be processed, fusion may be performed not only over the multiple faces but also over the sub-single-frame detection results obtained by the multiple methods; the two fusion results are then weighted and averaged to obtain the final single-frame detection result.
- FIG. 2 shows a schematic diagram of a video detection method according to at least one embodiment of the present disclosure.
- a plurality of first video frames in the video to be processed and a first video stream formed by the plurality of first video frames are acquired.
- the first video frame is processed to obtain a single-frame detection result of the first video frame.
- authenticity detection is performed on the multiple faces contained in the first video frame, and the single-person detection results corresponding to each face are fused to obtain a face fusion result; authenticity detection is also performed on the first video frame by multiple methods, and the sub-single-frame detection results corresponding to the various methods are fused to obtain a method fusion result. The face fusion result and the method fusion result are then weighted and averaged to obtain the single-frame detection result of the first video frame.
- the fusion results corresponding to the plurality of first video frames are obtained by fusing the respective single-frame detection results of the plurality of first video frames.
- the first video stream is processed to obtain a video stream detection result of the first video stream.
- the authenticity detection of the first video stream can be performed by various methods, and the sub-video detection results corresponding to the various methods are fused to obtain the video stream detection result.
- the authenticity discrimination result of the video to be processed is obtained by combining multiple fusion methods. For videos that contain both real and fake video frames, and videos that contain both real and fake faces, effective authenticity detection can be performed, yielding video detection results with high accuracy.
- the authenticity detection performed on the first video frame may be face-swap detection, in which case the obtained single-frame detection result indicates whether the face image in the first video frame is a face-swapped image. For example, the higher the score included in the detection result, the higher the confidence that the face image in the first video frame is a face-swapped image.
- the authenticity detection performed on the first video stream may likewise be face-swap detection, and the obtained video stream detection result indicates whether the face image in the first video stream is a face-swapped image. According to the respective single-frame detection results of the multiple first video frames and the video stream detection result of the first video stream, a determination of whether the to-be-processed video is a face-swapped video can be obtained.
- FIG. 3 shows a schematic diagram of a video detection apparatus according to an embodiment of the present disclosure.
- the device includes a first obtaining unit 301 for obtaining a plurality of first video frames in a video to be processed, and a first video stream corresponding to the video to be processed; a second obtaining unit 302, for acquiring a single frame detection result of performing authenticity detection on each of the first video frames; a third acquiring unit 303 for acquiring a video stream detection result for performing authenticity detection on the first video stream; determining unit 304 is used to determine the authenticity discrimination result of the video to be processed according to the respective single frame detection results of the multiple first video frames and the video stream detection results of the first video stream.
- the first obtaining unit is specifically configured to: perform frame extraction processing on the video to be processed with a set frame number span to obtain the plurality of first video frames, wherein the set frame The number span is positively related to the total number of video frames contained in the video to be processed.
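The frame extraction described above can be sketched as follows; deriving the span from a fixed target frame count is one simple way to make the span positively correlated with the total frame count, and `target_count` is an assumed parameter, not from the disclosure.

```python
def extract_frames(total_frames: int, target_count: int = 16) -> list[int]:
    """Pick first-video-frame indices with a frame-number span that
    grows with the total frame count, so that long and short videos
    both yield roughly `target_count` sampled frames.
    """
    span = max(1, total_frames // target_count)  # positively correlated
    return list(range(0, total_frames, span))

# A 320-frame video uses span 20; an 80-frame video uses span 5.
```

This keeps the per-video compute roughly constant while still covering the whole timeline of longer videos.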
- the second obtaining unit is specifically configured to: perform authenticity detection on each of the first video frames through a first authenticity classification network, and obtain a single-frame detection result of each of the first video frames, The single-frame detection result is used to represent the confidence that the first video frame is forged.
- the second obtaining unit is specifically configured to: through the second authenticity classification network, according to the video frames included in each of the first video streams and the relationship between the frames, classify each of the first video streams Authenticity detection is performed to obtain video stream detection results of each of the first video streams, wherein the video stream detection results are used to represent the confidence that the first video stream is forged.
- the determining unit is specifically configured to: fuse the respective single-frame detection results of the multiple first video frames to obtain a fusion result; determine according to the fusion result and the video stream detection result The authenticity discrimination result of the video to be processed.
- when the determining unit is configured to fuse the respective single-frame detection results of the multiple first video frames to obtain a fusion result, it is specifically configured to: group the respective single-frame detection results of the multiple first video frames to obtain multiple result groups each including one or more single-frame detection results; obtain the average detection result of each result group; map the average detection result of each result group to a first probability through the first setting function to obtain multiple first probabilities, wherein the first setting function is a nonlinear mapping function; and obtain the fusion result according to the average detection results of the result groups and the multiple first probabilities.
- when the determining unit is configured to obtain the fusion result according to the average detection result of each result group and the multiple first probabilities, it is specifically configured to: in response to the ratio of first upper probabilities (first probabilities greater than the first set threshold) among the multiple first probabilities being greater than the first set ratio, obtain the fusion result according to the average detection results of the result groups corresponding to these first upper probabilities; and/or, in response to the ratio of first lower probabilities (first probabilities smaller than the second set threshold) being greater than the second set ratio, obtain the fusion result according to the average detection results of the result groups corresponding to these first lower probabilities;
- the first set threshold is greater than the second set threshold.
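Combining the grouping, group averaging, nonlinear (Softmax) mapping, and ratio-based selection described for the determining unit, a sketch might look like this; the group size, thresholds, ratios, and mean fallback are illustrative assumptions.

```python
import numpy as np

def fuse_frames(frame_scores, group_size=4, t_hi=0.8, t_lo=0.2, ratio=0.7):
    """Fuse per-frame forgery scores into one video-level fusion result.

    Steps: group scores, average each group, Softmax-map the group
    averages to first probabilities, then average only the group
    averages whose probability is collectively high (or low).
    All parameter values here are assumed examples.
    """
    s = np.asarray(frame_scores, dtype=float)
    groups = [s[i:i + group_size] for i in range(0, len(s), group_size)]
    avgs = np.array([g.mean() for g in groups])  # average detection results
    p = np.exp(avgs - avgs.max())
    p /= p.sum()                                 # first probabilities
    if (p > t_hi).mean() > ratio:
        return float(avgs[p > t_hi].mean())
    if (p < t_lo).mean() > ratio:
        return float(avgs[p < t_lo].mean())
    return float(avgs.mean())                    # fallback (assumption)
```

Grouping before mapping smooths out single-frame noise, so one anomalous frame cannot dominate the Softmax distribution.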
- when the determining unit is configured to determine the authenticity discrimination result of the video to be processed according to the fusion result and the video stream detection result, it is specifically configured to: perform a weighted average of the fusion result and the video stream detection result, and determine the authenticity discrimination result of the video to be processed according to the obtained weighted average result.
- the first video frame includes multiple faces;
- the second obtaining unit is specifically configured to: obtain face detection frames corresponding to multiple faces in the first video frame;
- according to the image area corresponding to each face detection frame, the single-person detection result of the corresponding face is determined;
- the single-person detection result of each face is mapped to a second probability through the second setting function to obtain multiple second probabilities, wherein the second setting function is a nonlinear mapping function; according to the single-person detection results of the faces and the multiple second probabilities, the single-frame detection result of the first video frame is obtained.
- the first authenticity classification network includes authenticity classification networks with multiple structures
- when the second obtaining unit is configured to perform authenticity detection on the first video frame through the first authenticity classification network to obtain the single-frame detection result of the first video frame, it is specifically configured to: perform authenticity detection on the first video frame through the authenticity classification networks of the multiple structures to obtain multiple sub-single-frame detection results;
- the multiple sub-single frame detection results are respectively mapped to the third probability by the third setting function to obtain multiple third probabilities, wherein the third setting function is a nonlinear mapping function;
- in response to the ratio of third upper probabilities (third probabilities greater than the sixth set threshold) being greater than the third set ratio, the single-frame detection result of the first video frame is obtained according to the sub-single-frame detection results corresponding to these third upper probabilities; and/or, in response to the ratio of third lower probabilities (third probabilities smaller than the seventh set threshold) being greater than the fourth set ratio, the single-frame detection result of the first video frame is obtained according to the sub-single-frame detection results corresponding to these third lower probabilities, wherein the sixth set threshold is greater than the seventh set threshold.
- the second authenticity classification network includes authenticity classification networks with multiple structures
- when the third acquiring unit is configured to perform authenticity detection on the first video stream through the second authenticity classification network, according to the video frames included in the first video stream and the relationship between the frames, to obtain the video stream detection result of the first video stream, it is specifically configured to: perform authenticity detection on the first video stream through the authenticity classification networks of the multiple structures, according to the video frames included in the first video stream and the inter-frame relationships, to obtain multiple sub-video-stream detection results;
- the multiple sub-video-stream detection results are respectively mapped to fourth probabilities through the fourth setting function to obtain multiple fourth probabilities, wherein the fourth setting function is a nonlinear mapping function;
- in response to the ratio of fourth upper probabilities (fourth probabilities greater than the eighth set threshold) being greater than the fifth set ratio, the video stream detection result of the first video stream is obtained according to the sub-video-stream detection results corresponding to these fourth upper probabilities; and/or, in response to the ratio of fourth lower probabilities (fourth probabilities smaller than the ninth set threshold) being greater than the sixth set ratio, the video stream detection result of the first video stream is obtained according to the sub-video-stream detection results corresponding to these fourth lower probabilities, wherein the eighth set threshold is greater than the ninth set threshold.
- the single-frame detection result indicates whether the face image in the first video frame is a face-swapped image; the video stream detection result of the first video stream indicates whether the face image in the first video stream is a face-swapped image; and the authenticity determination result of the to-be-processed video indicates whether the to-be-processed video is a face-swapped video.
- FIG. 4 shows an electronic device provided according to at least one embodiment of the present disclosure. The device includes a memory and a processor, where the memory is used for storing computer instructions executable on the processor, and the processor is used for implementing the video detection method described in any implementation manner of the present disclosure when executing the computer instructions.
- At least one embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the video detection method described in any implementation manner of the present disclosure.
- one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- Embodiments of the subject matter and functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them.
- Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus.
- the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
- the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
- the processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
- the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, eg, an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
- Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
- the central processing unit will receive instructions and data from read only memory and/or random access memory.
- the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic, magneto-optical, or optical disks, to receive data from them, transfer data to them, or both.
- however, a computer need not have such devices.
- the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
- the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
Claims (17)
- 一种视频检测方法,包括:获取待处理视频中的多个第一视频帧,以及所述待处理视频所对应的第一视频流;获取对各所述第一视频帧进行真伪检测的单帧检测结果;获取对所述第一视频流进行真伪检测的视频流检测结果;根据所述多个第一视频帧各自的单帧检测结果和所述第一视频流的视频流检测结果,确定所述待处理视频的真伪判别结果。
- 根据权利要求1所述的方法,其特征在于,所述获取待处理视频中的多个第一视频帧,包括:以设定帧数跨度对所述待处理视频进行抽帧处理,得到所述多个第一视频帧,其中,所述设定帧数跨度与所述待处理视频所包含的视频帧的总帧数呈正相关。
- 根据权利要求1或2所述的方法,其特征在于,所述获取对各所述第一视频帧进行真伪检测的单帧检测结果,包括:通过第一真伪分类网络对所述第一视频帧进行真伪检测,得到所述第一视频帧的单帧检测结果,其中,所述单帧检测结果用于表征所述第一视频帧是伪造的置信度。
- 根据权利要求1至3任一项所述的方法,其特征在于,所述获取对所述第一视频流进行真伪检测的视频流检测结果,包括:通过第二真伪分类网络,根据所述第一视频流所包含的视频帧以及帧间关系,对所述第一视频流进行真伪检测,得到所述第一视频流的视频流检测结果,其中,所述视频流检测结果用于表征所述第一视频流是伪造的置信度。
- 根据权利要求1至4任一项所述的方法,其特征在于,所述根据所述多个第一视频帧各自的单帧检测结果和所述第一视频流的视频流检测结果,确定所述待处理视频的真伪判别结果,包括:对所述多个第一视频帧各自的单帧检测结果进行融合,得到融合结果;根据所述融合结果和所述视频流检测结果,确定所述待处理视频的真伪判别结果。
- 根据权利要求5所述的方法,其特征在于,所述对所述多个第一视频帧各自的单帧检测结果进行融合,得到融合结果,包括:对所述多个第一视频帧各自的单帧检测结果进行分组,得到分别包括一个或多个单帧检测结果的多个结果组;获得各所述结果组的平均检测结果;通过第一设定函数将各所述结果组的平均检测结果映射为第一概率,得到多个所述第一概率,其中,所述第一设定函数为非线性映射函数;根据各所述结果组的平均检测结果以及所述多个第一概率,得到融合结果。
- 根据权利要求6所述的方法,其特征在于,所述根据各所述结果组的平均检测结果以及所述多个第一概率,得到融合结果,包括以下中至少一个:响应于所述多个第一概率中大于第一设定阈值的第一上概率的比例大于第一设定比例,根据各所述第一上概率所对应的所述结果组的平均检测结果得到融合结果;响应于所述多个第一概率中小于第二设定阈值的第一下概率的比例大于第二设定比例,根据各所述第一下概率所对应的所述结果组的平均检测结果得到融合结果;其中,所述第一设定阈值大于所述第二设定阈值。
- 根据权利要求5至7任一项所述的方法,其特征在于,所述根据所述融合结果和所述视频流检测结果,确定所述待处理视频的真伪判别结果,包括:对所述融合结果和所述视频流检测结果进行加权平均,得到加权平均结果;根据所得到的所述加权平均结果确定所述待处理视频的真伪判别结果。
- 根据权利要求1至8任一项所述的方法,其特征在于,所述第一视频帧包括多个人脸;所述获取对各所述第一视频帧进行真伪检测的单帧检测结果,包括:获取所述第一视频帧中多个人脸对应的人脸检测框;根据各所述人脸检测框对应的图像区域,确定相应人脸的单人检测结果;通过第二设定函数将各个所述人脸的单人检测结果映射为第二概率,得到多个所述第二概率,其中,所述第二设定函数为非线性映射函数;根据各个所述人脸的单人检测结果以及所述多个第二概率,得到所述第一视频帧的单帧检测结果。
- 根据权利要求9所述的方法,其特征在于,所述根据各个所述人脸的单人检测结果以及所述多个第二概率,得到所述第一视频帧的单帧检测结果,包括以下至少一个:响应于所述多个第二概率中存在大于第三设定阈值的第二概率,获取所述第一视频帧中最大的单人检测结果作为所述第一视频帧的单帧检测结果;响应于所述多个第二概率均大于第四设定阈值,获取所述第一视频帧中最大的单人检测结果作为所述第一视频帧的单帧检测结果;响应于所述多个第二概率均小于第五设定阈值,获取所述第一视频帧中最小的单人检测结果作为所述第一视频帧的单帧检测结果;其中,所述第三设定阈值大于所述第四设定阈值,所述第四设定阈值大于所第五设 定阈值。
- 根据权利要求3所述的方法,其特征在于,所述第一真伪分类网络包括多种结构的真伪分类网络;所述通过第一真伪分类网络对所述第一视频帧进行真伪检测,得到所述第一视频帧的单帧检测结果,包括:通过所述多种结构的真伪分类网络对所述第一视频帧进行真伪检测,获得多个子单帧检测结果;通过第三设定函数将所述多个子单帧检测结果分别映射为第三概率,得到多个第三概率,其中,所述第三设定函数为非线性映射函数;通过以下至少一个确定所述第一视频帧的单帧检测结果:响应于所述多个第三概率中大于第六设定阈值的第三上概率的比例大于第三设定比例,根据各所述第三上概率所对应的子单帧检测结果得到所述第一视频帧的单帧检测结果;响应于所述多个第三概率中小于第七设定阈值的第三下概率的比例大于第四设定比例,根据各所述第三下概率所对应的子单帧检测结果得到所述第一视频帧的单帧检测结果,其中,所述第六设定阈值大于所述第七设定阈值。
- 根据权利要求4所述的方法,其特征在于,所述第二真伪分类网络包括多种结构的真伪分类网络;所述通过第二真伪分类网络,根据所述第一视频流所包含的视频帧以及帧间关系,对所述第一视频流进行真伪检测,得到所述第一视频流的视频流检测结果,包括:通过所述多种结构的真伪分类网络,根据所述第一视频流所包含的视频帧以及帧间关系,对所述第一视频流进行真伪检测,获得多个子视频流检测结果;通过第四设定函数将所述多个子视频流检测结果分别映射为第四概率,得到多个所述第四概率,其中,所述第四设定函数为非线性映射函数;通过以下至少一个确定所述第一视频流的视频流检测结果:响应于所述多个第四概率中大于第八设定阈值的第四上概率的比例大于第五设定比例,根据各所述第四上概率所对应的子视频流检测结果得到所述第一视频流的视频流检测结果;响应于所述多个第四概率中小于第九设定阈值的第四下概率的比例大于第六设定比例,根据各所述第四下概率所对应的子视频流检测结果得到所述第一视频流的视频流检测结果,其中,所述第八设定阈值大于所述第九设定阈值。
- 根据权利要求1至12任一项所述的方法,其特征在于,所述第一视频帧的单 帧检测结果指示所述第一视频帧中的脸部图像是否为换脸图像;所述第一视频流的视频流检测结果指示所述第一视频流中的脸部图像是否为换脸图像;所述待处理视频的真伪判别结果指示所述待处理视频是否为换脸视频。
- 一种视频检测装置,包括:第一获取单元,用于获取待处理视频中的多个第一视频帧,以及所述待处理视频所对应的第一视频流;第二获取单元,用于获取对各所述第一视频帧进行真伪检测的单帧检测结果;第三获取单元,用于获取对所述第一视频流进行真伪检测的视频流检测结果;确定单元,用于根据所述多个第一视频帧各自的单帧检测结果和所述第一视频流的视频流检测结果,确定所述待处理视频的真伪判别结果。
- 一种电子设备,其特征在于,所述设备包括存储器、处理器,所述存储器用于存储可在处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现权利要求1至13任一项所述的方法。
- A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 13.
- A computer program, comprising computer-readable code which, when run in an electronic device, causes a processor in the electronic device to implement the method according to any one of claims 1 to 13.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022531515A JP2023507898A (ja) | 2020-11-27 | 2021-06-30 | Video detection method, apparatus, device, and computer-readable storage medium |
KR1020227018065A KR20220093157A (ko) | 2020-11-27 | 2021-06-30 | Video detection method, apparatus, device, and computer-readable storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011365074.7A CN112329730B (zh) | 2020-11-27 | 2020-11-27 | Video detection method, apparatus, device, and computer-readable storage medium |
CN202011365074.7 | 2020-11-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022110806A1 (zh) | 2022-06-02 |
Family
ID=74309312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/103766 WO2022110806A1 (zh) | 2020-11-27 | 2021-06-30 | Video detection method, apparatus, device, and computer-readable storage medium |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP2023507898A (zh) |
KR (1) | KR20220093157A (zh) |
CN (1) | CN112329730B (zh) |
WO (1) | WO2022110806A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118366198A (zh) * | 2024-04-23 | 2024-07-19 | 天翼爱音乐文化科技有限公司 | Multi-person-scene-based tracking face-swap method, system, device, and medium |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329730B (zh) * | 2020-11-27 | 2024-06-11 | 上海商汤智能科技有限公司 | Video detection method, apparatus, device, and computer-readable storage medium |
CN113792701B (zh) * | 2021-09-24 | 2024-08-13 | 北京市商汤科技开发有限公司 | Liveness detection method and apparatus, computer device, and storage medium |
CN114359811A (zh) * | 2022-01-11 | 2022-04-15 | 北京百度网讯科技有限公司 | Data forgery identification method and apparatus, electronic device, and storage medium |
CN115412726B (zh) * | 2022-09-02 | 2024-03-01 | 北京瑞莱智慧科技有限公司 | Video authenticity detection method and apparatus, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150208025A1 (en) * | 2014-01-21 | 2015-07-23 | Huawei Technologies Co., Ltd. | Video Processing Method and Apparatus |
CN111444881A (zh) * | 2020-04-13 | 2020-07-24 | 中国人民解放军国防科技大学 | Forged face video detection method and apparatus |
CN111444873A (zh) * | 2020-04-02 | 2020-07-24 | 北京迈格威科技有限公司 | Method, apparatus, electronic device, and storage medium for detecting the authenticity of persons in a video |
CN111967427A (zh) * | 2020-08-28 | 2020-11-20 | 广东工业大学 | Forged face video identification method, system, and readable storage medium |
CN112329730A (zh) * | 2020-11-27 | 2021-02-05 | 上海商汤智能科技有限公司 | Video detection method, apparatus, device, and computer-readable storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299650B (zh) * | 2018-07-27 | 2021-09-07 | 东南大学 | Video-based nonlinear online expression pre-detection method and apparatus |
US10810725B1 (en) * | 2018-12-07 | 2020-10-20 | Facebook, Inc. | Automated detection of tampered images |
CN110059542A (zh) * | 2019-03-04 | 2019-07-26 | 平安科技(深圳)有限公司 | Face liveness detection method based on improved ResNet, and related device |
CN113646806A (zh) * | 2019-03-22 | 2021-11-12 | 日本电气株式会社 | Image processing device, image processing method, and recording medium storing a program |
CN111783632B (zh) * | 2020-06-29 | 2022-06-10 | 北京字节跳动网络技术有限公司 | Face detection method and apparatus for video streams, electronic device, and storage medium |
- 2020-11-27: CN application CN202011365074.7A filed; granted as patent CN112329730B (Active)
- 2021-06-30: PCT application PCT/CN2021/103766 filed; published as WO2022110806A1 (Application Filing)
- 2021-06-30: KR application KR1020227018065 filed; published as KR20220093157A (Search and Examination)
- 2021-06-30: JP application JP2022531515 filed; published as JP2023507898A (Pending)
Also Published As
Publication number | Publication date |
---|---|
JP2023507898A (ja) | 2023-02-28 |
CN112329730A (zh) | 2021-02-05 |
CN112329730B (zh) | 2024-06-11 |
KR20220093157A (ko) | 2022-07-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022110806A1 (zh) | Video detection method, apparatus, device, and computer-readable storage medium | |
US20230041233A1 (en) | Image recognition method and apparatus, computing device, and computer-readable storage medium | |
CN107871130B (zh) | 图像处理 | |
CN106415594B (zh) | 用于面部验证的方法和系统 | |
WO2018121157A1 (zh) | 一种网络流量异常检测方法及装置 | |
US9485204B2 (en) | Reducing photo-tagging spam | |
CN110853033B (zh) | 基于帧间相似度的视频检测方法和装置 | |
WO2022160591A1 (zh) | 人群行为检测方法及装置、电子设备、存储介质及计算机程序产品 | |
WO2021130546A1 (en) | Target object identification system, method and apparatus, electronic device and storage medium | |
CN111160555B (zh) | 基于神经网络的处理方法、装置及电子设备 | |
Kharrazi et al. | Improving steganalysis by fusion techniques: A case study with image steganography | |
CN112468487B (zh) | 实现模型训练的方法、装置、实现节点检测的方法及装置 | |
CN111968625A (zh) | 融合文本信息的敏感音频识别模型训练方法及识别方法 | |
TW201944291A (zh) | 人臉辨識方法 | |
Niu et al. | Boundary-aware RGBD salient object detection with cross-modal feature sampling | |
US20220398400A1 (en) | Methods and apparatuses for determining object classification | |
US11295457B2 (en) | Tracking apparatus and computer readable medium | |
CN113095257A (zh) | 异常行为检测方法、装置、设备及存储介质 | |
US20230283622A1 (en) | Anomaly detection method, anomaly detection device, and recording medium | |
US8737696B2 (en) | Human face recognition method and apparatus | |
WO2023185693A1 (zh) | 图像处理方法、相关装置和系统 | |
WO2023019970A1 (zh) | 一种攻击检测方法及装置 | |
CN114513473B (zh) | 一种流量类别检测方法、装置及设备 | |
WO2023273227A1 (zh) | 指甲识别方法、装置、设备及存储介质 | |
CN103426171B (zh) | 双目立体视觉系统中对应指尖点的匹配方法、装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| ENP | Entry into the national phase | Ref document number: 2022531515; Country of ref document: JP; Kind code of ref document: A. Ref document number: 20227018065; Country of ref document: KR; Kind code of ref document: A |
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21896314; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.10.2023) |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 21896314; Country of ref document: EP; Kind code of ref document: A1 |