WO2022188315A1

WO2022188315A1 - Video detection method and apparatus, electronic device, and storage medium

Info

Publication number: WO2022188315A1
Application number: PCT/CN2021/104572
Authority: WO
Inventors: 熊俊峰; 王洋; 周越; 张欢; 仲震宇
Original assignee: 百度在线网络技术（北京）有限公司
Priority date: 2021-03-12
Filing date: 2021-07-05
Publication date: 2022-09-15
Also published as: JP2023543015A; CN112883902A; CN112883902B; KR20230045098A

Abstract

A video detection method and apparatus, an electronic device, and a storage medium, relating to fields such as artificial intelligence, deep learning, computer vision, image processing, facial recognition, limb recognition, and counterfeit detection. The method comprises: detecting a video frame in a video data stream to obtain a target region in the video frame, the target region being used for representing a region where different video frames in the video data stream have some identical pixels (S101); searching the video data stream for an abnormal video frame that has the target region (S102); and if a detection parameter corresponding to the abnormal video frame meets a threshold, determining the abnormal video frame as a target video frame (S103). The method can detect an abnormal video frame, in a video data stream, that has been edited (or referred to as counterfeited) by humans.

Description

A video detection method, device, electronic device and storage medium

This application claims priority to Chinese patent application No. 202110272132.X, filed on March 12, 2021 and titled "video detection method, device, electronic device and storage medium", the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the field of computer processing, and in particular, the present disclosure relates to the fields of artificial intelligence, deep learning, computer vision, image processing, face recognition, limb recognition, forgery detection, and the like.

Background

With the development of computer technology, not only pictures and audio but even videos can be forged. In the deep forgery processing of videos, forged pictures can be used to generate videos. For example, a piece of video content can be forged by swapping in new elements (such as replacing a face with another person's face) and then used to attack various applications (such as bank clients, access control systems, etc.), which can bring huge losses to users.

SUMMARY OF THE INVENTION

The present disclosure provides a video detection method, device, electronic device and storage medium.

According to an aspect of the present disclosure, a video detection method is provided, comprising:

detecting a video frame in a video data stream to obtain a target area in the video frame, the target area representing an area in which different video frames in the video data stream have some completely identical pixels;

searching the video data stream for abnormal video frames that contain the target area; and

in the case that a detection parameter corresponding to the abnormal video frame meets a threshold, determining the abnormal video frame as a target video frame.

According to another aspect of the present disclosure, a video detection apparatus is provided, comprising:

a target area detection module, configured to detect a video frame in a video data stream to obtain a target area in the video frame, the target area representing an area in which different video frames in the video data stream have some identical pixels;

an abnormal video search module, configured to search the video data stream for abnormal video frames that contain the target area; and

a target video determination module, configured to determine the abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets a threshold.

According to another aspect of the present disclosure, there is provided an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method provided by any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method provided by any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, including computer instructions, which when executed by a processor implement the method provided by any one of the embodiments of the present disclosure.

By adopting the present disclosure, a video frame in a video data stream can be detected to obtain a target area in the video frame, the target area representing an area in which different video frames in the video data stream have some identical pixels; abnormal video frames that contain the target area are searched for in the video data stream; and in the case that the detection parameter corresponding to the abnormal video frame meets the threshold, the abnormal video frame is determined as the target video frame, so that abnormal video frames that have been edited (or forged) in the video data stream can be detected.

It should be understood that what is described in this section is not intended to identify key or critical features of embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become readily understood from the following description.

Description of drawings

The accompanying drawings are used for better understanding of the present solution, and do not constitute a limitation to the present disclosure. In the drawings:

FIG. 1 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure;

FIG. 2 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a composition structure of a video detection apparatus according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a composition structure of a video detection apparatus according to an embodiment of the present disclosure;

FIG. 5 is a block diagram of an electronic device used to implement the video detection method of an embodiment of the present disclosure.

Detailed description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding and should be considered as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B exist at the same time, or that B exists alone. The term "at least one" herein refers to any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C. The terms "first" and "second" herein are used to refer to and distinguish between multiple similar technical terms, and are not meant to impose an order or to limit the number to two; for example, "a first feature and a second feature" means that there are two types of features, where the first feature can be one or more and the second feature can also be one or more.

In addition, in order to better illustrate the present disclosure, numerous specific details are given in the following detailed description. It will be understood by those skilled in the art that the present disclosure may be practiced without certain specific details. In some instances, methods, means, components and circuits well known to those skilled in the art have not been described in detail so as not to obscure the subject matter of the present disclosure.

For deep forgery of video, taking an access control system as an example application scenario, video can be captured in camera scenarios such as mobile phone applications, access control, and CCTV to obtain a video data stream. The video frames in the video data stream are three-primary-color (RGB) images. Even if the images are combined with depth information collected by structured light or binocular cameras, and with multi-modal means such as infrared and color illumination, to perform security verification of faces, limbs, and motions, the weaknesses of RGB remain unavoidable, making it easy to fake videos from images. The threat to the user is especially pronounced when a forged video is injected at the same time as the system is intruded. Such image-driven videos can carry out deceptive attacks through fake faces, fake human limbs, fake movements, and the like. Because the attack is effective and cheap, and a wide range of applications rely on the related security verification functions, the danger is enormous.

In view of this, for image-driven video, classifiers trained with video understanding and video image convolutional networks can be used to perform the related detection tasks on fake faces, fake human limbs, fake actions, and the like in the video data stream. However, a video image convolutional network has a large number of parameters and requires a large amount of labeled data before training, so the implementation cost is high; tuning its parameters to improve network performance is inefficient, and relatively few scenarios are applicable; and the machines that run video image convolutional networks are relatively sophisticated and expensive.

In the present disclosure, a target area can be reasonably selected, and the entire video data stream can be combed through based on the target area to find out which video frames in the video data stream carry traces of editing, so that a correct judgment can be made about edited, abnormal video. Because a picture-driven video uses key points to drive only part of the picture, some pixels will be exactly the same in different video frames. Since a video data stream contains a certain amount of random noise, the probability that pixels in a normal (unedited, non-forged) video are exactly the same is almost zero, so this phenomenon can be regarded as a trace of the video frames having been edited.

According to an embodiment of the present disclosure, a video detection method is provided. FIG. 1 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure. The method can be applied to a video detection apparatus. For example, the apparatus can be deployed in a terminal, a server, or another processing device to perform processing such as target area detection, abnormal video frame search, and determination of the target video frame after evaluating the abnormal video frames. The terminal may be user equipment (UE), a mobile device, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method may also be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in FIG. 1, the method includes:

S101. Detect a video frame in a video data stream to obtain a target area in the video frame, where the target area represents an area in which different video frames in the video data stream have some identical pixels.

S102. Search the video data stream for abnormal video frames that contain the target area.

S103. In the case that the detection parameter corresponding to the abnormal video frame meets the threshold, determine the abnormal video frame as a target video frame.

In an example of S101-S103, the video frames in the video data stream are detected to find the target area. For example, the target area may be an area manually edited by a forger, that is, an area in which different video frames in the video data stream have some completely identical pixels. While combing through the entire video data stream based on the target area, abnormal video frames that contain the target area can be found in the video data stream. Operations can also be performed on multiple abnormal video frames in the video data stream to obtain detection parameters corresponding to the abnormal video frames, for example, the pixel abnormality rate of abnormal pixels in a video frame sequence, or a video detection score obtained by scoring abnormal video sequences composed of different pixel abnormality rates, so that the abnormal video frame is determined as the target video frame when the detection parameter meets the threshold. The threshold may also be configured according to the video detection score, so that the specific position of the target video frame in the video data stream can be located more accurately according to the threshold.

By adopting the present disclosure, a video frame in a video data stream can be detected to obtain a target area in the video frame, the target area representing an area in which different video frames in the video data stream have some identical pixels; abnormal video frames that contain the target area are searched for in the video data stream; and the abnormal video frame is determined as a target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold. Because an area in which different video frames in the video data stream have some identical pixels is an artificially edited (or forged) area, a video frame containing it is abnormal. The entire video data stream therefore needs to be examined: first, this kind of target area (that is, the artificially edited area) is found; then the entire video data stream is combed through using the target area, and the video frames that contain the target area are regarded as abnormal video frames. For the sake of detection accuracy, an abnormal video frame is finally determined as a detected target video frame only when its corresponding detection parameter is judged to meet the threshold, so that abnormal video frames that have been artificially edited (or forged) in the video data stream can be detected accurately.

According to an embodiment of the present disclosure, a video detection method is provided. FIG. 2 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure. As shown in FIG. 2 , the method includes:

S201. Extract key regions from each of at least two adjacent video frames in the video data stream.

S202. Perform a pixel-level feature comparison on the key regions corresponding to the at least two video frames, and take the key regions found by the comparison to have some identical pixels as the target area.

S203. Search the video data stream for abnormal video frames that contain the target area.

S204. In the case that the detection parameter corresponding to the abnormal video frame meets the threshold, determine the abnormal video frame as a target video frame.

In an example of S201-S204, video frames in the video data stream are detected to find the target area. Considering that image-driven videos carry out deceptive attacks through fake faces, fake human limbs, fake movements, and the like, regions such as faces, human limbs, and actions (such as designated gestures) can be taken as the key regions, instead of detecting the entire human body and all performed actions. The target area may be an area artificially edited by a forger, that is, an area in which different video frames in the video data stream have some identical pixels. After the abnormal video frames that contain the target area are found in the video data stream, operations can also be performed on multiple abnormal video frames in the video data stream to obtain detection parameters corresponding to the abnormal video frames, for example, the pixel abnormality rate of abnormal pixels in a video frame sequence, or a video detection score obtained by further scoring abnormal video sequences composed of different pixel abnormality rates, so that the abnormal video frame is determined as the target video frame when the detection parameter meets the threshold. The threshold may also be configured according to the video detection score, so that the specific position of the target video frame in the video data stream can be located more accurately according to the threshold. With this embodiment, because detection is performed only on the key regions, processing is faster and detection accuracy is higher.
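As a minimal sketch of S201-S202, the following Python compares two adjacent frames inside each key region and keeps the regions that share identical pixels. The key-region boxes are assumed to come from some detector that is not shown; the names `find_target_regions`, `Box`, and `min_identical` are illustrative and do not appear in the disclosure.

```python
from typing import List, Sequence, Tuple

# Hypothetical types: a frame is a grid of pixel values (ints or RGB tuples),
# and a box is (top, left, bottom, right) with bottom/right exclusive.
Frame = Sequence[Sequence[object]]
Box = Tuple[int, int, int, int]


def find_target_regions(
    prev: Frame, curr: Frame, key_boxes: Sequence[Box], min_identical: int = 1
) -> List[Box]:
    """Return the key regions in which two adjacent frames share identical pixels."""
    targets = []
    for box in key_boxes:
        top, left, bottom, right = box
        # Count positions inside the box whose pixel values match exactly.
        identical = sum(
            prev[r][c] == curr[r][c]
            for r in range(top, bottom)
            for c in range(left, right)
        )
        if identical >= min_identical:  # assumed cutoff, not from the disclosure
            targets.append(box)
    return targets
```

Restricting the comparison to the boxes keeps the cost proportional to the key-region area rather than to the full frame, which is the speed benefit the embodiment describes.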

In one embodiment, determining the abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold includes: performing a pixel abnormality operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtaining, according to the similarity score, a pixel abnormality rate for the abnormal video frames; and using the pixel abnormality rate as the detection parameter, determining the abnormal video frames in the first video sequence as target video frames when the detection parameter meets the threshold. With this embodiment, the similarity score obtained by the pixel abnormality operation on the first video sequence serves as an indicator for evaluating the pixel abnormality rate, from which the pixel abnormality rate of the abnormal video frames is obtained. In other words, the abnormal video frames in the first video sequence can be screened out of the entire video data stream according to the pixel abnormality rate and finally determined as target video frames, which improves detection accuracy.
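The relationship between the similarity score, the pixel abnormality rate, and the threshold can be sketched as follows. Formula (1) is not reproduced in this text, so the sketch assumes that the similarity score is the count of exactly identical pixels between two key-region crops and that the rate is that count divided by the region size. Both definitions, the function names, and the `>=` comparison are illustrative assumptions rather than the disclosed formula.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel
Region = List[List[Pixel]]     # key-region crop: rows of pixels


def compare(a: Region, b: Region) -> int:
    """Assumed similarity score: number of positions whose RGB values match exactly."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("regions must have the same shape")
    return sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def pixel_abnormality_rate(a: Region, b: Region) -> float:
    """Assumed rate: identical pixels divided by total pixels in the region."""
    total = sum(len(row) for row in a)
    return compare(a, b) / total if total else 0.0


def is_target_frame(a: Region, b: Region, threshold: float) -> bool:
    """Treat the frame pair as edited when the rate meets the (assumed) threshold."""
    return pixel_abnormality_rate(a, b) >= threshold
```

For two 2x2 crops that differ in a single pixel, this definition gives a rate of 0.75, so the pair would be flagged under a threshold of 0.5 but not under 0.9.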

In one embodiment, determining the abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold includes: performing a pixel abnormality operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtaining, according to the similarity score, a pixel abnormality rate for the abnormal video frames; selecting, from the first video sequence, second video sequences composed of different pixel abnormality rates and scoring them respectively to obtain the corresponding video detection scores; and using the video detection score as the detection parameter, determining the abnormal video frames in the second video sequence as target video frames when the detection parameter meets the threshold. With this embodiment, the similarity score obtained by the pixel abnormality operation on the first video sequence serves as an indicator for evaluating the pixel abnormality rate, from which the pixel abnormality rate of the abnormal video frames is obtained. Further, second video sequences composed of different pixel abnormality rates can be selected from the first video sequence and scored respectively to obtain the corresponding video detection scores, which are then used as the detection parameter.
In other words, the abnormal video frames in the first video sequence can be screened out of the entire video data stream according to the pixel abnormality rate; second video sequences composed of different pixel abnormality rates can then be selected from the first video sequence and scored respectively; and, according to the resulting video detection scores, the abnormal video frames in the second video sequence whose video detection scores meet expectations are screened out of the first video sequence and finally determined as target video frames, which improves detection accuracy.

In an embodiment, the method further includes: configuring the threshold according to the video detection score. The video detection score includes: the score proportion, within the first video sequence, of the second video sequence composed of different pixel abnormality rates; the first video sequence is the video sequence composed of the abnormal video frames in the video data stream. In an example of configuring the threshold, a pixel abnormality operation is performed on the first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; a pixel abnormality rate for the abnormal video frames is obtained according to the similarity score; second video sequences composed of different pixel abnormality rates are selected from the first video sequence and scored respectively to obtain the corresponding video detection scores; and the threshold is configured according to the video detection scores obtained by this operation. With this embodiment, when evaluating according to the video detection score, the detection accuracy for abnormal video depends mainly on the proportion of abnormal video frames with different pixel abnormality rates in the whole video data stream and is independent of the positions of the abnormal video frames, which improves detection accuracy.

In an embodiment, the method further includes: locating the position of the target video frame in the video data stream according to the threshold. The threshold may be configured according to the above-mentioned video detection score, so that the specific position of the target video frame in the video data stream can be located more accurately according to the threshold. With this embodiment, because the threshold is configured from the video detection score, the threshold itself can be used to evaluate abnormal target video frames, so the position of the target video frame can be inferred directly from the threshold without training a neural network. In addition, to improve positioning accuracy and efficiency, a simple neural network for positioning can be trained according to the threshold, and the position of the target video frame can then be located in the video data stream using that network. Because little data is required for training, the complexity of the neural network is reduced.
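The training-free variant of this positioning step can be sketched as a scan over per-frame detection parameters. The sketch assumes one detection parameter per frame index and treats "meets the threshold" as `>=`; the name `locate_segments` and the (start, end) output format are illustrative choices, not part of the disclosure.

```python
from typing import List, Sequence, Tuple


def locate_segments(params: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index runs whose parameter meets the threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(params):
        if value >= threshold:
            if start is None:
                start = i  # a new run of target frames begins here
        elif start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(params) - 1))  # run extends to the last frame
    return segments
```

Reporting contiguous runs rather than individual indices makes the output directly usable as positions in the stream.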

Application example:

The processing flow of applying the first embodiment of the present disclosure includes the following contents:

1. Read the video data stream to obtain multiple video frames. Each video frame is one picture, and the video frames constitute the picture stream Xi (i=1,2,...,n), where n is a positive integer greater than 1 indicating the number of pictures.

2. Detect key regions. Sensors such as image key-region detectors can be used to detect regions of interest in each picture (such as face regions, human body regions including human limbs, and action regions used for identifying fake actions); these regions of interest are taken as the key regions and marked in the picture stream Xi (i=1,2,...,n).

3. Set the hyperparameter sampling interval d to obtain the picture sequences Sj (j=1,2,...,n/d), where n is a positive integer greater than 1 indicating the number of pictures, and S is the collective name for the multiple picture sequences.

4. Denote each picture sequence as Sj; each contains at least two pictures, one earlier and one later, that is, S1=(X1, X3), S2=(X3, X5), ..., S(n/d)=(Xn-2, Xn). As for the choice of the sampling interval d, taking sequences made up of a first and a last picture as an example, d=2 can be selected.

5. Calculate the pixel abnormality rate for the picture sequence according to formula (1):

Here, Ratio is the pixel abnormality rate, and Compare(Xi, Xi+d) is the similarity score, which is obtained by comparing the sampled video frames in the picture sequence to find the number of similar frames.

It should be pointed out that the similarity score is only one of the indicators that can be used to evaluate the pixel abnormality rate; the present disclosure is not limited to this indicator, and any indicator that can be used to evaluate the pixel abnormality rate falls within the protection scope of the present application.

6. Obtain the sequence of pixel abnormality rates, denoted Rk (k=1,2,...,n/d), where n is a positive integer greater than 1 indicating the number of pictures, and d is the sampling interval.

7. The sequence of pixel abnormality rates can be evaluated in various ways to obtain the video detection score. This application example calculates the video detection score according to formula (2) or formula (3) below, with the video detection score score∈(0,1). A score of 0 represents the label of normal video: in the subsequent training process that configures the threshold based on the video detection score, a picture given this label is a video frame that has not been edited (or forged), that is, a "true" original video frame. A score of 1 represents the label of abnormal video: in the subsequent training process that configures the threshold based on the video detection score, a picture given this label is a video frame that has been edited (or forged), that is, a "fake" frame from a picture-driven video, which may be used to attack users.

score = strategy(R)    (2)

In formula (2), R is the sequence of pixel abnormality rates; strategy is a policy that can be configured according to the detection requirements of different application scenarios; and score is the video detection score. The threshold can also be configured according to the score (for example, the score is used as the threshold), so that the abnormal video can be located directly based on the strategy.

In formula (3), R is the sequence of pixel abnormality rates; b is a preset parameter that can be configured according to the detection requirements of different application scenarios; w is the weight; and score is the video detection score. Based on the distribution of the pixel abnormality rate sequences, the logistic regression principle of formula (3) can be used to design a feature extractor that scores the video more accurately than scoring only through a preset strategy as in formula (2). In the logistic regression process, a training set and a validation set can be prepared for training, so as to obtain the parameter w used for inference in the practical application scenario. It should be pointed out that using logistic regression can be equivalent to the complex operation of manually designing convolution kernels and then convolving them over the video data stream. The threshold can also be configured according to the score (for example, the score is used as the threshold), so that the abnormal video can be located based on the neural network obtained through training.
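Steps 3 to 7 of the application example can be sketched end to end as follows. The pairing follows the S1=(X1, X3), S2=(X3, X5) pattern of step 4. The rate function is supplied by the caller, the strategy shown for formula (2) is only one hypothetical choice, and the sigmoid form used for formula (3) is an assumption because that formula is not reproduced in this text. All function and parameter names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_pairs(frames: Sequence[T], d: int = 2) -> List[Tuple[T, T]]:
    """Steps 3-4: pair frame i with frame i+d, stepping by d (0-based indices)."""
    if d < 1:
        raise ValueError("sampling interval d must be a positive integer")
    return [(frames[i], frames[i + d]) for i in range(0, len(frames) - d, d)]


def rate_sequence(
    frames: Sequence[T], rate_fn: Callable[[T, T], float], d: int = 2
) -> List[float]:
    """Steps 5-6: apply a caller-supplied rate function to every sampled pair."""
    return [rate_fn(a, b) for a, b in sample_pairs(frames, d)]


def strategy_score(rates: Sequence[float], min_rate: float = 0.0) -> float:
    """Formula (2), one hypothetical strategy: share of pairs whose rate exceeds min_rate."""
    if not rates:
        return 0.0
    return sum(r > min_rate for r in rates) / len(rates)


def logistic_score(rates: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Formula (3), assumed form: sigmoid(sum_k w_k * R_k + b)."""
    if len(rates) != len(weights):
        raise ValueError("rates and weights must have the same length")
    z = sum(w * r for w, r in zip(weights, rates)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)
```

In this sketch the weights and bias are passed in directly; in practice they would come from the training and validation procedure the text describes, which is not shown here.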

With this application example, if a video frame has been forged or edited, some pixels will be completely identical across different frames. Therefore, by reasonably selecting the area to be detected and combing through the entire video, such editing traces can be found and each video frame can be judged "true" or "false", so that the position of the abnormal video in the entire video data stream can be located quickly. The approach is easy to compute and requires no graphics processing unit (GPU); it has low computational complexity and is fast; it has high accuracy and strong interpretability and can directly locate abnormal video positions; and it can infer directly by setting a threshold, with no training or with training that requires little data. Moreover, the size of the edited area in any video sequence is calculated by the above formulas and used as the basis for the video-level judgment. There is no need to design a complex convolutional neural network: the threshold is configured based on the score obtained by the strategy or by simple training, so a good detection effect can be achieved at the lowest cost and with the fastest operation.

According to an embodiment of the present disclosure, a video detection apparatus is provided. FIG. 3 is a schematic diagram of the composition and structure of the video detection apparatus according to an embodiment of the present disclosure. As shown in FIG. 3, the video detection apparatus 300 includes: a target area detection module 301, configured to detect a video frame in a video data stream to obtain a target area in the video frame, the target area representing an area in which different video frames in the video data stream have some identical pixels; an abnormal video search module 302, configured to search the video data stream for abnormal video frames that contain the target area; and a target video determination module 303, configured to determine the abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold.

According to an embodiment of the present disclosure, a video detection apparatus is provided. FIG. 4 is a schematic structural diagram of a video detection apparatus according to an embodiment of the present disclosure. As shown in FIG. 4, the video detection apparatus 400 includes: a target area detection module 401, configured to detect a video frame in a video data stream to obtain a target area in the video frame, the target area representing an area in which different video frames in the video data stream have some identical pixels. The target area detection module 401 further includes: a key region extraction sub-module 4011, configured to extract key regions from each of at least two adjacent video frames in the video data stream; and a sub-module configured to perform a pixel-level feature comparison on the key regions corresponding to the at least two video frames and to take the key regions found by the comparison to have some identical pixels as the target area. The apparatus further includes: an abnormal video search module 402, configured to search the video data stream for abnormal video frames that contain the target area; and a target video determination module 403, configured to determine the abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold.

In one embodiment, the target video determination module is configured to: perform a pixel anomaly operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtain a pixel abnormality rate for the abnormal video frames according to the similarity score; and take the pixel abnormality rate as the detection parameter and, when the detection parameter meets the threshold, determine the abnormal video frames in the first video sequence as target video frames.
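As a rough illustration of this embodiment, the sketch below makes two loudly stated assumptions that are not taken from the disclosure: the similarity score is the fraction of identical pixels between the target areas of consecutive frames, and the pixel abnormality rate of a frame is simply that score. All function names are hypothetical.

```python
from typing import List, Sequence

Region = List[List[int]]  # assumed: target area of one frame as rows of intensities


def similarity_score(region_a: Region, region_b: Region) -> float:
    """Fraction of positions whose pixels are identical in both regions."""
    pairs = [
        (a, b)
        for row_a, row_b in zip(region_a, region_b)
        for a, b in zip(row_a, row_b)
    ]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def pixel_abnormality_rates(first_sequence: Sequence[Region]) -> List[float]:
    """Assumed mapping: a frame's rate is its similarity to the preceding frame.

    The first frame has no predecessor and is given a rate of 0.0.
    """
    if not first_sequence:
        return []
    rates = [0.0]
    for prev, curr in zip(first_sequence, first_sequence[1:]):
        rates.append(similarity_score(prev, curr))
    return rates


def select_target_frames(rates: Sequence[float], threshold: float) -> List[int]:
    """Indices of frames whose detection parameter meets the threshold."""
    return [i for i, rate in enumerate(rates) if rate >= threshold]


# First video sequence: target areas of three abnormal frames (toy data).
first_sequence = [
    [[1, 2], [3, 4]],
    [[1, 2], [3, 9]],   # three of four pixels repeat the previous frame
    [[5, 6], [7, 8]],   # nothing repeats
]
rates = pixel_abnormality_rates(first_sequence)
targets = select_target_frames(rates, threshold=0.5)
```

Here "meets the threshold" is read as "greater than or equal to"; the disclosure leaves the comparison direction open, so that choice is also an assumption.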

In one embodiment, the target video determination module is configured to: perform a pixel anomaly operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtain a pixel abnormality rate for the abnormal video frames according to the similarity score; select, from the first video sequence, second video sequences composed of frames with different pixel abnormality rates, and score them respectively to obtain corresponding video detection scores; and take the video detection score as the detection parameter and, when the detection parameter meets the threshold, determine the abnormal video frames in the second video sequence as target video frames.
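One possible reading of this embodiment is sketched below. It assumes that each second video sequence is the subset of first-sequence frames whose pixel abnormality rate reaches a given level, and that its video detection score is the share of the first sequence it covers; both are assumptions for illustration, and the names `second_sequence`, `video_detection_scores`, and `target_frames` are hypothetical.

```python
from typing import Dict, List, Sequence


def second_sequence(rates: Sequence[float], level: float) -> List[int]:
    """Indices of first-sequence frames whose abnormality rate is at least `level`."""
    return [i for i, rate in enumerate(rates) if rate >= level]


def video_detection_scores(rates: Sequence[float],
                           levels: Sequence[float]) -> Dict[float, float]:
    """Score each candidate second sequence as its share of the first sequence."""
    if not rates:
        return {level: 0.0 for level in levels}
    return {
        level: len(second_sequence(rates, level)) / len(rates)
        for level in levels
    }


def target_frames(rates: Sequence[float],
                  levels: Sequence[float],
                  threshold: float) -> List[int]:
    """Frames of every second sequence whose detection score meets the threshold."""
    hits = set()
    for level, score in video_detection_scores(rates, levels).items():
        if score >= threshold:
            hits.update(second_sequence(rates, level))
    return sorted(hits)


# Pixel abnormality rates of a four-frame first video sequence (toy data).
rates = [0.1, 0.6, 0.9, 0.8]
scores = video_detection_scores(rates, levels=[0.5, 0.85])
targets = target_frames(rates, levels=[0.5, 0.85], threshold=0.5)
```

With these toy rates, the sequence selected at level 0.5 covers three of the four frames and passes the threshold, while the sequence selected at level 0.85 covers one frame and does not.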

In one embodiment, a threshold configuration module is further included, configured to configure the threshold according to a video detection score, wherein the video detection score includes the score ratio, within the first video sequence, of a second video sequence composed of frames with different pixel abnormality rates, and the first video sequence is the video sequence composed of the abnormal video frames in the video data stream.

In an embodiment, a positioning module is further included, configured to locate the position of the target video frame in the video data stream according to the threshold.
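Locating the target video frames can be sketched as a scan over per-frame detection parameters that reports contiguous runs meeting the threshold. This is a minimal sketch under assumptions: the function name `locate_target_frames` is hypothetical, positions are reported as inclusive frame-index ranges, and "meets the threshold" is read as "greater than or equal to".

```python
from typing import List, Optional, Sequence, Tuple


def locate_target_frames(params: Sequence[float],
                         threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges where params meet the threshold."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for index, value in enumerate(params):
        if value >= threshold:
            if start is None:
                start = index          # a new run of target frames begins
        elif start is not None:
            ranges.append((start, index - 1))
            start = None               # the current run has ended
    if start is not None:
        ranges.append((start, len(params) - 1))
    return ranges


# Detection parameters for five frames of the video data stream (toy data).
positions = locate_target_frames([0.1, 0.7, 0.9, 0.2, 0.8], threshold=0.5)
```

Frame indices can be converted to timestamps by dividing by the frame rate of the stream, if that is known.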

For the functions of each module in each device in the embodiment of the present disclosure, reference may be made to the corresponding description in the foregoing method, and details are not described herein again.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.

FIG. 5 is a block diagram of an electronic device used to implement the video detection method of an embodiment of the present disclosure. The electronic device may be the aforementioned deployment device or proxy device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. Various programs and data required for the operation of the electronic device 500 can also be stored in the RAM 503. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Various components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard, a mouse, etc.; an output unit 507, such as various types of displays, speakers, etc.; a storage unit 508, such as a magnetic disk, an optical disk etc.; and a communication unit 509, such as a network card, modem, wireless communication transceiver, and the like. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The computing unit 501 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various specialized artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as the video detection method. For example, in some embodiments, the video detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the video detection method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the video detection method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor, which may be a special-purpose or general-purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user may interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

A computer system can include clients and servers. Clients and servers are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure can be executed in parallel, sequentially, or in different orders. As long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, no limitation is imposed herein.

The above-mentioned specific embodiments do not constitute a limitation on the protection scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modifications, equivalent replacements, and improvements made within the spirit and principles of the present disclosure should be included within the protection scope of the present disclosure.

Claims

A video detection method, comprising:

detecting video frames in a video data stream to obtain a target area in the video frames, the target area representing an area in which different video frames in the video data stream have some identical pixels;

searching the video data stream for abnormal video frames that have the target area; and

determining the abnormal video frame as a target video frame when a detection parameter corresponding to the abnormal video frame meets a threshold.
The method according to claim 1, wherein the detecting video frames in a video data stream to obtain a target area in the video frames comprises:

extracting key regions from at least two adjacent video frames in the video data stream, respectively; and

performing pixel-level feature comparison on the key regions corresponding to the at least two video frames, and taking the area of the key regions found by the comparison to have some identical pixels as the target area.
The method according to claim 1 or 2, wherein the determining the abnormal video frame as a target video frame when a detection parameter corresponding to the abnormal video frame meets a threshold comprises:

performing a pixel anomaly operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score;

obtaining a pixel abnormality rate for the abnormal video frames according to the similarity score; and

taking the pixel abnormality rate as the detection parameter and, when the detection parameter meets the threshold, determining the abnormal video frames in the first video sequence as target video frames.
The method according to claim 1 or 2, wherein the determining the abnormal video frame as a target video frame when a detection parameter corresponding to the abnormal video frame meets a threshold comprises:

performing a pixel anomaly operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score;

obtaining a pixel abnormality rate for the abnormal video frames according to the similarity score;

selecting, from the first video sequence, second video sequences composed of frames with different pixel abnormality rates, and scoring them respectively to obtain corresponding video detection scores; and

taking the video detection score as the detection parameter and, when the detection parameter meets the threshold, determining the abnormal video frames in the second video sequence as target video frames.
The method according to claim 1 or 2, further comprising:

configuring the threshold according to a video detection score;

wherein the video detection score includes the score ratio, within a first video sequence, of a second video sequence composed of frames with different pixel abnormality rates, and the first video sequence is a video sequence composed of the abnormal video frames in the video data stream.
The method of claim 5, further comprising:

locating, according to the threshold, the position of the target video frame in the video data stream.
A video detection apparatus, comprising:

a target area detection module, configured to detect video frames in a video data stream to obtain a target area in the video frames, the target area representing an area in which different video frames in the video data stream have some identical pixels;

an abnormal video search module, configured to search the video data stream for abnormal video frames that have the target area; and

a target video determination module, configured to determine the abnormal video frame as a target video frame when a detection parameter corresponding to the abnormal video frame meets a threshold.
The apparatus according to claim 7, wherein the target area detection module is configured to:

extract key regions from at least two adjacent video frames in the video data stream, respectively; and

perform pixel-level feature comparison on the key regions corresponding to the at least two video frames, and take the area of the key regions found by the comparison to have some identical pixels as the target area.
The apparatus according to claim 7 or 8, wherein the target video determination module is configured to:

perform a pixel anomaly operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score;

obtain a pixel abnormality rate for the abnormal video frames according to the similarity score; and

take the pixel abnormality rate as the detection parameter and, when the detection parameter meets the threshold, determine the abnormal video frames in the first video sequence as target video frames.
The apparatus according to claim 7 or 8, wherein the target video determination module is configured to:

perform a pixel anomaly operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score;

obtain a pixel abnormality rate for the abnormal video frames according to the similarity score;

select, from the first video sequence, second video sequences composed of frames with different pixel abnormality rates, and score them respectively to obtain corresponding video detection scores; and

take the video detection score as the detection parameter and, when the detection parameter meets the threshold, determine the abnormal video frames in the second video sequence as target video frames.
The apparatus according to claim 7 or 8, further comprising a threshold configuration module configured to:

configure the threshold according to a video detection score;

wherein the video detection score includes the score ratio, within a first video sequence, of a second video sequence composed of frames with different pixel abnormality rates, and the first video sequence is a video sequence composed of the abnormal video frames in the video data stream.
The apparatus of claim 11, further comprising a positioning module configured to:

locate, according to the threshold, the position of the target video frame in the video data stream.
An electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method of any one of claims 1-6.
A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
A computer program product comprising computer instructions which, when executed by a processor, implement the method of any of claims 1-6.