WO2022188315A1 - Video detection method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
WO2022188315A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
abnormal
video frame
data stream
detection
Application number
PCT/CN2021/104572
Other languages
French (fr)
Chinese (zh)
Inventor
熊俊峰
王洋
周越
张欢
仲震宇
Original Assignee
百度在线网络技术(北京)有限公司
Application filed by 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority to JP2023519078A (published as JP2023543015A)
Priority to KR1020237009299A (published as KR20230045098A)
Publication of WO2022188315A1

Classifications

    • G06V20/95 Pattern authentication; Markers therefor; Forgery detection
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06V20/44 Event detection

Definitions

  • the present disclosure relates to the field of computer processing, and in particular, the present disclosure relates to the fields of artificial intelligence, deep learning, computer vision, image processing, face recognition, limb recognition, forgery detection, and the like.
  • forged pictures can be used to generate videos.
  • a piece of video content can be forged by substituting new elements (for example, swapping in another person's face), and the forged video can then be used to attack various applications (such as banking client terminals and access control systems), which can cause users huge losses.
  • the present disclosure provides a video detection method, device, electronic device and storage medium.
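  • a video detection method comprising: detecting video frames in a video data stream to obtain a target area in the video frames, the target area representing a region in which different video frames of the video data stream share partially identical pixels; finding abnormal video frames containing the target area in the video data stream; and, when the detection parameter corresponding to an abnormal video frame meets a threshold, determining the abnormal video frame as the target video frame.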
  • a video detection apparatus comprising:
  • a target area detection module configured to detect video frames in a video data stream and obtain a target area in the video frames, the target area representing a region in which different video frames of the video data stream share partially identical pixels;
  • an abnormal video search module configured to find abnormal video frames containing the target area in the video data stream; and
  • a target video determination module configured to determine the abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets a threshold.
  • an electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method provided by any one of the embodiments of the present disclosure.
  • a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method provided by any one of the embodiments of the present disclosure.
  • a computer program product including computer instructions, which when executed by a processor implement the method provided by any one of the embodiments of the present disclosure.
  • a video frame in a video data stream can be detected to obtain a target area in the video frame, where the target area represents a region in which different video frames of the video data stream share partially identical pixels; abnormal video frames containing the target area are found in the video data stream; and when the detection parameter corresponding to an abnormal video frame meets the threshold, that abnormal video frame is determined as the target video frame. In this way, abnormal video frames that have been edited (or forged) in a video data stream can be detected.
  • FIG. 1 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a video detection apparatus according to an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of a video detection apparatus according to an embodiment of the present disclosure
  • FIG. 5 is a block diagram of an electronic device used to implement the video detection method of an embodiment of the present disclosure.
  • the terms "first" and "second" herein are used to distinguish between similar technical terms; they do not limit order or imply that there are only two items. For example, "a first feature and a second feature" means there are two types of features, where there may be one or more first features and one or more second features.
  • video can be captured in camera scenarios such as mobile phone applications, access control systems, and CCTV to obtain a video data stream.
  • the video frames in the video data stream are RGB (three-primary-color) images.
  • when the image is combined with depth information (Depth) captured by structured-light or binocular cameras, or with multimodal means such as infrared and color illumination, it can support security verification of faces, limbs, and motions.
  • However, RGB images also have unavoidable weaknesses: they make it easy to fake videos from images.
  • the attack threat to users is therefore more pronounced.
  • Such image-driven videos can mount deceptive attacks through fake faces, fake human limbs, fake movements, and the like. Because these attacks are effective and cheap, they pose an enormous danger to the many applications that include related security verification functions.
  • classifiers trained with video understanding techniques and video-image convolutional networks can be used to detect fake faces, fake human limbs, fake actions, and the like in the video data stream. However:
  • the video-image convolutional network has a large number of parameters and requires a large amount of labeled data before training, so the implementation cost is high;
  • tuning its parameters to improve network performance is inefficient, and it applies to relatively few scenarios;
  • the machines that run such a network are relatively sophisticated and expensive.
  • in the present disclosure, a target area can be selected appropriately and the entire video data stream combed through based on that target area to find which video frames show traces of editing, so that edited anomalies can be detected.
  • FIG. 1 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure.
  • the method can be applied to a video detection apparatus.
  • the apparatus can be deployed in a terminal, a server, or another processing device, and can perform processing such as target area detection, abnormal video frame search, and determination of target video frames after the abnormal video frames are evaluated.
  • the terminal may be a user equipment (UE, User Equipment), a mobile device, a Personal Digital Assistant (PDA, Personal Digital Assistant), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and the like.
  • the method may also be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in FIG. 1, the method includes:
  • the video frames in the video data stream are detected to find the target area. For example, the target area may be an area manually edited by a forger, that is, a region in which different video frames of the video data stream have completely identical pixels.
  • abnormal video frames containing the target area can then be found in the video data stream. Operations can also be performed on multiple abnormal video frames to obtain their corresponding detection parameters, for example, the pixel abnormality rate in the video frame sequence, or the video detection score of a video sequence composed of different pixel abnormality rates.
  • the threshold may also be configured according to the video detection score, so as to more accurately locate the specific position of the target video frame in the video data stream according to the threshold.
  • with this method, a target area in which different video frames of the video data stream share identical pixels is obtained, abnormal video frames containing the target area are found, and an abnormal video frame is determined as a target video frame when its detection parameter meets the threshold. Because a region with identical pixels across different video frames is a manually edited (or forged) region, any video frame containing it is abnormal; the entire video data stream therefore needs to be examined, beginning with finding this class of target area (that is, the manually edited area).
  • the entire video data stream is then combed through based on the target area, and the video frames in which the target area exists are regarded as abnormal video frames.
  • finally, an abnormal video frame is determined as a detected target video frame only when its corresponding detection parameter is judged to meet the threshold. In this way, abnormal video frames that have been manually edited (or forged) in the video data stream can be detected accurately.
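The three steps of FIG. 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: frames are hypothetical 2-D lists of pixel values, the target area between two adjacent frames is taken to be the set of exactly identical pixels, a frame pair counts as abnormal when that area reaches a hypothetical `min_pixels` size, and the detection parameter is assumed to be the share of abnormal frames in the stream.

```python
from typing import List, Sequence

# Hypothetical representation: a frame is a 2-D grid of pixel values.
Frame = List[List[int]]


def identical_pixel_mask(a: Frame, b: Frame) -> List[List[bool]]:
    """Assumed target-area test: True where two frames have exactly identical pixels."""
    return [[pa == pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def find_abnormal_frames(frames: Sequence[Frame], min_pixels: int) -> List[int]:
    """Indices of frames sharing a target area of at least min_pixels with a neighbour."""
    abnormal = set()
    for i in range(len(frames) - 1):
        mask = identical_pixel_mask(frames[i], frames[i + 1])
        if sum(map(sum, mask)) >= min_pixels:
            abnormal.update((i, i + 1))  # both frames contain the identical region
    return sorted(abnormal)


def detect_target_frames(frames: Sequence[Frame], min_pixels: int,
                         threshold: float) -> List[int]:
    """Return the abnormal frames as target frames if the detection parameter meets the threshold."""
    if not frames:
        return []
    abnormal = find_abnormal_frames(frames, min_pixels)
    detection_parameter = len(abnormal) / len(frames)  # assumed: share of abnormal frames
    return abnormal if detection_parameter >= threshold else []
```

In this sketch the whole frame is compared; restricting the comparison to key regions, as in the FIG. 2 embodiment, keeps unrelated static content out of the mask.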
  • FIG. 2 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure. As shown in FIG. 2 , the method includes:
  • S202: Compare pixel-point features of the key regions corresponding to the at least two video frames, and take the regions that the comparison shows to share partially identical pixels as the target regions.
  • video frames in the video data stream are detected to find the target area.
  • image-driven videos achieve deceptive attacks through fake faces, fake human limbs, fake movements, and the like, so the relevant parts (such as designated gestures) can be taken as the key area, instead of detecting all human bodies and the actions they perform.
  • the target area may be an area manually edited by a forger, that is, a region in which different video frames of the video data stream have identical pixels.
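Step S202 can be illustrated with a small sketch. The key regions are assumed to come from some upstream detector (for example a gesture detector, not specified here) as hypothetical `(top, left, height, width)` boxes, and the "feature comparison of pixel points" is simplified to exact pixel equality inside the two crops.

```python
from typing import List, Tuple

Frame = List[List[int]]                # hypothetical 2-D grid of pixel values
Box = Tuple[int, int, int, int]        # hypothetical key region: (top, left, height, width)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the key region out of a frame."""
    top, left, h, w = box
    return [row[left:left + w] for row in frame[top:top + h]]


def target_region(frame_a: Frame, box_a: Box,
                  frame_b: Frame, box_b: Box) -> List[Tuple[int, int]]:
    """(row, col) offsets inside the key regions where both crops have identical pixels."""
    ca, cb = crop(frame_a, box_a), crop(frame_b, box_b)
    return [(r, c)
            for r, (ra, rb) in enumerate(zip(ca, cb))
            for c, (pa, pb) in enumerate(zip(ra, rb))
            if pa == pb]
```

Offsets are returned relative to each crop, so the two key regions may sit at different positions in their frames (for example, when a tracked hand moves).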
  • operations can also be performed on multiple abnormal video frames in the video data stream to obtain their corresponding detection parameters, for example, the pixel abnormality rate of a video frame sequence; the threshold may also be configured according to the video detection score, so that the position of the target video frame in the video data stream can be located more accurately.
  • determining the abnormal video frame as a target video frame when the corresponding detection parameter meets the threshold includes: performing a pixel-anomaly operation on the first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtaining, according to the similarity score, the pixel abnormality rate of the abnormal video frames; and using the pixel abnormality rate as the detection parameter and, when the detection parameter meets the threshold, determining the abnormal video frames in the first video sequence as target video frames.
  • the similarity score obtained by the pixel-anomaly operation on the first video sequence can be used as an indicator for evaluating the pixel abnormality rate, from which the pixel abnormality rate of each abnormal video frame is obtained; when this detection parameter meets the threshold, the abnormal video frames in the first video sequence are determined as target video frames.
  • in this way, the abnormal video frames in the first video sequence can be screened out of the entire video data stream according to the pixel abnormality rate and finally determined as target video frames, which improves detection accuracy.
  • determining the abnormal video frame as a target video frame when the corresponding detection parameter meets the threshold may alternatively include: performing a pixel-anomaly operation on the first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtaining, according to the similarity score, the pixel abnormality rate of the abnormal video frames; selecting, from the first video sequence, second video sequences composed of different pixel abnormality rates and scoring them separately to obtain the corresponding video detection scores; and using the video detection score as the detection parameter and, when the detection parameter meets the threshold, determining the abnormal video frames in the second video sequence as target video frames.
  • here too, the similarity score obtained by the pixel-anomaly operation on the first video sequence serves as an indicator for evaluating the pixel abnormality rate of each abnormal video frame. The second video sequences are then scored, and the video detection score is used as the detection parameter: when it meets the threshold, the abnormal video frames in the second video sequence are determined as target video frames.
  • in this way, the abnormal video frames in the first video sequence are first screened out of the entire video data stream according to the pixel abnormality rate; then, according to the video detection scores, the second video sequences whose scores meet expectations are screened out of the first video sequence, and the abnormal video frames in them are finally determined as target video frames, which improves detection accuracy.
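The two-stage screening described above might look like the following sketch. Because the text does not say how the second video sequences are formed or scored, this example assumes they are fixed-length windows of consecutive frames in the first video sequence and that each window's video detection score is the mean of its pixel abnormality rates; `window` and `threshold` are hypothetical parameters.

```python
from typing import List, Sequence


def second_sequences(n: int, window: int) -> List[range]:
    """Split a first video sequence of n frames into consecutive windows (assumed grouping)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [range(s, min(s + window, n)) for s in range(0, n, window)]


def video_detection_score(rates: Sequence[float], idx: range) -> float:
    """Hypothetical score: mean pixel abnormality rate over one second video sequence."""
    return sum(rates[i] for i in idx) / len(idx)


def screen_target_frames(rates: Sequence[float], window: int,
                         threshold: float) -> List[int]:
    """Positions (within the first video sequence) of frames in windows meeting the threshold."""
    hits: List[int] = []
    for idx in second_sequences(len(rates), window):
        if video_detection_score(rates, idx) >= threshold:
            hits.extend(idx)
    return hits
```

For example, with per-frame rates `[0.1, 0.2, 0.9, 0.8, 0.1, 0.0]`, a window of 2, and a threshold of 0.5, only the middle window qualifies, so positions 2 and 3 are reported.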
  • the method further includes: configuring the threshold according to the video detection score.
  • the video detection score includes the score proportion, within the first video sequence, of second video sequences composed of different pixel abnormality rates, where the first video sequence is the sequence composed of the abnormal video frames in the video data stream.
  • a pixel-anomaly operation is performed on the first video sequence to obtain a similarity score; the pixel abnormality rate of the abnormal video frames is obtained from the similarity score; second video sequences composed of different pixel abnormality rates are selected from the first video sequence and scored separately to obtain the corresponding video detection scores; and the threshold is configured using these video detection scores.
  • detection of abnormal video thus depends mainly on the proportion of abnormal video frames with different pixel abnormality rates in the whole video data stream and is independent of the positions of the abnormal video frames, which improves detection accuracy.
  • the method further includes: locating the position of the target video frame in the video data stream according to the threshold.
  • a threshold may also be configured according to the above-mentioned video detection score, so as to more accurately locate the specific position of the target video frame in the video data stream according to the threshold.
  • a simple neural network for localization can be trained according to the threshold and used to locate the target video frames in the video data stream. Because little training data is required, the complexity of the neural network is reduced.
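The disclosure locates target frames with a small trained neural network. As a much simpler stand-in (a plain threshold scan, not the disclosed method), the sketch below takes hypothetical per-frame scores and reports each contiguous run of frames whose score meets the threshold as an inclusive `(start, end)` segment.

```python
from typing import List, Optional, Sequence, Tuple


def locate_segments(scores: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Inclusive (start, end) index ranges of consecutive frames whose score meets the threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a run of target frames begins
        elif s < threshold and start is not None:
            segments.append((start, i - 1))  # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))  # the run reaches the end of the stream
    return segments
```

Reporting segments rather than single indices matches how an edited clip usually spans several consecutive frames.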
  • Ratio is the pixel abnormality rate
  • Compare(X_i, X_{i+d}) is the similarity score; it compares video frames sampled from the picture sequence to count the number of similar frames.
  • the similarity score is one of the indicators that can be used to evaluate the pixel abnormality rate.
  • the present disclosure is not limited to this indicator; any indicator that can be used to evaluate the pixel abnormality rate falls within the protection scope of the present application.
  • n is a positive integer greater than 1, indicating the number of pictures, and d is the sampling interval.
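One plausible way to combine the symbols of formula (1), assumed here for illustration rather than taken from the disclosure, is to read Ratio as the fraction of sampled frame pairs (X_i, X_{i+d}), taken every d frames, that Compare() judges similar. The `compare` function below is a hypothetical stand-in that treats two frames as similar when at least a share `min_identical` of their pixels are exactly identical.

```python
from typing import List, Sequence

Frame = List[List[int]]  # hypothetical 2-D grid of pixel values


def compare(a: Frame, b: Frame, min_identical: float = 0.1) -> int:
    """Hypothetical Compare(): 1 if at least min_identical of the pixels match exactly, else 0."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return int(total > 0 and same / total >= min_identical)


def pixel_abnormality_ratio(frames: Sequence[Frame], d: int) -> float:
    """Assumed reading of formula (1): share of sampled pairs (X_i, X_{i+d}) judged similar."""
    n = len(frames)
    if n < 2 or not 1 <= d < n:
        raise ValueError("need n > 1 and 1 <= d < n")
    pairs = range(0, n - d, d)  # sample every d-th frame as X_i
    return sum(compare(frames[i], frames[i + d]) for i in pairs) / len(pairs)
```

A larger d compares frames further apart, so a static pasted patch that persists only briefly may be missed; the sampling interval therefore trades speed against sensitivity.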
  • the video detection score can be obtained in various ways.
  • This application example calculates the video detection score according to formula (2) or formula (3) as follows.
  • the video detection score satisfies score ∈ (0,1), where a score of 0 is the label of a normal video: if a picture carries this label, the video frame has not been edited (or forged) and is a "true" original video frame.
  • a score of 1 is the label of an abnormal video: in the subsequent training process for configuring the threshold based on the video detection score, if a picture carries this label, the video frame has been edited (or forged) and belongs to a "fake" forged video, that is, an image-driven video; videos with this label may be used to attack users.
  • R is a sequence of multiple pixel abnormality rates; the strategy can be configured according to the detection requirements of different application scenarios; score is the video detection score.
  • the threshold can also be configured according to the score, for example, by using the score as the threshold, so that abnormal video can be located directly based on the strategy.
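Formula (2) maps the rate sequence R to a score through a configurable strategy that the text leaves open. As one hypothetical strategy, chosen only for illustration, the sketch below labels a video abnormal (score 1) when at least `min_count` rates in R reach `rate_threshold`; both parameters and their defaults are assumptions to be tuned per application scenario.

```python
from typing import Sequence


def policy_score(R: Sequence[float], rate_threshold: float = 0.3,
                 min_count: int = 2) -> int:
    """Hypothetical strategy for formula (2).

    Returns 1 (abnormal-video label) when at least min_count pixel abnormality
    rates in R reach rate_threshold, and 0 (normal-video label) otherwise.
    """
    hits = sum(1 for r in R if r >= rate_threshold)
    return 1 if hits >= min_count else 0
```

Requiring several high rates rather than one makes the strategy less sensitive to a single noisy frame pair.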
  • R is a sequence of multiple pixel abnormality rates
  • b is a preset parameter; this parameter can be configured according to the detection requirements of different application scenarios
  • w is the weight
  • score is the video detection score
  • following the logistic regression principle of formula (3), a feature extractor can be designed to score the video more accurately, instead of scoring only through a preset strategy as in formula (2). In the logistic regression process, a training set and a validation set can be prepared for training to obtain the weight w used for inference in practical application scenarios.
  • the threshold can also be configured according to the score, for example, by using the score as the threshold, so that abnormal video can be located based on the trained neural network.
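If formula (3) is read as standard logistic regression over a fixed-length rate sequence R, that is, score = sigmoid(w · R + b) with weight vector w and preset parameter b, then scoring and learning w can be sketched as below. The functional form, the batch gradient-descent training, and the `lr` and `epochs` values are assumptions for illustration; the text only states that w is obtained by training with training and validation sets.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lr_score(R: Sequence[float], w: Sequence[float], b: float) -> float:
    """Assumed form of formula (3): score = sigmoid(w . R + b)."""
    return sigmoid(sum(wi * ri for wi, ri in zip(w, R)) + b)


def fit_weights(data: Sequence[Tuple[Sequence[float], int]], b: float,
                lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Learn w by batch gradient descent on log-loss, keeping the preset b fixed.

    data holds (R, label) pairs, label 0 for normal and 1 for abnormal video.
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for R, label in data:
            err = lr_score(R, w, b) - label  # derivative of log-loss w.r.t. the logit
            for j in range(dim):
                grad[j] += err * R[j]
        w = [wj - lr * g / len(data) for wj, g in zip(w, grad)]
    return w
```

Keeping b fixed mirrors the text's description of b as a preset parameter; only w is learned, and a held-out validation set could then be used to pick the score threshold.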
  • the above formulas measure the size of the edited area in any video sequence, which serves as the basis for judgment at the video level.
  • because the threshold is configured from a score obtained through a strategy or simple training, a good detection effect can be achieved at minimal cost and with fast computation.
  • FIG. 3 is a schematic structural diagram of the video detection apparatus according to an embodiment of the present disclosure.
  • the video detection apparatus 300 includes: a target area detection module 301, configured to detect video frames in a video data stream and obtain a target area in the video frames, the target area representing a region in which different video frames of the video data stream share partially identical pixels; an abnormal video search module 302, configured to find abnormal video frames containing the target area in the video data stream; and a target video determination module 303, configured to determine an abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold.
  • FIG. 4 is a schematic structural diagram of a video detection apparatus according to an embodiment of the present disclosure.
  • the video detection apparatus 400 includes: a target area detection module 401, configured to detect video frames in a video data stream and obtain a target area in the video frames, the target area representing a region in which different video frames of the video data stream share partially identical pixels.
  • the target area detection module 401 further includes: a key region extraction sub-module 4011, configured to extract key regions respectively from at least two adjacent video frames in the video data stream; and a feature comparison sub-module, configured to compare pixel-point features of the key regions corresponding to the respective video frames and take the regions that the comparison shows to share partially identical pixels as the target regions.
  • an abnormal video search module 402, configured to find abnormal video frames containing the target area in the video data stream.
  • a target video determination module 403, configured to determine an abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold.
  • the target video determination module is configured to: perform a pixel-anomaly operation on the first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtain, according to the similarity score, the pixel abnormality rate of the abnormal video frames; and use the pixel abnormality rate as the detection parameter and, when the detection parameter meets the threshold, determine the abnormal video frames in the first video sequence as target video frames.
  • the target video determination module is configured to: perform a pixel-anomaly operation on the first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtain, according to the similarity score, the pixel abnormality rate of the abnormal video frames; select, from the first video sequence, second video sequences composed of different pixel abnormality rates and score them separately to obtain the corresponding video detection scores; and
  • use the video detection score as the detection parameter and, when the detection parameter meets the threshold, determine the abnormal video frames in the second video sequence as target video frames.
  • a threshold configuration module is further included, configured to configure the threshold according to a video detection score, where the video detection score includes the score proportion, within the first video sequence, of second video sequences composed of different pixel abnormality rates, and the first video sequence is the sequence composed of the abnormal video frames in the video data stream.
  • a positioning module is further included, configured to locate the position of the target video frame in the video data stream according to the threshold.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 5 is a block diagram of an electronic device used to implement the video detection method of an embodiment of the present disclosure.
  • the electronic device may be the aforementioned deployment device or proxy device.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the electronic device 500.
  • the computing unit 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input output (I/O) interface 505 is also connected to bus 504 .
  • Various components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard, a mouse, etc.; an output unit 507, such as various types of displays, speakers, etc.; a storage unit 508, such as a magnetic disk, an optical disk etc.; and a communication unit 509, such as a network card, modem, wireless communication transceiver, and the like.
  • the communication unit 509 allows the electronic device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • Computing unit 501 may be various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of computing units 501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various specialized artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc.
  • the computing unit 501 performs the various methods and processes described above, such as video detection methods.
  • the video detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508 .
  • part or all of the computer program may be loaded and/or installed on electronic device 500 via ROM 502 and/or communication unit 509 .
  • when the computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the video detection method described above may be performed.
  • the computing unit 501 may be configured to perform the video detection method by any other suitable means (e.g., by means of firmware).
  • Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on the remote machine or server.
  • A machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • The systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
  • A computer system can include clients and servers.
  • Clients and servers are generally remote from each other and usually interact through a communication network.
  • The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A video detection method and apparatus, an electronic device, and a storage medium, relating to fields such as artificial intelligence, deep learning, computer vision, image processing, face recognition, limb recognition, and forgery detection. The method comprises: detecting video frames in a video data stream to obtain a target region in the video frames, the target region representing a region in which different video frames in the video data stream have some identical pixels (S101); searching the video data stream for abnormal video frames that contain the target region (S102); and if a detection parameter corresponding to the abnormal video frames meets a threshold, determining the abnormal video frames as target video frames (S103). The method can detect abnormal video frames in a video data stream that have been manually edited (i.e., forged).

Description

Video detection method and apparatus, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 202110272132.X, filed with the China Patent Office on March 12, 2021 and entitled "Video detection method and apparatus, electronic device, and storage medium", the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer processing, and in particular to the fields of artificial intelligence, deep learning, computer vision, image processing, face recognition, limb recognition, forgery detection, and the like.
Background
With the development of computer technology, not only pictures and audio but even videos can be forged. In deep forgery of videos, forged pictures can be used to generate videos; for example, a piece of video content can be forged by replacing elements in it (such as swapping in another person's face). Such forged videos can exploit security vulnerabilities in various applications (such as banking clients and access control systems) to carry out attacks, causing huge losses to users.
Summary
The present disclosure provides a video detection method and apparatus, an electronic device, and a storage medium.
According to an aspect of the present disclosure, a video detection method is provided, including:
detecting video frames in a video data stream to obtain a target region in the video frames, where the target region represents a region in which different video frames of the video data stream have some completely identical pixels;
searching the video data stream for abnormal video frames that contain the target region; and
when a detection parameter corresponding to the abnormal video frames meets a threshold, determining the abnormal video frames as target video frames.
According to another aspect of the present disclosure, a video detection apparatus is provided, including:
a target region detection module, configured to detect video frames in a video data stream to obtain a target region in the video frames, where the target region represents a region in which different video frames of the video data stream have some completely identical pixels;
an abnormal video search module, configured to search the video data stream for abnormal video frames that contain the target region; and
a target video determination module, configured to determine the abnormal video frames as target video frames when a detection parameter corresponding to the abnormal video frames meets a threshold.
According to another aspect of the present disclosure, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor, where
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method provided by any embodiment of the present disclosure.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions are used to cause a computer to perform the method provided by any embodiment of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, including computer instructions that, when executed by a processor, implement the method provided by any embodiment of the present disclosure.
With the present disclosure, video frames in a video data stream can be detected to obtain a target region in the video frames, where the target region represents a region in which different video frames of the video data stream have some completely identical pixels; abnormal video frames that contain the target region are searched for in the video data stream; and when the detection parameter corresponding to the abnormal video frames meets the threshold, the abnormal video frames are determined as target video frames. In this way, abnormal video frames that have been manually edited (i.e., forged) in the video data stream can be detected.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the solution and do not constitute a limitation on the present disclosure. In the drawings:
FIG. 1 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a video detection apparatus according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a video detection apparatus according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing the video detection method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. They include various details of the embodiments to facilitate understanding and should be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. The term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C. The terms "first" and "second" herein are used to refer to and distinguish between multiple similar technical terms; they neither imply an order nor limit the count to two. For example, "a first feature and a second feature" refers to two types of features, where there may be one or more first features and one or more second features.
In addition, numerous specific details are given in the following detailed description to better illustrate the present disclosure. Those skilled in the art should understand that the present disclosure can be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
Regarding deep forgery of videos, take an access control system as an example application scenario. Video can be captured by cameras in scenarios such as mobile phone applications, access control, and closed-circuit television (CCTV) to obtain a video data stream, in which the video frames are images based on the three primary colors (RGB). Even if such images are combined with depth information captured by structured light or binocular cameras, and with multimodal means such as infrared and color illumination, to perform security verification of faces, limbs, actions, and the like, the weaknesses of RGB images cannot be avoided, which makes it easy to forge videos from images. In particular, when a forged video is injected during a system intrusion, the threat to users is even more pronounced. Such image-driven video generation enables deceptive attacks through forged faces, forged human limbs, forged actions, and the like; because the attacks are effective and cheap to mount, the harm to applications involving security verification functions is enormous.
In view of this, for image-driven videos, forged faces, forged human limbs, forged actions, and the like in a video data stream can be detected by classifiers trained through video understanding and video-image convolutional networks. However, such a convolutional network has a large number of parameters and requires a large amount of labeled data before training, so the implementation cost is high; the network is usually prone to overfitting (i.e., it lacks generality) and requires extensive parameter tuning to improve its performance, which is inefficient and limits the applicable scenarios; and the machines that run such a network are sophisticated and expensive.
In the present disclosure, by reasonably selecting a target region and combing through the entire video data stream based on it, it is possible to find which video frames in the video data stream bear traces of editing, and thus to correctly identify such edited abnormal video. Image-driven video animates only some positions according to keypoints, so different video frames will contain some pixels that are completely identical. Although a video data stream contains a certain amount of random noise, the probability that pixels are exactly identical in a normal video (one that is not an edited forgery) is almost zero, so this phenomenon can be regarded as a trace of editing in the video frames.
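The premise above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the method of the disclosure: frames are modeled as single-channel 2-D lists of 8-bit values, genuine captures are simulated with independent Gaussian sensor noise, the forgery is simulated by copying one source picture and re-rendering only a central patch, and "identical region" is approximated by fixed 4x4 tiles whose pixels must all match exactly. The helper names are hypothetical.

```python
import random


def identical_tile_fraction(frame_a, frame_b, tile=4):
    """Share of tile x tile blocks whose pixels are *all* exactly equal in both frames.

    Frames are equally sized 2-D lists of 8-bit intensities (a hypothetical
    single-channel stand-in for decoded RGB frames); height and width are
    assumed to be multiples of `tile`.
    """
    h, w = len(frame_a), len(frame_a[0])
    identical = total = 0
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            total += 1
            identical += all(
                frame_a[y][x] == frame_b[y][x]
                for y in range(y0, y0 + tile)
                for x in range(x0, x0 + tile)
            )
    return identical / total


def capture(scene, rng, sigma=3.0):
    """Simulate a genuine capture: independent sensor noise on every pixel."""
    return [[min(255, max(0, round(v + rng.gauss(0, sigma)))) for v in row] for row in scene]


rng = random.Random(0)
scene = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]

# Two genuine frames of the same static scene.
real_a, real_b = capture(scene, rng), capture(scene, rng)

# Hypothetical image-driven forgery: both frames are copied from one source
# picture and only a 16x16 "face" patch is re-rendered between them.
fake_a = [row[:] for row in scene]
fake_b = [row[:] for row in scene]
for y in range(8, 24):
    for x in range(8, 24):
        fake_b[y][x] = rng.randrange(256)

print(identical_tile_fraction(real_a, real_b))  # 0.0
print(identical_tile_fraction(fake_a, fake_b))  # 0.75
```

Individual pixels can coincide by chance in genuine footage, which is why the sketch tests whole tiles: requiring every pixel of a region to match drives the chance rate toward zero, while copied background survives intact.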
According to an embodiment of the present disclosure, a video detection method is provided. FIG. 1 is a schematic flowchart of the video detection method according to an embodiment of the present disclosure. The method can be applied to a video detection apparatus; for example, when the apparatus is deployed on and executed by a terminal, a server, or another processing device, it can perform target region detection, abnormal video frame search, determination of target video frames after the abnormal video frames are evaluated, and so on. The terminal may be user equipment (UE), a mobile device, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the method may also be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in FIG. 1, the method includes:
S101: Detect video frames in a video data stream to obtain a target region in the video frames, where the target region represents a region in which different video frames of the video data stream have some completely identical pixels.
S102: Search the video data stream for abnormal video frames that contain the target region.
S103: When the detection parameter corresponding to the abnormal video frames meets a threshold, determine the abnormal video frames as target video frames.
In an example of S101 to S103, the video frames in the video data stream are detected to find the target region. For example, the target region may be a region manually edited by a forger, that is, a region in which different video frames of the video data stream have some completely identical pixels. While the entire video data stream is combed through based on the target region, abnormal video frames that contain the target region can be found. Operations can further be performed on multiple abnormal video frames in the video data stream to obtain the detection parameter corresponding to the abnormal video frames, for example, the pixel abnormality rate of a video frame sequence with abnormal pixels, or a video detection score obtained by further scoring abnormal video sequences composed of different pixel abnormality rates, so that the abnormal video frames are determined as target video frames when the detection parameter meets the threshold. The threshold may also be configured according to the video detection score, so that the specific position of the target video frames in the video data stream can be located more precisely according to the threshold.
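Steps S101 to S103 can be sketched end to end as follows. The disclosure does not fix how the target region, the abnormal-frame test, or the detection parameter are computed, so the choices here are assumptions for illustration: the target region is the set of 4x4 tiles that are exactly identical between adjacent frames, a frame is abnormal when that set is non-empty, and the detection parameter is the share of abnormal frames in the stream. `find_target_frames` is a hypothetical name.

```python
import random


def find_target_frames(frames, tile=4, min_tiles=1, threshold=0.5):
    """Hypothetical S101-S103 sketch over equally sized 2-D frames (lists of rows).

    S101: for each pair of adjacent frames, the target region is the set of
          tile x tile blocks whose pixels are all exactly identical.
    S102: frame i is abnormal if its target region with frame i - 1 has at
          least `min_tiles` tiles.
    S103: the detection parameter is the share of abnormal frames among the
          compared pairs; if it meets `threshold`, the abnormal frames are
          returned as target frames, otherwise nothing is flagged.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to compare")

    def block(frame, y0, x0):
        return tuple(tuple(row[x0:x0 + tile]) for row in frame[y0:y0 + tile])

    height, width = len(frames[0]), len(frames[0][0])
    abnormal = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        target_region = [
            (y0, x0)
            for y0 in range(0, height, tile)
            for x0 in range(0, width, tile)
            if block(prev, y0, x0) == block(cur, y0, x0)
        ]
        if len(target_region) >= min_tiles:
            abnormal.append(i)

    detection_parameter = len(abnormal) / (len(frames) - 1)
    return abnormal if detection_parameter >= threshold else []


rng = random.Random(7)


def random_frame(size=16):
    return [[rng.randrange(256) for _ in range(size)] for _ in range(size)]


f0, f1, f2 = random_frame(), random_frame(), random_frame()
f3 = [row[:] for row in f2]
f3[0][0] ^= 1  # a "driven" frame: one pixel changes, the rest is copied
f4 = [row[:] for row in f3]
f4[15][15] ^= 1
stream = [f0, f1, f2, f3, f4]

print(find_target_frames(stream))                 # [3, 4]
print(find_target_frames(stream, threshold=0.6))  # []
```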
With the present disclosure, video frames in a video data stream can be detected to obtain a target region in the video frames, where the target region represents a region in which different video frames of the video data stream have some completely identical pixels; abnormal video frames that contain the target region are searched for in the video data stream; and when the detection parameter corresponding to the abnormal video frames meets the threshold, the abnormal video frames are determined as target video frames. Since a region in which different video frames of the video data stream have some completely identical pixels is a manually edited (i.e., forged) region, a video frame containing it is abnormal. The entire video data stream therefore needs to be examined, first to find such target regions (i.e., manually edited regions). The entire video data stream is then combed through using the target region, and the video frames that contain the target region are treated as abnormal video frames. For detection accuracy, an abnormal video frame is finally determined as a target video frame, that is, one screened out by the detection, only when its corresponding detection parameter is judged to meet the threshold. In this way, abnormal video frames that have been manually edited (i.e., forged) in the video data stream can be detected accurately.
According to an embodiment of the present disclosure, a video detection method is provided. FIG. 2 is a schematic flowchart of the video detection method according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes:
S201: Extract a key region from each of at least two adjacent video frames in the video data stream.
S202: Compare pixel features of the key regions corresponding to the at least two video frames, and take, as the target region, the region of the key regions in which the comparison finds some completely identical pixels.
S203: Search the video data stream for abnormal video frames that contain the target region.
S204: When the detection parameter corresponding to the abnormal video frames meets a threshold, determine the abnormal video frames as target video frames.
In an example of S201 to S204, the video frames in the video data stream are detected to find the target region. Since image-driven video achieves deceptive attacks through forged faces, forged human limbs, forged actions, and the like, faces, human limbs, actions (such as specified gestures), and so on can be taken as key regions according to the requirements of the application scenario, instead of detecting all human bodies and all performed actions. The target region may be a region manually edited by a forger, that is, a region in which different video frames of the video data stream have some completely identical pixels. After the abnormal video frames that contain the target region are found in the video data stream, operations can further be performed on multiple abnormal video frames to obtain the corresponding detection parameter, for example, the pixel abnormality rate of a video frame sequence with abnormal pixels, or a video detection score obtained by further scoring abnormal video sequences composed of different pixel abnormality rates, so that the abnormal video frames are determined as target video frames when the detection parameter meets the threshold. The threshold may also be configured according to the video detection score, so that the specific position of the target video frames in the video data stream can be located more precisely according to the threshold. With this embodiment, since detection is performed only on the key regions, both processing speed and detection accuracy are improved.
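Steps S201 and S202 restrict the comparison to key regions. The sketch below assumes a key region has already been produced by some external detector (for example, a face detector, not shown) as a hypothetical `(top, left, height, width)` box shared by both frames; only pixels inside that box are compared, and the exactly identical ones form the target region. The function names are illustrative, not from the disclosure.

```python
from typing import List, Tuple

Frame = List[List[int]]
Box = Tuple[int, int, int, int]  # hypothetical (top, left, height, width) key-region box


def crop(frame: Frame, box: Box) -> Frame:
    """S201: extract the key region given by `box` from one frame."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def target_region(frame_a: Frame, frame_b: Frame, box: Box) -> List[Tuple[int, int]]:
    """S202: compare the key regions of two adjacent frames pixel by pixel and
    return the frame coordinates (row, col) of pixels that are exactly identical."""
    top, left, _, _ = box
    key_a, key_b = crop(frame_a, box), crop(frame_b, box)
    return [
        (top + y, left + x)
        for y, (row_a, row_b) in enumerate(zip(key_a, key_b))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a == b
    ]


frame_a = [[r * 6 + c for c in range(6)] for r in range(6)]
frame_b = [[v + 100 for v in row] for row in frame_a]  # outside the box: differs, never compared
for y in range(1, 4):
    for x in range(1, 4):
        frame_b[y][x] = frame_a[y][x]  # key region copied verbatim...
frame_b[2][2] += 1  # ...except one "animated" pixel

print(target_region(frame_a, frame_b, (1, 1, 3, 3)))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)]
```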
In an implementation, determining the abnormal video frames as target video frames when the corresponding detection parameter meets the threshold includes: performing a pixel-abnormality operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtaining a pixel abnormality rate for the abnormal video frames according to the similarity score; and taking the pixel abnormality rate as the detection parameter and, when the detection parameter meets the threshold, determining the abnormal video frames in the first video sequence as target video frames. With this implementation, the similarity score obtained from the pixel-abnormality operation on the first video sequence serves as an indicator for evaluating the pixel abnormality rate; the pixel abnormality rate of the abnormal video frames is derived from it and used as the detection parameter, and when that parameter meets the threshold, the abnormal video frames in the first video sequence are determined as target video frames. In other words, the abnormal video frames in the first video sequence can be screened out of the entire video data stream according to the pixel abnormality rate and finally determined as target video frames, which improves detection accuracy.
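One way to read this implementation, under assumptions of our own since the disclosure does not give formulas: the similarity score of a frame is the number of key-region pixels exactly equal to the previous frame, and the pixel abnormality rate is that score divided by the key-region area. The names `pixel_abnormality_rates` and `target_frames_by_rate` are hypothetical.

```python
def pixel_abnormality_rates(first_sequence, box):
    """For each adjacent pair in the first video sequence, the similarity score is the
    number of key-region pixels that are exactly equal; the pixel abnormality rate is
    that score divided by the key-region area. rates[k] describes frame k + 1."""
    top, left, height, width = box
    area = height * width
    rates = []
    for prev, cur in zip(first_sequence, first_sequence[1:]):
        similarity_score = sum(
            prev[y][x] == cur[y][x]
            for y in range(top, top + height)
            for x in range(left, left + width)
        )
        rates.append(similarity_score / area)
    return rates


def target_frames_by_rate(first_sequence, box, threshold):
    """Use the pixel abnormality rate as the detection parameter."""
    rates = pixel_abnormality_rates(first_sequence, box)
    return [k + 1 for k, rate in enumerate(rates) if rate >= threshold]


sequence = [
    [[1, 2], [3, 4]],
    [[1, 2], [3, 5]],  # 3 of 4 pixels repeated -> 0.75
    [[9, 9], [9, 9]],  # nothing repeated      -> 0.0
    [[9, 9], [9, 9]],  # fully repeated        -> 1.0
]
box = (0, 0, 2, 2)
print(pixel_abnormality_rates(sequence, box))               # [0.75, 0.0, 1.0]
print(target_frames_by_rate(sequence, box, threshold=0.7))  # [1, 3]
```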
In an implementation, determining the abnormal video frames as target video frames when the corresponding detection parameter meets the threshold includes: performing a pixel-abnormality operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; obtaining a pixel abnormality rate for the abnormal video frames according to the similarity score; selecting, from the first video sequence, second video sequences composed of different pixel abnormality rates and scoring each of them to obtain corresponding video detection scores; and taking the video detection score as the detection parameter and, when the detection parameter meets the threshold, determining the abnormal video frames in the second video sequence as target video frames. With this implementation, the similarity score obtained from the pixel-abnormality operation on the first video sequence serves as an indicator for evaluating the pixel abnormality rate, from which the pixel abnormality rate of the abnormal video frames is obtained. Second video sequences composed of different pixel abnormality rates are then selected from the first video sequence and scored, and the resulting video detection score is used as the detection parameter: when it meets the threshold, the abnormal video frames in the second video sequence are determined as target video frames. In other words, the abnormal video frames in the first video sequence are first screened out of the entire video data stream according to the pixel abnormality rate; second video sequences with different pixel abnormality rates are then selected from the first video sequence and scored; and finally, based on the video detection scores, the abnormal video frames in the second video sequences whose scores meet expectations are determined as target video frames, which improves detection accuracy.
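The second-sequence scoring can be sketched as follows, again under hypothetical definitions: frames of the first video sequence are grouped into rate bands given by `edges` (each band forming one second video sequence, with frames below `edges[0]` left out), and each band's video detection score is its share of the first video sequence, following the proportion-based definition of the score given in this disclosure. Frames in bands whose score meets the threshold become target frames. All function names are illustrative.

```python
def band_of(rate, edges):
    """Return the (lo, hi) band containing `rate`; the last band is closed on the
    right. Rates outside [edges[0], edges[-1]] belong to no band and return None."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= rate < hi or (hi == edges[-1] and rate == hi):
            return (lo, hi)
    return None


def second_sequences(rates, edges):
    """Map each band to the first-sequence frame indices whose rate falls in it."""
    groups = {}
    for idx, rate in enumerate(rates):
        band = band_of(rate, edges)
        if band is not None:
            groups.setdefault(band, []).append(idx)
    return groups


def video_detection_scores(rates, edges):
    """Score each second sequence as its share of the whole first sequence."""
    return {band: len(idx) / len(rates) for band, idx in second_sequences(rates, edges).items()}


def target_frames_by_score(rates, edges, threshold):
    """Frames of every second sequence whose video detection score meets `threshold`."""
    groups = second_sequences(rates, edges)
    scores = video_detection_scores(rates, edges)
    return sorted(i for band, idx in groups.items() if scores[band] >= threshold for i in idx)


rates = [0.95, 0.2, 0.97, 0.99, 0.6]  # pixel abnormality rates of a first sequence
edges = [0.5, 0.9, 1.0]
print(video_detection_scores(rates, edges))  # {(0.9, 1.0): 0.6, (0.5, 0.9): 0.2}
print(target_frames_by_score(rates, edges, threshold=0.5))  # [0, 2, 3]
```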
In an implementation, the method further includes: configuring the threshold according to the video detection score. The video detection score includes the share, within the first video sequence, of the scores of the second video sequences composed of different pixel abnormality rates, where the first video sequence is the sequence composed of the abnormal video frames in the video data stream. In an example of configuring the threshold, a pixel-abnormality operation is performed on the first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score; the pixel abnormality rate for the abnormal video frames is obtained from the similarity score; second video sequences composed of different pixel abnormality rates are selected from the first video sequence and scored respectively to obtain corresponding video detection scores; and the threshold is then configured according to the video detection scores computed for the abnormal video frames. With this implementation, when evaluating according to the video detection score, the accuracy of abnormal video detection depends mainly on the proportion, within the entire video data stream, of the abnormal video frames scored at different pixel abnormality rates, and is independent of the positions of those frames, which improves detection accuracy.
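A sketch of configuring the threshold from the video detection scores. The rule is an assumption for illustration only, not the disclosure's formula: walking the rate bands from highest to lowest, the band shares are accumulated, and the threshold is the lower edge of the band at which the cumulative share first reaches a hypothetical `min_share`. Because only per-band counts enter the computation, reordering the frames cannot change the result, which mirrors the position independence described above.

```python
import random
from collections import Counter


def configure_threshold(rates, edges, min_share):
    """Hypothetical rule: accumulate band shares (video detection scores) from the
    highest band down; return the lower edge of the band where the cumulative
    share first reaches `min_share`, or None if it never does."""
    if not rates:
        return None
    counts = Counter()
    for rate in rates:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= rate < hi or (hi == edges[-1] and rate == hi):
                counts[lo] += 1
                break
    cumulative = 0.0
    for lo in sorted(edges[:-1], reverse=True):
        cumulative += counts[lo] / len(rates)
        if cumulative >= min_share:
            return lo
    return None


rates = [0.95, 0.2, 0.97, 0.99, 0.6, 0.7]
edges = [0.5, 0.9, 1.0]
print(configure_threshold(rates, edges, min_share=0.5))  # 0.9
print(configure_threshold(rates, edges, min_share=0.6))  # 0.5

# Position independence: shuffling the frame order leaves the threshold unchanged.
shuffled = rates[:]
random.Random(1).shuffle(shuffled)
print(configure_threshold(shuffled, edges, min_share=0.6))  # 0.5
```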
In one embodiment, the method further includes: locating the position of the target video frame in the video data stream according to the threshold. The threshold configured from the above video detection score can be used to locate the specific position of the target video frame in the video data stream more precisely. With this embodiment, because the threshold is derived from the video detection score, the threshold itself can be used to evaluate abnormal target video frames, so the position of a target video frame can be inferred directly from the threshold without training a neural network. Alternatively, for higher precision and positioning efficiency, a neural network for positioning can be trained in a simple manner according to the threshold and used to locate the target video frame in the video data stream; because such training needs little data, the complexity of the neural network is reduced.
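When the threshold is applied directly, no training is needed: positions follow from one comparison per sampled frame pair. The sketch below assumes `ratios[k]` is the pixel abnormality rate of the pair (X_{kd}, X_{kd+d}) in 0-based indexing, which matches S_1 = (X_1, X_3), S_2 = (X_3, X_5) with d = 2 in the application example; `locate_target_frames` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def locate_target_frames(
    ratios: Sequence[float], threshold: float, d: int
) -> List[Tuple[int, int]]:
    """Return 0-based (earlier, later) frame indices of pairs meeting the threshold.

    Assumes ratios[k] was computed for the frame pair (k*d, k*d + d).
    """
    return [(k * d, k * d + d) for k, r in enumerate(ratios) if r >= threshold]
```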
Application example:
A processing flow applying an embodiment of the present disclosure includes the following:
1. Read the video data stream to obtain multiple video frames. Each video frame is one picture, and the frames form the picture stream X_i (i = 1, 2, …, n), where n is a positive integer greater than 1 indicating the number of pictures.
2. Detect key regions. A component such as a picture key-region detector can be used to detect the regions of interest in each picture (for example, face regions, human body regions including limbs, and action regions used to identify forged actions). These regions of interest are taken as the key regions and marked in the picture stream X_i (i = 1, 2, …, n).
3. Set the hyperparameter sampling interval d to obtain picture sequences S_j (j = 1, 2, …, n/d), where n is a positive integer greater than 1 indicating the number of pictures; S is the collective name for these picture sequences.
4. Each picture sequence is denoted S_j and contains at least two pictures, an earlier one and a later one, that is, S_1 = (X_1, X_3), S_2 = (X_3, X_5), …, S_{n/d} = (X_{n-2}, X_n). For sequences of two such pictures, the sampling interval can be chosen as d = 2.
5. Calculate the pixel abnormality rate of each picture sequence according to formula (1):
[Formula (1): image PCTCN2021104572-appb-000001, not reproduced]
Here, Ratio is the pixel abnormality rate, and Compare(X_i, X_{i+d}) is the similarity score, which compares and samples the video frames in a picture sequence to find the number of similar frames.
Note that the similarity score is only one of the indicators that can be used to evaluate the pixel abnormality rate. The present disclosure is not limited to this indicator; any indicator that can be used to evaluate the pixel abnormality rate falls within the protection scope of the present application.
6. Obtain multiple pixel abnormality rate sequences, denoted R_k (k = 1, 2, …, n/d), where n is a positive integer greater than 1 indicating the number of pictures and d is the sampling interval.
7. The pixel abnormality rate sequences can be evaluated into a video detection score in various ways. This application example calculates the video detection score according to formula (2) or formula (3), with the score lying between 0 and 1. A score of 0 is the label of a normal video: in the subsequent training process that configures the threshold based on the video detection score, a picture carrying this label indicates that the video frame has not been edited (that is, not forged) and is a "real" original video frame. A score of 1 is the label of an abnormal video: a picture carrying this label indicates that the video frame has been edited (forged) and belongs to a "fake" forged video, that is, a video driven from a picture; such forged content may be used to attack users.
score = strategy(R)    (2)
[Formula (3): image PCTCN2021104572-appb-000002, not reproduced]
In formula (2), R denotes the pixel abnormality rate sequences; the strategy can be configured specifically for the detection requirements of different application scenarios; and score is the video detection score. The threshold can also be configured according to the score, for example by using the score as the threshold, so that the position of the abnormal video can be located directly based on the strategy.
In formula (3), R denotes the pixel abnormality rate sequences; b is a preset parameter that can be configured for the detection requirements of different application scenarios; w is the weight; and score is the video detection score. Based on the distribution of the pixel abnormality rate sequences, a feature extractor can be designed using the logistic regression principle of formula (3) (during logistic regression, a training set and a validation set can be built for training, yielding the parameter w used for inference in actual application scenarios), so that videos are scored more precisely than by the preset strategy alone as in formula (2). Note that using logistic regression can be equivalent to the complex operation of applying hand-designed convolution kernels to the video data stream and then convolving. The threshold can also be configured according to the score, for example by using the score as the threshold, so that the position of the abnormal video is located by means of the trained neural network.
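Formula (3) survives only as an image reference, but the text identifies it as logistic regression with weight w and parameter b over the pixel abnormality rate sequences. The sketch below assumes the common form score = sigmoid(w · h(R) + b), where h(R) is a normalized histogram of the R_k values; the histogram is an assumed feature, chosen because the number of sequences varies with video length. For simplicity b is learned together with w by batch gradient descent on the log loss; keeping b at a preset value, as the text describes, only requires skipping its update. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def histogram(ratios: Sequence[float], bins: int = 4) -> List[float]:
    """Normalized histogram of the R_k values over [0, 1] (fixed-length feature)."""
    counts = [0.0] * bins
    for r in ratios:
        counts[min(int(r * bins), bins - 1)] += 1.0
    n = len(ratios) or 1
    return [c / n for c in counts]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_score(ratios: Sequence[float], w: Sequence[float], b: float) -> float:
    """Assumed form of formula (3): score = sigmoid(w . h(R) + b), in (0, 1)."""
    h = histogram(ratios, len(w))
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def train(
    videos: Sequence[Sequence[float]],
    labels: Sequence[int],
    bins: int = 4,
    lr: float = 1.0,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit w and b by batch gradient descent on the logistic (log) loss."""
    feats = [histogram(r, bins) for r in videos]
    w, b = [0.0] * bins, 0.0
    m = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * bins, 0.0
        for h, y in zip(feats, labels):
            # Prediction error for one video: sigmoid(w . h + b) - label.
            err = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) - y
            grad_w = [g + err * hi for g, hi in zip(grad_w, h)]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b
```

The training and validation sets mentioned in the text would hold the R sequences of labeled original (0) and forged (1) videos, produced by steps 1 to 6.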
With this application example, since a forged or edited video frame exhibits the situation that "some pixels are completely identical across different frames", such editing traces can be found by reasonably selecting the regions to be detected and scanning the whole video, and each video frame can be judged "real" or "fake", so that abnormal video positions in the entire video data stream are located quickly. The computation is simple and needs no graphics processing unit (GPU); its complexity is low and it runs fast; its accuracy is high and its interpretability strong, and it can locate abnormal video positions directly; with a set threshold, inference can be performed directly without training, or with training that needs little data. Moreover, because the above formulas compute the size of the edited region in any sequence of the video and use it as the basis for a video-level judgment, there is no need to design a complex convolutional neural network: the threshold is configured from the score obtained by a strategy or by simple training, so that good detection results are achieved at minimum cost and with the fastest computation.
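Steps 1 to 7 can be sketched end to end in a few functions. This is a minimal illustration under stated assumptions, not the disclosed implementation: frames are grayscale nested lists, the key-region detector is replaced by a fixed box, formula (1) is assumed to be Ratio = (number of identical pixels in the key regions of X_i and X_{i+d}) / (number of pixels in the region), and formula (2) uses a hypothetical strategy, the share of sampled pairs whose rate reaches `tau`. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]           # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]   # key region (top, left, bottom, right), half-open


def crop(frame: Frame, box: Box) -> Frame:
    """Step 2: keep only the key region (the detector itself is assumed)."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def pixel_abnormality_rate(a: Frame, b: Frame) -> float:
    """Step 5, assumed form of (1): identical pixels / pixels in the region."""
    total = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total if total else 0.0


def abnormality_sequence(frames: Sequence[Frame], box: Box, d: int = 2) -> List[float]:
    """Steps 3 to 6: R_k for the sampled pairs (X_i, X_{i+d}), i stepping by d."""
    regions = [crop(f, box) for f in frames]
    return [
        pixel_abnormality_rate(regions[i], regions[i + d])
        for i in range(0, len(regions) - d, d)
    ]


def strategy_score(ratios: Sequence[float], tau: float = 0.9) -> float:
    """Step 7, formula (2) with a hypothetical strategy: share of pairs with R_k >= tau."""
    return sum(r >= tau for r in ratios) / len(ratios) if ratios else 0.0
```

Scoring by the share of flagged pairs, rather than by where they occur, reflects the earlier point that the result depends on the proportion of abnormal frames and not on their positions.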
According to an embodiment of the present disclosure, a video detection apparatus is provided. FIG. 3 is a schematic structural diagram of the video detection apparatus according to an embodiment of the present disclosure. As shown in FIG. 3, the video detection apparatus 300 includes: a target area detection module 301, configured to detect video frames in a video data stream to obtain a target area in the video frames, the target area representing a region in which different video frames of the video data stream have some completely identical pixels; an abnormal video search module 302, configured to find abnormal video frames in the video data stream in which the target area exists; and a target video determination module 303, configured to determine an abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets a threshold.
According to an embodiment of the present disclosure, a video detection apparatus is provided. FIG. 4 is a schematic structural diagram of the video detection apparatus according to an embodiment of the present disclosure. As shown in FIG. 4, the video detection apparatus 400 includes: a target area detection module 401, configured to detect video frames in a video data stream to obtain a target area in the video frames, the target area representing a region in which different video frames of the video data stream have some completely identical pixels, where the target area detection module 401 further includes a key region extraction sub-module 4011, configured to extract key regions from each of at least two adjacent video frames in the video data stream, and a comparison sub-module 4012, configured to perform pixel-level feature comparison on the key regions corresponding to the at least two video frames and take, as the target area, the region of the compared key regions in which some pixels are completely identical; an abnormal video search module 402, configured to find abnormal video frames in the video data stream in which the target area exists; and a target video determination module 403, configured to determine an abnormal video frame as a target video frame when the detection parameter corresponding to the abnormal video frame meets a threshold.
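The module structure of apparatus 400 can be mirrored by one class whose methods stand for the modules and sub-modules. This is a hypothetical sketch: a fixed box replaces the real key-region detector, the comparison sub-module uses the share of exactly identical pixels, a frame pair counts as abnormal when any key-region pixel is identical (the "some completely identical pixels" of the target area), and the determination module applies the threshold to that share, as in the embodiment that uses the pixel abnormality rate as the detection parameter.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Frame = List[List[int]]           # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), half-open


@dataclass
class VideoDetectionApparatus:
    """Hypothetical mapping of apparatus 400 onto plain Python methods."""

    box: Box                # stands in for a real key-region detector
    d: int = 2              # sampling interval between compared frames
    threshold: float = 0.9  # threshold on the pixel abnormality rate

    def extract_key_region(self, frame: Frame) -> Frame:
        """Key region extraction sub-module 4011."""
        top, left, bottom, right = self.box
        return [row[left:right] for row in frame[top:bottom]]

    def compare(self, a: Frame, b: Frame) -> float:
        """Comparison sub-module 4012: share of identical key-region pixels."""
        ra, rb = self.extract_key_region(a), self.extract_key_region(b)
        total = sum(len(row) for row in ra)
        same = sum(x == y for r1, r2 in zip(ra, rb) for x, y in zip(r1, r2))
        return same / total if total else 0.0

    def find_abnormal(self, frames: Sequence[Frame]) -> List[Tuple[int, float]]:
        """Abnormal video search module 402.

        Returns (later frame index, rate) for every sampled pair whose key
        regions share at least one identical pixel, i.e. contain a target area.
        """
        found = []
        for i in range(0, len(frames) - self.d, self.d):
            rate = self.compare(frames[i], frames[i + self.d])
            if rate > 0.0:
                found.append((i + self.d, rate))
        return found

    def determine_targets(self, frames: Sequence[Frame]) -> List[int]:
        """Target video determination module 403: apply the threshold."""
        return [idx for idx, rate in self.find_abnormal(frames) if rate >= self.threshold]
```

Exact pixel equality, rather than a tolerance, follows the "completely identical" wording; a real deployment would likely need the detector and comparison tuned to the codec and scene.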
In one embodiment, the target video determination module is configured to: perform a pixel anomaly operation on the first video sequence formed by the abnormal video frames in the video data stream to obtain a similarity score; obtain the pixel abnormality rate of the abnormal video frames according to the similarity score; and use the pixel abnormality rate as the detection parameter, determining the abnormal video frames in the first video sequence as target video frames when the detection parameter meets the threshold.
In one embodiment, the target video determination module is configured to: perform a pixel anomaly operation on the first video sequence formed by the abnormal video frames in the video data stream to obtain a similarity score; obtain the pixel abnormality rate of the abnormal video frames according to the similarity score; select, from the first video sequence, second video sequences composed of different pixel abnormality rates and score them separately to obtain the corresponding video detection scores; and use the video detection score as the detection parameter, determining the abnormal video frames in the second video sequence as target video frames when the detection parameter meets the threshold.
In one embodiment, the apparatus further includes a threshold configuration module, configured to configure the threshold according to a video detection score, where the video detection score includes the score proportion, within the first video sequence, of the second video sequences composed of different pixel abnormality rates, and the first video sequence is the sequence formed by the abnormal video frames in the video data stream.
In one embodiment, the apparatus further includes a positioning module, configured to locate the position of the target video frame in the video data stream according to the threshold.
For the functions of the modules in each apparatus of the embodiments of the present disclosure, refer to the corresponding description of the method above; details are not repeated here.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
FIG. 5 is a block diagram of an electronic device for implementing the video detection method of an embodiment of the present disclosure. The electronic device may be the aforementioned deployment device or proxy device. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile apparatus, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatus. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the electronic device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Multiple components of the electronic device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or a mouse; an output unit 507, such as various types of displays and speakers; a storage unit 508, such as a magnetic disk or an optical disk; and a communication unit 509, such as a network card, a modem, or a wireless communication transceiver. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 501 performs the methods and processing described above, such as the video detection method. For example, in some embodiments, the video detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the video detection method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the video detection method in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that, when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (for example, a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, feedback provided to the user may be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, speech, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, a data server), a computing system that includes a middleware component (for example, an application server), a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above specific embodiments do not limit the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (15)

  1. A video detection method, comprising:
    detecting video frames in a video data stream to obtain a target area in the video frames, the target area representing a region in which different video frames of the video data stream have some completely identical pixels;
    finding abnormal video frames in the video data stream in which the target area exists; and
    determining an abnormal video frame as a target video frame when a detection parameter corresponding to the abnormal video frame meets a threshold.
  2. The method according to claim 1, wherein detecting the video frames in the video data stream to obtain the target area in the video frames comprises:
    extracting key regions from each of at least two adjacent video frames in the video data stream; and
    performing pixel-level feature comparison on the key regions corresponding to the at least two video frames, and taking, as the target area, the region of the compared key regions in which some pixels are completely identical.
  3. The method according to claim 1 or 2, wherein determining the abnormal video frame as the target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold comprises:
    performing a pixel anomaly operation on a first video sequence formed by the abnormal video frames in the video data stream to obtain a similarity score;
    obtaining a pixel abnormality rate for the abnormal video frames according to the similarity score; and
    using the pixel abnormality rate as the detection parameter, and when the detection parameter meets the threshold, determining the abnormal video frames in the first video sequence as target video frames.
  4. The method according to claim 1 or 2, wherein determining the abnormal video frame as the target video frame when the detection parameter corresponding to the abnormal video frame meets the threshold comprises:
    performing a pixel anomaly operation on a first video sequence formed by the abnormal video frames in the video data stream to obtain a similarity score;
    obtaining a pixel abnormality rate for the abnormal video frames according to the similarity score;
    selecting, from the first video sequence, second video sequences composed of different pixel abnormality rates, and scoring them separately to obtain corresponding video detection scores; and
    using the video detection score as the detection parameter, and when the detection parameter meets the threshold, determining the abnormal video frames in the second video sequence as target video frames.
  5. The method according to claim 1 or 2, further comprising:
    configuring the threshold according to a video detection score;
    wherein the video detection score comprises the score proportion, within a first video sequence, of second video sequences composed of different pixel abnormality rates, and the first video sequence is the video sequence formed by the abnormal video frames in the video data stream.
  6. The method according to claim 5, further comprising:
    locating the position of the target video frame in the video data stream according to the threshold.
  7. A video detection apparatus, comprising:
    a target area detection module, configured to detect video frames in a video data stream to obtain a target area in the video frames, the target area representing a region in which different video frames of the video data stream have some completely identical pixels;
    an abnormal video search module, configured to find abnormal video frames in the video data stream in which the target area exists; and
    a target video determination module, configured to determine an abnormal video frame as a target video frame when a detection parameter corresponding to the abnormal video frame meets a threshold.
  8. The apparatus according to claim 7, wherein the target area detection module is configured to:
    extract key regions from each of at least two adjacent video frames in the video data stream; and
    perform pixel-level feature comparison on the key regions corresponding to the at least two video frames, and take, as the target area, the region of the compared key regions in which some pixels are completely identical.
  9. The apparatus according to claim 7 or 8, wherein the target video determination module is configured to:
    perform a pixel anomaly operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score;
    obtain, according to the similarity score, a pixel abnormality rate for the abnormal video frames; and
    take the pixel abnormality rate as the detection parameter and, when the detection parameter meets the threshold, determine the abnormal video frames present in the first video sequence as target video frames.
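Claim 9 does not define the "pixel anomaly operation", the similarity score, or the pixel abnormality rate. The minimal sketch below makes three assumptions. First, the similarity score of a frame is the number of its pixels that are identical to the preceding frame. Second, the pixel abnormality rate is that count divided by the frame's pixel count. Third, "meets the threshold" means greater than or equal to it. All function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame (or cropped key region): rows of intensities


def similarity_score(prev: Frame, cur: Frame) -> int:
    """Assumed similarity score: count of pixels identical to the previous frame."""
    return sum(
        1
        for row_prev, row_cur in zip(prev, cur)
        for p, c in zip(row_prev, row_cur)
        if p == c
    )


def pixel_abnormality_rates(sequence: List[Frame]) -> List[float]:
    """Assumed rate: similarity score divided by the number of pixels per frame.

    The first frame has no predecessor, so its rate is set to 0.0 here.
    """
    if not sequence:
        return []
    total = len(sequence[0]) * len(sequence[0][0])
    rates = [0.0]
    for prev, cur in zip(sequence, sequence[1:]):
        rates.append(similarity_score(prev, cur) / total)
    return rates


def target_frames(sequence: List[Frame], threshold: float) -> List[int]:
    """Indices of abnormal frames whose rate meets (>=) the threshold."""
    return [i for i, rate in enumerate(pixel_abnormality_rates(sequence)) if rate >= threshold]
```

A rate near 1.0 means that almost every pixel is frozen between consecutive frames. This is one plausible signal of a pasted or static region, but the claim itself does not state this rationale.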
  10. The apparatus according to claim 7 or 8, wherein the target video determination module is configured to:
    perform a pixel anomaly operation on a first video sequence composed of the abnormal video frames in the video data stream to obtain a similarity score;
    obtain, according to the similarity score, a pixel abnormality rate for the abnormal video frames;
    select, from the first video sequence, second video sequences composed of different pixel abnormality rates, and score each of them to obtain corresponding video detection scores; and
    take the video detection score as the detection parameter and, when the detection parameter meets the threshold, determine the abnormal video frames present in the second video sequence as target video frames.
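Claim 10 selects second video sequences "composed of different pixel abnormality rates" and scores them. Claims 5 and 11 then describe the resulting video detection score as a score proportion within the first video sequence. The sketch below is a hedged interpretation built on three assumptions. The rate bands in `BANDS` are hypothetical. A second video sequence is taken to be the frames whose rate falls in one band. Its video detection score is assumed to be that band's summed rates divided by the summed rates of the whole first sequence.

```python
from typing import Dict, List, Tuple

Band = Tuple[float, float]  # pixel-abnormality-rate interval [low, high)

# Hypothetical bands; the upper bound 1.01 lets a rate of exactly 1.0 fall in the top band.
BANDS: List[Band] = [(0.0, 0.5), (0.5, 0.9), (0.9, 1.01)]


def split_by_rate(rates: List[float], bands: List[Band]) -> Dict[Band, List[int]]:
    """Group frame indices of the first sequence into second sequences, one per band."""
    groups: Dict[Band, List[int]] = {}
    for idx, rate in enumerate(rates):
        for low, high in bands:
            if low <= rate < high:
                groups.setdefault((low, high), []).append(idx)
                break
    return groups


def detection_scores(rates: List[float], bands: List[Band]) -> Dict[Band, float]:
    """Assumed score: a band's summed rates as a share of the first sequence's total."""
    total = sum(rates)
    return {
        band: (sum(rates[i] for i in idxs) / total if total else 0.0)
        for band, idxs in split_by_rate(rates, bands).items()
    }


def target_frames(rates: List[float], bands: List[Band], threshold: float) -> List[int]:
    """Frames of every second sequence whose detection score meets (>=) the threshold."""
    scores = detection_scores(rates, bands)
    return sorted(
        idx
        for band, idxs in split_by_rate(rates, bands).items()
        if scores[band] >= threshold
        for idx in idxs
    )
```

Weighting by summed rates means that frames with high pixel abnormality rates dominate a band's share. Counting frames per band would be an equally valid reading of "score proportion".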
  11. The apparatus according to claim 7 or 8, further comprising a threshold configuration module configured to:
    configure the threshold according to a video detection score;
    wherein the video detection score comprises the score proportion, within a first video sequence, of a second video sequence composed of different pixel abnormality rates, the first video sequence being the video sequence composed of the abnormal video frames in the video data stream.
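Claim 11 states only that the threshold is configured according to the video detection score. The sketch below is an illustrative assumption that turns per-band detection scores into a threshold on the pixel abnormality rate. Starting from the highest-rate band, it accumulates score shares and returns the lower bound of the band at which the running total first reaches a hypothetical `min_share`. Neither the rule nor the parameter comes from the source.

```python
from typing import Dict, Tuple

Band = Tuple[float, float]  # pixel-abnormality-rate interval [low, high)


def configure_threshold(scores: Dict[Band, float], min_share: float = 0.5) -> float:
    """Pick a pixel-abnormality-rate threshold from per-band detection scores.

    Hypothetical rule: walk the bands from the highest rate downward, summing
    their score shares, and return the lower bound of the first band at which
    the running total reaches ``min_share``.
    """
    if not scores:
        raise ValueError("no detection scores to configure a threshold from")
    ordered = sorted(scores, key=lambda band: band[0], reverse=True)
    running = 0.0
    for band in ordered:
        running += scores[band]
        if running >= min_share:
            return band[0]
    # The shares never reach min_share: fall back to the lowest band's bound.
    return ordered[-1][0]
```

When the loop finds such a band, the frames at or above the returned rate account for at least `min_share` of the total score. A larger `min_share` therefore yields a lower, more inclusive threshold.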
  12. The apparatus according to claim 11, further comprising a positioning module configured to:
    locate, according to the threshold, the position of the target video frame in the video data stream.
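Claims 6 and 12 locate the target video frames in the video data stream according to the threshold. One simple way to express such a location is to report each run of consecutive qualifying frames as a frame range with start and end times. This assumes a constant frame rate and that "meets" means greater than or equal to the threshold. The function names and the `fps` parameter are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, float, float]  # (first frame, last frame, start s, end s)


def _segment(first: int, last: int, fps: float) -> Segment:
    # Frame i is assumed to cover the interval [i / fps, (i + 1) / fps).
    return (first, last, first / fps, (last + 1) / fps)


def locate_target_frames(params: List[float], threshold: float, fps: float) -> List[Segment]:
    """Group frames whose detection parameter meets (>=) the threshold into runs."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, value in enumerate(params):
        if value >= threshold:
            if start is None:
                start = i  # a new run of target frames begins
        elif start is not None:
            segments.append(_segment(start, i - 1, fps))
            start = None
    if start is not None:  # close a run that reaches the end of the stream
        segments.append(_segment(start, len(params) - 1, fps))
    return segments
```

For example, per-frame parameters `[0.1, 0.95, 1.0, 0.2, 0.92]` with a threshold of 0.9 at 25 fps give two runs: frames 1 to 2 and frame 4.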
  13. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of claims 1-6.
  14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-6.
  15. A computer program product, comprising computer instructions which, when executed by a processor, implement the method according to any one of claims 1-6.
PCT/CN2021/104572 2021-03-12 2021-07-05 Video detection method and apparatus, electronic device, and storage medium WO2022188315A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023519078A JP2023543015A (en) 2021-03-12 2021-07-05 Video detection methods, devices, electronic devices and storage media
KR1020237009299A KR20230045098A (en) 2021-03-12 2021-07-05 Video detection method, device, electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110272132.XA CN112883902B (en) 2021-03-12 2021-03-12 Video detection method and device, electronic equipment and storage medium
CN202110272132.X 2021-03-12

Publications (1)

Publication Number Publication Date
WO2022188315A1 true WO2022188315A1 (en) 2022-09-15

Family

ID=76042440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104572 WO2022188315A1 (en) 2021-03-12 2021-07-05 Video detection method and apparatus, electronic device, and storage medium

Country Status (4)

Country Link
JP (1) JP2023543015A (en)
KR (1) KR20230045098A (en)
CN (1) CN112883902B (en)
WO (1) WO2022188315A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523451A (en) * 2023-11-20 2024-02-06 广西桂冠电力股份有限公司 Video data analysis system and method based on intelligent security technology

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN112883902B (en) * 2021-03-12 2023-01-24 百度在线网络技术(北京)有限公司 Video detection method and device, electronic equipment and storage medium
CN113450125A (en) * 2021-07-06 2021-09-28 北京市商汤科技开发有限公司 Method and device for generating traceable production data, electronic equipment and storage medium
CN115797190A (en) * 2021-09-09 2023-03-14 北京字跳网络技术有限公司 Video processing method, video processing apparatus, electronic device, video processing medium, and program product
CN114466181A (en) * 2021-12-29 2022-05-10 沈阳中科创达软件有限公司 Video anomaly detection method, device, equipment and system
CN116132084A (en) * 2022-09-20 2023-05-16 马上消费金融股份有限公司 Video stream processing method and device and electronic equipment

Citations (7)

Publication number Priority date Publication date Assignee Title
WO2018032270A1 (en) * 2016-08-15 2018-02-22 Qualcomm Incorporated Low complexity tamper detection in video analytics
CN111444881A (en) * 2020-04-13 2020-07-24 中国人民解放军国防科技大学 Fake face video detection method and device
CN111539272A (en) * 2020-04-10 2020-08-14 上海交通大学 Method and system for passively detecting AI face changing video based on joint features
US20200267404A1 (en) * 2020-05-04 2020-08-20 Intel Corportation Detection of video tampering
CN111652875A (en) * 2020-06-05 2020-09-11 西安电子科技大学 Video counterfeiting detection method, system, storage medium and video monitoring terminal
CN112001429A (en) * 2020-08-06 2020-11-27 中山大学 Deepfake video detection method based on texture features
CN112883902A (en) * 2021-03-12 2021-06-01 百度在线网络技术(北京)有限公司 Video detection method and device, electronic equipment and storage medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN104268563B (en) * 2014-09-15 2017-05-17 合肥工业大学 Video abstraction method based on abnormal behavior detection
US10529077B2 (en) * 2017-12-19 2020-01-07 Canon Kabushiki Kaisha System and method for detecting interaction
CN111767760A (en) * 2019-04-01 2020-10-13 北京市商汤科技开发有限公司 Living body detection method and apparatus, electronic device, and storage medium
CN111353395B (en) * 2020-02-19 2023-07-28 南京信息工程大学 Face-changing video detection method based on long-term and short-term memory network
CN111783608B (en) * 2020-06-24 2024-03-19 南京烽火星空通信发展有限公司 Face-changing video detection method
CN111753762B (en) * 2020-06-28 2024-03-15 北京百度网讯科技有限公司 Method, device, equipment and storage medium for identifying key identification in video
CN111783644B (en) * 2020-06-30 2023-07-14 百度在线网络技术(北京)有限公司 Detection method, detection device, detection equipment and computer storage medium

Also Published As

Publication number Publication date
JP2023543015A (en) 2023-10-12
CN112883902A (en) 2021-06-01
CN112883902B (en) 2023-01-24
KR20230045098A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
WO2022188315A1 (en) Video detection method and apparatus, electronic device, and storage medium
WO2019218824A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
US20160239520A1 (en) Biometric matching engine
WO2020228515A1 (en) Fake face recognition method, apparatus and computer-readable storage medium
WO2022213718A1 (en) Sample image increment method, image detection model training method, and image detection method
CN112598643A (en) Deepfake image detection and model training method, device, equipment and medium
US20220262163A1 (en) Method of face anti-spoofing, device, and storage medium
CN113221768A (en) Recognition model training method, recognition method, device, equipment and storage medium
JP2015187759A (en) Image searching device and image searching method
EP4080470A2 (en) Method and apparatus for detecting living face
CN113272816A (en) Whole-person correlation for face screening
CN112651459A (en) Defense method, device, equipment and storage medium for confrontation sample of deep learning image
CN115049954A (en) Target identification method, device, electronic equipment and medium
CN113869253A (en) Living body detection method, living body training device, electronic apparatus, and medium
CN111862030B (en) Face synthetic image detection method and device, electronic equipment and storage medium
Zhang et al. Face occlusion detection using cascaded convolutional neural network
Luo et al. Multi-scale face detection based on convolutional neural network
CN108122011B (en) Target tracking method and system based on multiple invariance mixtures
CN113642428B (en) Face living body detection method and device, electronic equipment and storage medium
CN114445898B (en) Face living body detection method, device, equipment, storage medium and program product
CN114067394A (en) Face living body detection method and device, electronic equipment and storage medium
Srikanth et al. Contactless object identification algorithm for the visually impaired using efficientdet
Kaur et al. Improved Facial Biometric Authentication Using MobileNetV2
CN111079704A (en) Face recognition method and device based on quantum computation
CN115205939B (en) Training method and device for human face living body detection model, electronic equipment and storage medium

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21929785; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 20237009299; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2023519078; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21929785; Country of ref document: EP; Kind code of ref document: A1)