CN112800919A - Method, device and equipment for detecting target type video and storage medium

Info

Publication number
CN112800919A
Authority
CN
China
Prior art keywords
video
target
detected
detection result
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110084414.7A
Other languages
Chinese (zh)
Inventor
付志康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F16/75 Clustering; Classification
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G06F18/24 Classification techniques
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V20/48 Matching video sequences
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G06V2201/07 Target detection

Abstract

The disclosure discloses a method, a device, equipment and a storage medium for detecting a target type video, and relates to the field of artificial intelligence such as deep learning and image processing. One specific implementation of the method for detecting the target type video includes: extracting at least one image frame from a video to be detected; inputting at least one image frame into a pre-trained target image classification model to obtain a classification result of a video to be detected; in response to the fact that the classification result of the video to be detected is not the target type video, extracting a digital audio file from the video to be detected; and inputting the digital audio file into the first target object classification model and/or inputting at least one image frame into the second target object classification model to obtain a second modal detection result of the video to be detected, so that the recall rate and the accuracy of the target type video detection technology are improved.

Description

Method, device and equipment for detecting target type video and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to artificial intelligence fields such as deep learning and image processing, and more particularly to a method, an apparatus, a device, and a storage medium for detecting a target type video.
Background
Short videos are gradually becoming a mainstream form of entertainment, and people can freely upload and download short videos in applications. However, some of the short videos uploaded by users are prohibited videos (such as pornographic videos, violent videos, and the like).
Disclosure of Invention
The disclosure provides a method, a device, equipment and a storage medium for detecting a target type video.
According to a first aspect of the present disclosure, there is provided a method for detecting a target type video, including: extracting at least one image frame from a video to be detected; inputting at least one image frame into a pre-trained target image classification model to obtain a classification result of a video to be detected; in response to the fact that the classification result of the video to be detected is not the target type video, extracting a digital audio file from the video to be detected; and inputting the digital audio file into a first target object classification model and/or inputting at least one image frame into a second target object classification model to obtain a second modal detection result of the video to be detected, wherein the first target object classification model is used for determining whether the digital audio file contains a first target object, the second target object classification model is used for determining whether the at least one image frame contains a second target object, and the first target object and the second target object are used for representing a target type video.
According to a second aspect of the present disclosure, there is provided an apparatus for detecting a target type video, including: a first extraction module configured to extract at least one image frame from a video to be detected; a first classification module configured to input the at least one image frame into a pre-trained target image classification model to obtain a classification result of the video to be detected; a second extraction module configured to extract a digital audio file from the video to be detected in response to the classification result of the video to be detected not being the target type video; and a second classification module configured to input the digital audio file into a first target object classification model and/or input the at least one image frame into a second target object classification model to obtain a second modal detection result of the video to be detected, wherein the first target object classification model is used for determining whether the digital audio file contains a first target object, the second target object classification model is used for determining whether the at least one image frame contains a second target object, and the first target object and the second target object are used for representing the target type video.
According to a third aspect of the present disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium is presented storing computer instructions for causing a computer to perform the method as described in any one of the implementations of the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is presented, comprising a computer program which, when executed by a processor, performs the method as described in any of the implementations of the first aspect.
According to the method, the device, the equipment and the storage medium for detecting the target type video, at least one image frame is extracted from a video to be detected; then inputting at least one image frame into a pre-trained target image classification model to obtain a classification result of the video to be detected; then, in response to the fact that the classification result of the video to be detected is not the target type video, extracting a digital audio file from the video to be detected; and finally, inputting the digital audio file into the first target object classification model and/or inputting at least one image frame into the second target object classification model to obtain a second modal detection result of the video to be detected, so that the recall rate and the accuracy rate of the target type video detection technology are improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings. The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a schematic flow chart diagram illustrating one embodiment of a method for detecting target type video in accordance with the present application;
FIG. 3 is a schematic flow chart diagram illustrating another embodiment of a method for detecting target type video in accordance with the present application;
FIG. 4 is a schematic diagram of an application scenario of an embodiment of a method for detecting a target type video according to the present application;
FIG. 5 is a block diagram illustrating an embodiment of an apparatus for detecting a target type video according to the present application;
FIG. 6 is a block diagram of an electronic device for implementing the method for detecting a target type video according to the embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the method for detecting a target type video or the apparatus for detecting a target type video of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103. The network 102 is the medium used to provide communication links between the terminal device 101 and the server 103. The network 102 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.
Terminal device 101 may interact with server 103 through network 102. Various types of videos to be detected can be uploaded in the terminal device 101, including but not limited to legal videos, prohibited videos, and the like.
The server 103 may provide various services, for example, the server 103 may perform processing such as analysis on various types of data of the video to be detected and the like acquired from the terminal device 101, and generate a processing result (for example, obtain a second-modality detection result of the video to be detected).
The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 103 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be noted that the method for detecting the target type video provided in the embodiment of the present application is generally executed by the server 103, and accordingly, the target type video detecting device is generally disposed in the server 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of detecting target type video in accordance with the present application is shown. The method comprises the following steps:
Step 201, at least one image frame is extracted from a video to be detected.
In the present embodiment, an execution subject (for example, the server 103 shown in FIG. 1) of the method for detecting the target type video may extract at least one image frame from the video to be detected.
Frames of the video to be detected can be captured at a preset time interval. Illustratively, frame capture may be performed every three seconds, or at another time interval. At least one image frame can be extracted from the video to be detected through this frame capture processing.
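Illustratively, the interval-based frame capture described above may be sketched as follows. This is a minimal sketch under stated assumptions: the function names `sample_timestamps` and `timestamps_to_frame_indices` are hypothetical, and decoding the frames themselves (which would require a video library) is left out; only the selection of capture points is shown.

```python
from typing import List


def sample_timestamps(duration_s: float, interval_s: float = 3.0) -> List[float]:
    """Return the timestamps (in seconds) at which frames would be captured.

    The 3-second default mirrors the example interval in the text; any other
    positive interval may be used.
    """
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be non-negative and interval positive")
    timestamps = []
    k = 0
    while k * interval_s < duration_s:
        timestamps.append(k * interval_s)
        k += 1
    # Keep the first frame even for very short videos, so that
    # "at least one image frame" is always extracted.
    return timestamps or [0.0]


def timestamps_to_frame_indices(timestamps: List[float], fps: float) -> List[int]:
    """Map capture timestamps to frame indices for a given frame rate (assumed known)."""
    return [int(round(t * fps)) for t in timestamps]
```

For a 10-second video sampled every three seconds, the capture points fall at 0, 3, 6, and 9 seconds.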
Step 202, inputting at least one image frame into a pre-trained target image classification model to obtain a classification result of the video to be detected.
In this embodiment, the executing body may input at least one image frame to a pre-trained target image classification model to obtain a classification result of the video to be detected.
The target image classification model can adopt a neural network classification model to judge whether each frame of image is a target image or not, and then a classification result of each frame of image is obtained.
The target image refers to a target type image, such as an image containing prohibited pictures (e.g., pornography, violence, etc.). The target image classification model can be obtained by training through the following steps: firstly, obtaining a plurality of training samples, wherein each training sample comprises a sample image and a label of whether the sample image belongs to a target type image; and then taking the sample image in each training sample as the input of the target image classification model, taking the label of the sample image as the expected output of the target image classification model, and training to obtain the required target image classification model.
Training may start from an initialized target image classification model, which may be an untrained model or a model whose training has not been completed. Each layer of the initialized target image classification model may be provided with initial parameters, and the parameters may be continuously adjusted during training. The initialized target image classification model may be any of various types of untrained or incompletely trained artificial neural networks, or a model obtained by combining several such networks; for example, it may be an untrained convolutional neural network, an untrained recurrent neural network, or a model obtained by combining an untrained convolutional neural network, an untrained recurrent neural network, and an untrained fully connected layer. Alternatively, the target image classification model may employ a support vector machine model. A Support Vector Machine (SVM) is a generalized linear classifier that performs binary classification on data in a supervised learning manner, and its decision boundary is the maximum-margin hyperplane solved from the training samples.
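For reference, the maximum-margin hyperplane mentioned above can be written in the standard hard-margin form. The notation is an assumption introduced here and does not appear in the original text: x_i denotes the feature vector of the i-th sample image, y_i ∈ {−1, +1} its label, and (w, b) the hyperplane parameters; the training samples are assumed to be linearly separable.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

Minimizing \lVert w \rVert maximizes the margin 2 / \lVert w \rVert between the two classes, and a new sample x is classified by the sign of w^{\top} x + b.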
The execution subject can aggregate the classification results of the individual frames to obtain the classification result of the video to be detected. For example, if any frame is classified as a target type image, the video to be detected is classified as a target type video. For another example, if at least half of all the frames are classified as target type images, the video to be detected is classified as a target type video.
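The two aggregation rules given as examples above may be sketched as follows. This is a minimal sketch; the function names are hypothetical, and the per-frame results are assumed to be available as booleans (True meaning the frame was classified as a target type image).

```python
from typing import Sequence


def aggregate_any(frame_is_target: Sequence[bool]) -> bool:
    """Video is a target type video if any single frame is a target type image."""
    return any(frame_is_target)


def aggregate_at_least_half(frame_is_target: Sequence[bool]) -> bool:
    """Video is a target type video if at least half of the frames are target type images."""
    if not frame_is_target:
        # Assumption: with no frames there is no evidence, so the result is negative.
        return False
    positives = sum(1 for flag in frame_is_target if flag)
    # Integer comparison avoids floating-point issues; exactly half counts as positive.
    return 2 * positives >= len(frame_is_target)
```

The "any frame" rule favors recall, while the "at least half" rule tolerates isolated per-frame misclassifications.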
Step 203, in response to the classification result of the video to be detected not being the target type video, extracting a digital audio file from the video to be detected.
In this embodiment, the execution subject may extract the digital audio file from the video to be detected in response to the classification result of the video to be detected not being the target type video.
If the classification result indicates that the video to be detected is not the target type video, step 203 is executed; if the classification result indicates that the video to be detected is the target type video, step 203 is not executed. The target type video may be a prohibited video, such as a pornographic video or a violent video.
The audio track can be extracted from the video to be detected and saved as a digital audio file (e.g., a wav file) by using an audio information extraction tool (e.g., Music Extractor).
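As one possible illustration of the audio extraction, the following sketch builds a command line for the widely used ffmpeg tool. Note that ffmpeg is substituted here for the extraction tool named in the text, and the 16 kHz mono PCM output format, the function names, and the file paths are all assumptions made for illustration.

```python
import subprocess
from typing import List


def build_audio_extract_command(video_path: str, wav_path: str,
                                sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that saves the audio track of a video as a wav file."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream, keep audio only
        "-acodec", "pcm_s16le",  # 16-bit PCM, the usual wav payload
        "-ar", str(sample_rate), # output sample rate (assumed value)
        "-ac", "1",              # mono (assumed)
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run the command; requires ffmpeg to be installed on the host."""
    subprocess.run(build_audio_extract_command(video_path, wav_path), check=True)
```

Separating command construction from execution keeps the sketch testable without ffmpeg being present.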
Step 204, inputting the digital audio file into the first target object classification model and/or inputting at least one image frame into the second target object classification model to obtain a second modal detection result of the video to be detected.
In this embodiment, the executing body may input a digital audio file to the first target object classification model and/or input at least one image frame to the second target object classification model, so as to obtain a second modal detection result of the video to be detected.
Wherein the first target object classification model is used to determine whether the digital audio file contains a first target object, the second target object classification model is used to determine whether the at least one image frame contains a second target object, and the first target object and the second target object are used to characterize a target type video.
Besides the target type image, the target type video may also include other target objects that can be used to characterize that the video belongs to the target type video. Taking a target type video as an example of a pornographic video, the pornographic video can comprise pornographic actions, pornographic organs, pornographic characters, pornographic voice and other objects which can represent that the video belongs to the pornographic video besides pornographic images. The target object can be divided into a first target object and a second target object according to the carrier form of the target object, the carrier form of the first target object can be a digital audio file, and the carrier form of the second target object can be an image. Illustratively, taking a target type video as a pornographic video as an example, the first target object may be pornographic voice, and the second target object may be pornographic motion, pornographic organs, pornographic characters, and the like.
The first target object classification model may adopt a neural network classification model to determine whether the digital audio file contains the first target object, so as to obtain a classification result of the digital audio file. The second target object classification model may also adopt a neural network classification model to determine whether each frame of image includes the second target object, so as to obtain a classification result of each frame of image.
The second modal detection result of the video to be detected can be determined according to either the classification result of the digital audio file or the classification result of each frame of image. For example, if the classification result of the digital audio file indicates that the digital audio file contains the first target object, the second modal detection result is that the video to be detected is the target type video; likewise, if the classification result of any frame of image indicates that the image contains the second target object, the second modal detection result is that the video to be detected is the target type video.
Alternatively, the second modal detection result of the video to be detected can be determined according to both the classification result of the digital audio file and the classification result of each frame of image. For example, the classification result of the digital audio file and the classification results of the at least one image frame may be aggregated to obtain the second modal detection result. Illustratively, the aggregation manner includes, but is not limited to, the following: as long as any one classification result indicates that a target object (a first target object or a second target object) is contained, the second modal detection result is that the video to be detected is a target type video.
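The "and/or" combination of the audio result and the per-frame results may be sketched as follows. This is a minimal sketch with a hypothetical function name; passing `None` for a modality is an assumed way of expressing that the corresponding classification model was not run.

```python
from typing import Optional, Sequence


def second_modal_result(audio_has_target: Optional[bool] = None,
                        frames_have_target: Optional[Sequence[bool]] = None) -> bool:
    """Return True if any available classification result reports a target object.

    audio_has_target: result of the first target object classification model,
        or None if the audio model was not used.
    frames_have_target: per-frame results of the second target object
        classification model, or None if the image model was not used.
    """
    audio_hit = bool(audio_has_target)
    frame_hit = any(frames_have_target) if frames_have_target is not None else False
    return audio_hit or frame_hit
```

With this rule, a single positive result from either modality is sufficient to mark the video as a target type video.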
The method for detecting the target type video, provided by the embodiment of the application, can effectively detect the target type video in a large amount of videos, has high recall rate and accuracy, replaces manual review, and saves manpower.
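The overall flow of steps 201 to 204 may be sketched as a two-stage cascade. This is a sketch under stated assumptions: each callable parameter is a hypothetical stand-in for a trained model or an extraction tool, and the "any frame" aggregation rule is used for both image stages.

```python
from typing import Any, Callable, Sequence


def detect_target_type_video(
    frames: Sequence[Any],
    classify_frame: Callable[[Any], bool],
    extract_audio: Callable[[], Any],
    audio_has_target: Callable[[Any], bool],
    frame_has_target: Callable[[Any], bool],
) -> bool:
    """Hypothetical cascade over already extracted frames (step 201)."""
    # Step 202: first-stage image classification, aggregated with the "any frame" rule.
    if any(classify_frame(frame) for frame in frames):
        return True
    # Step 203: reached only when the first stage did not flag the video.
    audio = extract_audio()
    # Step 204: positive if either second-stage model finds a target object.
    if audio_has_target(audio):
        return True
    return any(frame_has_target(frame) for frame in frames)
```

Because the audio is extracted only when the first stage is negative, videos already flagged by the image classifier incur no second-stage cost.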
With further reference to fig. 3, shown is a flow diagram of another embodiment of a method of detecting target type video, the method comprising the steps of:
Step 301, at least one image frame is extracted from a video to be detected.
Step 301 is substantially the same as step 201, and therefore will not be described again.
Step 302, inputting at least one image frame into a pre-trained target image classification model to obtain a classification result of the video to be detected.
Step 302 is substantially the same as step 202, and therefore is not described in detail.
Step 303, in response to the classification result of the video to be detected not being the target type video, recognizing characters in the at least one image frame by using optical character recognition.
Optical Character Recognition (OCR) refers to a process in which an electronic device examines characters in an image, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by a character recognition method. If the classification result indicates that the video to be detected is not the target type video, the at least one image frame can be input into an OCR model to recognize the characters in the image.
Step 304, matching the recognized characters with a preset target character dictionary.
Target characters are characters that can be used to indicate that a video belongs to the target type video; taking a pornographic video as the target type video, the target characters may be pornographic text. The target character dictionary is the set of all target characters collected empirically. The characters recognized in each frame of image may be matched against the target character dictionary to determine whether they belong to the target characters. For example, similarity matching may be performed between the recognized characters and the target characters in the dictionary; when the similarity between the recognized characters and any target character in the dictionary reaches a preset similarity threshold (e.g., 90%), the recognized characters are determined to belong to the target characters.
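The dictionary matching with a similarity threshold may be sketched as follows. The text does not specify a similarity measure; this sketch assumes the ratio computed by `difflib.SequenceMatcher` from the Python standard library, and the function name and placeholder dictionary entries are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable


def matches_target_dictionary(recognized: str, dictionary: Iterable[str],
                              threshold: float = 0.9) -> bool:
    """Return True if the recognized text is similar enough to any dictionary entry.

    The default threshold of 0.9 mirrors the 90% example in the text. The
    similarity measure (SequenceMatcher ratio) is an assumption of this sketch.
    """
    return any(
        SequenceMatcher(None, recognized, entry).ratio() >= threshold
        for entry in dictionary
    )
```

A fuzzy threshold rather than exact matching allows for small OCR errors, such as a single misread character in a longer phrase.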
Step 305, obtaining a second modal detection result of the video to be detected according to the matching result.
The characters recognized in each frame of image can be matched with the preset target character dictionary to obtain a matching result for each frame. The matching results of the frames are then aggregated to obtain the second modal detection result of the video to be detected. For example, if the matching result of any frame indicates that the frame contains target characters, the second modal detection result is that the video to be detected is the target type video. For another example, if at least half of all the frames contain target characters, the second modal detection result is that the video to be detected is the target type video.
In some optional implementations of this embodiment, the second target object is a target organ, and step 204 includes: inputting the at least one image frame into a pre-trained target organ detection model to obtain the second modal detection result of the video to be detected.
The target organ detection model can adopt a neural network classification model to judge whether each frame of image contains a target organ, thereby obtaining a classification result for each frame of image.
The target organ may be an organ used for indicating that the video belongs to the target type video; taking a pornographic video as the target type video, the target organ may include male genitalia, female genitalia, and the like. The target organ detection model can be obtained by training through the following steps: first, obtaining a plurality of training samples, wherein each training sample comprises a sample image and a label indicating whether the sample image contains a target organ; then, taking the sample image in each training sample as the input of the target organ detection model and the label of the sample image as the expected output of the target organ detection model, and training to obtain the required target organ detection model.
The execution subject can aggregate the classification results of the individual frames to obtain the second modal detection result of the video to be detected. For example, if the classification result of any frame indicates that the frame contains a target organ, the second modal detection result is that the video to be detected is the target type video. For another example, if at least half of all the frames contain a target organ, the second modal detection result is that the video to be detected is the target type video.
In some optional implementations of this embodiment, the second target object is a target action, and step 204 includes: inputting the at least one image frame into a pre-trained target action detection model to obtain the second modal detection result of the video to be detected.
The target action detection model may use a neural network classification model to determine whether each image frame contains a target action, thereby obtaining a classification result for each frame.
The target action may be an action that indicates that a video belongs to the target type. Taking a pornographic video as the target type video, the target action may include sexual acts, sexually suggestive actions, and the like. The target action detection model can be trained through the following steps: first, a plurality of training samples are obtained, each comprising a sample image and a label indicating whether the sample image contains a target action; then, the sample image in each training sample is used as the input of the target action detection model and the label of the sample image as its expected output, and the model is trained to obtain the required target action detection model.
The execution body can aggregate the per-frame classification results to obtain the second modal detection result of the video to be detected. For example, if the classification result of any frame indicates that the frame contains a target action, the second modal detection result is that the video to be detected is a target type video. As another example, if at least half of the frames contain a target action according to their classification results, the second modal detection result is that the video to be detected is a target type video.
In some optional implementations of this embodiment, the first target object is a target voice, and step 204 includes: inputting the digital audio file into a pre-trained target voice detection model to obtain the second modal detection result of the video to be detected.
The target voice detection model may use a neural network classification model to determine whether each segment of the digital audio file, taken at a preset time interval, contains the target voice, thereby obtaining a classification result for each segment.
The target voice may be a voice that indicates that a video belongs to the target type. Taking a pornographic video as the target type video, the target voice may include pornographic speech or sounds. The target voice detection model can be trained through the following steps: first, a plurality of training samples are obtained, each comprising a digital audio file of a preset time interval and a label indicating whether the digital audio file contains the target voice; then, the digital audio file in each training sample is used as the input of the target voice detection model and the label of the digital audio file as its expected output, and the model is trained to obtain the required target voice detection model.
The execution body can aggregate the classification results of the audio segments taken at the preset time interval to obtain the second modal detection result of the video to be detected. For example, if the classification result of any segment indicates that the segment contains the target voice, the second modal detection result is that the video to be detected is a target type video. As another example, if at least half of the segments contain the target voice according to their classification results, the second modal detection result is that the video to be detected is a target type video.
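As a rough sketch of the segment-level processing described above, the audio samples can be cut into fixed-length segments, each segment classified, and the results aggregated. The `classify_segment` callable stands in for the pre-trained target voice detection model and is purely hypothetical, as are the function names and the raw-sample representation of the audio.

```python
from typing import Callable, List, Sequence


def split_into_segments(
    samples: Sequence[float], sample_rate: int, interval_seconds: float
) -> List[Sequence[float]]:
    """Cut the sample sequence into consecutive segments of the preset interval."""
    segment_length = int(sample_rate * interval_seconds)
    if segment_length <= 0:
        raise ValueError("interval too short for the given sample rate")
    return [samples[i:i + segment_length] for i in range(0, len(samples), segment_length)]


def detect_target_voice(
    samples: Sequence[float],
    sample_rate: int,
    interval_seconds: float,
    classify_segment: Callable[[Sequence[float]], bool],  # hypothetical model stand-in
    require_half: bool = False,
) -> bool:
    """Classify every segment, then apply the 'any' or 'at least half' rule."""
    results = [bool(classify_segment(seg))
               for seg in split_into_segments(samples, sample_rate, interval_seconds)]
    if not results:
        return False
    if require_half:
        return 2 * sum(results) >= len(results)
    return any(results)


# Toy input: three one-second segments at 4 samples per second; only the last is "loud".
toy_samples = [0.0] * 8 + [1.0] * 4
toy_classifier = lambda segment: max(segment) > 0.5
```

In practice the classifier would operate on voice features extracted from each segment rather than on raw samples; the toy threshold classifier only makes the control flow testable.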
In some optional implementations of this embodiment, the second target object includes a target text, a target organ, and a target action, the first target object includes a target voice, and step 204 includes:
Step 2041, inputting the at least one image frame into the target character classification model to obtain a third modal detection result of the video to be detected.
The characters recognized in each image frame can be matched against a preset target character dictionary to obtain a matching result for each frame. The per-frame matching results are then aggregated to obtain the third modal detection result of the video to be detected. For example, if the matching result of any frame indicates that the frame contains target characters, the third modal detection result is that the video to be detected is a target type video. As another example, if at least half of the frames contain target characters according to their matching results, the third modal detection result is that the video to be detected is a target type video.
Step 2042, inputting the at least one image frame into the pre-trained target organ detection model to obtain a fourth modal detection result of the video to be detected.
The target organ detection model may use a neural network classification model to determine whether each image frame contains a target organ, thereby obtaining a classification result for each frame.
The execution body can aggregate the per-frame classification results to obtain the fourth modal detection result of the video to be detected. For example, if the classification result of any frame indicates that the frame contains a target organ, the fourth modal detection result is that the video to be detected is a target type video. As another example, if at least half of the frames contain a target organ according to their classification results, the fourth modal detection result is that the video to be detected is a target type video.
Step 2043, inputting the at least one image frame into the pre-trained target action detection model to obtain a fifth modal detection result of the video to be detected.
The target action detection model may use a neural network classification model to determine whether each image frame contains a target action, thereby obtaining a classification result for each frame.
The execution body can aggregate the per-frame classification results to obtain the fifth modal detection result of the video to be detected. For example, if the classification result of any frame indicates that the frame contains a target action, the fifth modal detection result is that the video to be detected is a target type video. As another example, if at least half of the frames contain a target action according to their classification results, the fifth modal detection result is that the video to be detected is a target type video.
Step 2044, inputting the digital audio file into the pre-trained target voice detection model to obtain a sixth modal detection result of the video to be detected.
The target voice detection model may use a neural network classification model to determine whether each segment of the digital audio file, taken at a preset time interval, contains the target voice, thereby obtaining a classification result for each segment.
The execution body can aggregate the classification results of the audio segments taken at the preset time interval to obtain the sixth modal detection result of the video to be detected. For example, if the classification result of any segment indicates that the segment contains the target voice, the sixth modal detection result is that the video to be detected is a target type video. As another example, if at least half of the segments contain the target voice according to their classification results, the sixth modal detection result is that the video to be detected is a target type video.
Step 2045, aggregating the third modal detection result, the fourth modal detection result, the fifth modal detection result, and the sixth modal detection result of the video to be detected to obtain the second modal detection result of the video to be detected.
The third, fourth, fifth, and sixth modal detection results may be merged into the second modal detection result. The merging manner includes but is not limited to the following: if any one of the third, fourth, fifth, and sixth modal detection results is that the video to be detected is a target type video, the second modal detection result is that the video to be detected is a target type video; or, if more than half of the third, fourth, fifth, and sixth modal detection results are that the video to be detected is a target type video, the second modal detection result is that the video to be detected is a target type video.
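The two merging manners named above can be sketched as a single function over named per-modality results. The function name, the `rule` parameter, and the dictionary keys are illustrative assumptions; the disclosure leaves the merging manner open.

```python
from typing import Dict


def merge_modal_results(results: Dict[str, bool], rule: str = "any") -> bool:
    """Merge per-modality verdicts (True = target type video) into one verdict.

    rule="any":      flagged if at least one modality flags the video.
    rule="majority": flagged if strictly more than half of the modalities flag it.
    """
    flagged = sum(bool(value) for value in results.values())
    if rule == "any":
        return flagged >= 1
    if rule == "majority":
        return 2 * flagged > len(results)
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical third to sixth modal detection results for one video.
modal_results = {"text": False, "organ": True, "action": False, "voice": False}
```

Note that with four modalities the "majority" rule needs at least three positive results, since two out of four is exactly half and not more than half.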
For ease of understanding, fig. 4 shows a schematic application scenario of an embodiment of a method of detecting a target type video according to the present application.
As shown in fig. 4, the process of judging whether the video is a pornographic video includes:
(1) Extract the audio track and sample frames from the video to obtain a wav file and N images.
(2) Input the video frames into a pornographic image classification model, which uses a neural network classification model to judge whether each frame is a pornographic image, yielding N results; then aggregate the N results to obtain a result A for the whole video.
(3) Input the video frames into a pornographic organ detection model, where the pornographic organs include male genitalia, female genitalia, and the like. The pornographic organ detection model uses a neural network detection model to judge whether a pornographic organ is present in each frame, yielding N results; then aggregate the N results to obtain a result B for the whole video.
(4) Input the video frames into an OCR model, recognize the characters in the images, and match them against a dictionary of pornographic text to judge whether each frame contains pornographic text, yielding N results; then aggregate the N results to obtain a result C for the whole video.
(5) Extract voice features from the wav file and input them into a pornographic voice classification model, which uses a neural network voice classification model, to obtain a result D for the whole video.
(6) Input the video frames into a pornographic action classification model, which uses a neural network classification model to judge whether each frame shows a pornographic action, to obtain a result E for the whole video. The pornographic actions include sexual intercourse, sexually suggestive actions, and the like.
(7) Merge the five results A, B, C, D, and E into a final result. The merging manner includes but is not limited to: if any one of the results is pornographic, the whole video is judged to be a pornographic video; if all five results are normal, the whole video is judged to be a non-pornographic video.
With further reference to fig. 5, as an implementation of the method shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for detecting a target type video, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for detecting a target type video of the present embodiment may include: a first extraction module 501, a first classification module 502, a second extraction module 503, and a second classification module 504. The first extraction module 501 is configured to extract at least one image frame from a video to be detected. The first classification module 502 is configured to input the at least one image frame into a pre-trained target image classification model to obtain a classification result of the video to be detected. The second extraction module 503 is configured to extract a digital audio file from the video to be detected in response to the classification result of the video to be detected not being a target type video. The second classification module 504 is configured to input the digital audio file into a first target object classification model and/or input the at least one image frame into a second target object classification model to obtain a second modal detection result of the video to be detected, wherein the first target object classification model is used for determining whether the digital audio file contains a first target object, the second target object classification model is used for determining whether the at least one image frame contains a second target object, and the first target object and the second target object are used for representing a target type video.
In the present embodiment, for the specific processing of the first extraction module 501, the first classification module 502, the second extraction module 503, and the second classification module 504 of the apparatus 500, and the technical effects thereof, reference may be made to the descriptions of steps 201 to 204 in the embodiment corresponding to fig. 2, which are not repeated here.
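The way the four modules cooperate can be sketched as a two-stage pipeline in which the audio is extracted, and the second stage run, only when the first-stage image classification does not already flag the video. The class name, the field names, and the decision to return a positive verdict directly from the first stage are assumptions for illustration; each callable is a hypothetical stand-in for the corresponding module.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TargetVideoDetector:
    """Hypothetical wiring of modules 501 to 504; each field stands in for one module."""
    extract_frames: Callable[[Any], List[Any]]        # first extraction module
    classify_images: Callable[[List[Any]], bool]      # first classification module
    extract_audio: Callable[[Any], Any]               # second extraction module
    second_stage: Callable[[Any, List[Any]], bool]    # second classification module

    def detect(self, video: Any) -> bool:
        frames = self.extract_frames(video)
        # Stage 1: image classification on the sampled frames.
        if self.classify_images(frames):
            return True
        # Stage 2 runs only when stage 1 did not flag the video.
        audio = self.extract_audio(video)
        return self.second_stage(audio, frames)


# Toy stand-ins so that the control flow can be exercised without real models.
audio_calls: List[str] = []
detector = TargetVideoDetector(
    extract_frames=lambda video: video["frames"],
    classify_images=lambda frames: "flagged" in frames,
    extract_audio=lambda video: audio_calls.append("audio") or video["audio"],
    second_stage=lambda audio, frames: audio == "target",
)
```

Deferring audio extraction to the second stage avoids the cost of decoding audio for videos that the image classifier already identifies.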
In some optional implementations of this embodiment, the second target object is a target text, and the second classification module 504 is further configured to: recognize characters in the at least one image frame using an optical character recognition technology; match the recognized characters against a preset target character dictionary; and obtain the second modal detection result of the video to be detected according to the matching result.
In some optional implementations of this embodiment, the second target object is a target organ, and the second classification module 504 is further configured to: input the at least one image frame into a pre-trained target organ detection model to obtain the second modal detection result of the video to be detected.
In some optional implementations of this embodiment, the second target object is a target action, and the second classification module 504 is further configured to: input the at least one image frame into a pre-trained target action detection model to obtain the second modal detection result of the video to be detected.
In some optional implementations of this embodiment, the first target object is a target voice, and the second classification module 504 is further configured to: input the digital audio file into a pre-trained target voice detection model to obtain the second modal detection result of the video to be detected.
In some optional implementations of this embodiment, the second target object includes a target text, a target organ, and a target action, the first target object includes a target voice, and the second classification module 504 is further configured to: input the at least one image frame into a target character classification model to obtain a third modal detection result of the video to be detected; input the at least one image frame into a pre-trained target organ detection model to obtain a fourth modal detection result of the video to be detected; input the at least one image frame into a pre-trained target action detection model to obtain a fifth modal detection result of the video to be detected; input the digital audio file into a pre-trained target voice detection model to obtain a sixth modal detection result of the video to be detected; and aggregate the third, fourth, fifth, and sixth modal detection results of the video to be detected to obtain the second modal detection result of the video to be detected.
In some optional implementations of this embodiment, aggregating the third, fourth, fifth, and sixth modal detection results of the video to be detected to obtain the second modal detection result of the video to be detected includes: in response to the third, fourth, fifth, or sixth modal detection result being that the video to be detected is a target type video, determining that the second modal detection result is that the video to be detected is a target type video.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the methods and processes described above, such as the method of detecting a target type video. For example, in some embodiments, the method of detecting a target type video may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method of detecting a target type video described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the method of detecting a target type video in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method of detecting a target type video, comprising:
extracting at least one image frame from a video to be detected;
inputting the at least one image frame into a pre-trained target image classification model to obtain a classification result of the video to be detected;
in response to the fact that the classification result of the video to be detected is not the target type video, extracting a digital audio file from the video to be detected;
and inputting the digital audio file into a first target object classification model and/or inputting the at least one image frame into a second target object classification model to obtain a second modal detection result of the video to be detected, wherein the first target object classification model is used for determining whether the digital audio file contains a first target object, the second target object classification model is used for determining whether the at least one image frame contains a second target object, and the first target object and the second target object are used for representing a target type video.
2. The method of claim 1, wherein the second target object is a target text, and the inputting the at least one image frame into a second target object classification model to obtain a second modal detection result of the video to be detected comprises:
recognizing characters of the at least one image frame by adopting an optical character recognition technology;
matching the recognized characters with a preset target character dictionary;
and obtaining the second modal detection result of the video to be detected according to the matching result.
3. The method of claim 1, wherein the second target object is a target organ, and the inputting the at least one image frame into a second target object classification model to obtain a second modal detection result of the video to be detected comprises:
and inputting the at least one image frame into a pre-trained target organ detection model to obtain the second modal detection result of the video to be detected.
4. The method of claim 1, wherein the second target object is a target action, and the inputting the at least one image frame into a second target object classification model to obtain a second modal detection result of the video to be detected comprises:
and inputting the at least one image frame into a pre-trained target action detection model to obtain the second modal detection result of the video to be detected.
5. The method of claim 1, wherein the first target object is a target voice, and the inputting the digital audio file into the first target object classification model to obtain the second modal detection result of the video to be detected comprises:
and inputting the digital audio file into a pre-trained target voice detection model to obtain the second modal detection result of the video to be detected.
6. The method of claim 1, the second target object comprising a target text, a target organ, a target action, the first target object comprising a target voice, the inputting the digital audio file to a first target object classification model and the inputting the at least one image frame to a second target object classification model, the obtaining a second modal detection result of the video to be detected comprising:
inputting the at least one image frame into a target character classification model to obtain a third modal detection result of the video to be detected;
inputting the at least one image frame into a pre-trained target organ detection model to obtain a fourth modal detection result of the video to be detected;
inputting the at least one image frame into a pre-trained target action detection model to obtain a fifth modal detection result of the video to be detected;
inputting the digital audio file into a pre-trained target voice detection model to obtain a sixth modal detection result of the video to be detected;
and aggregating the third modal detection result, the fourth modal detection result, the fifth modal detection result and the sixth modal detection result of the video to be detected to obtain a second modal detection result of the video to be detected.
7. The method according to claim 6, wherein the aggregating the third modality detection result, the fourth modality detection result, the fifth modality detection result, and the sixth modality detection result of the video to be detected to obtain the second modality detection result of the video to be detected includes:
in response to the third modal detection result, the fourth modal detection result, the fifth modal detection result, or the sixth modal detection result being that the video to be detected is a target type video, determining that the second modal detection result of the video to be detected is that the video to be detected is a target type video.
8. An apparatus for detecting a target type video, the apparatus comprising:
a first extraction module configured to extract at least one image frame from a video to be detected;
the first classification module is configured to input the at least one image frame into a pre-trained target image classification model to obtain a classification result of the video to be detected;
a second extraction module configured to extract a digital audio file from the video to be detected in response to the classification result of the video to be detected not being a target type video;
a second classification module configured to input the digital audio file to a first target object classification model and/or input the at least one image frame to a second target object classification model, to obtain a second modal detection result of the video to be detected, wherein the first target object classification model is used for determining whether the digital audio file contains a first target object, the second target object classification model is used for determining whether the at least one image frame contains a second target object, and the first target object and the second target object are used for representing a target type video.
9. The apparatus of claim 8, wherein the second target object is target text, and the second classification module is further configured to:
recognize text in the at least one image frame using optical character recognition;
match the recognized text against a preset target text dictionary;
and obtain the second modal detection result of the video to be detected according to the matching result.
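The dictionary-matching step of claim 9 might look like the sketch below. It assumes the optical character recognition step has already produced a list of strings, one or more per frame. The function name `matches_target_dictionary` and the case-insensitive substring rule are assumptions for illustration; the claim does not specify how the match is performed.

```python
from typing import Iterable


def matches_target_dictionary(
    recognized_texts: Iterable[str],
    target_dictionary: Iterable[str],
) -> bool:
    """Return True if any dictionary entry occurs in any recognized string."""
    # Normalize case and drop empty entries, which would otherwise match everything.
    entries = [entry.casefold() for entry in target_dictionary if entry]
    for text in recognized_texts:
        folded = text.casefold()
        if any(entry in folded for entry in entries):
            return True
    return False
```

A production system would likely need tokenization or fuzzy matching to tolerate recognition errors; the substring test is only the simplest possible rule.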
10. The apparatus of claim 8, wherein the second target object is a target organ, and the second classification module is further configured to:
input the at least one image frame into a pre-trained target organ detection model to obtain the second modal detection result of the video to be detected.
11. The apparatus of claim 8, wherein the second target object is a target action, and the second classification module is further configured to:
input the at least one image frame into a pre-trained target action detection model to obtain the second modal detection result of the video to be detected.
12. The apparatus of claim 8, wherein the first target object is a target voice, and the second classification module is further configured to:
input the digital audio file into a pre-trained target voice detection model to obtain the second modal detection result of the video to be detected.
13. The apparatus of claim 8, wherein the second target object comprises target text, a target organ, and a target action, the first target object comprises a target voice, and the second classification module is further configured to:
input the at least one image frame into a target text classification model to obtain a third modal detection result of the video to be detected;
input the at least one image frame into a pre-trained target organ detection model to obtain a fourth modal detection result of the video to be detected;
input the at least one image frame into a pre-trained target action detection model to obtain a fifth modal detection result of the video to be detected;
input the digital audio file into a pre-trained target voice detection model to obtain a sixth modal detection result of the video to be detected;
and aggregate the third modal detection result, the fourth modal detection result, the fifth modal detection result, and the sixth modal detection result of the video to be detected to obtain the second modal detection result of the video to be detected.
14. The apparatus according to claim 13, wherein aggregating the third modal detection result, the fourth modal detection result, the fifth modal detection result, and the sixth modal detection result of the video to be detected to obtain the second modal detection result of the video to be detected comprises:
in response to the third modal detection result, the fourth modal detection result, the fifth modal detection result, or the sixth modal detection result of the video to be detected indicating a target type video, determining that the second modal detection result of the video to be detected is a target type video.
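The aggregation rule of claims 13 and 14 is a logical OR over the four per-modality results. A minimal sketch, in which the `ModalResults` container and its field names are hypothetical labels chosen for readability:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalResults:
    text_hit: bool    # third modal detection result (target text)
    organ_hit: bool   # fourth modal detection result (target organ)
    action_hit: bool  # fifth modal detection result (target action)
    voice_hit: bool   # sixth modal detection result (target voice)


def aggregate(results: ModalResults) -> bool:
    """Second modal detection result: True if any single modality fired."""
    return (
        results.text_hit
        or results.organ_hit
        or results.action_hit
        or results.voice_hit
    )
```

An OR rule favors recall: one positive detector is enough to flag the video, so the false-positive rate of the combination is at least that of the noisiest detector.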
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202110084414.7A 2021-01-21 2021-01-21 Method, device and equipment for detecting target type video and storage medium Pending CN112800919A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110084414.7A CN112800919A (en) 2021-01-21 2021-01-21 Method, device and equipment for detecting target type video and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110084414.7A CN112800919A (en) 2021-01-21 2021-01-21 Method, device and equipment for detecting target type video and storage medium

Publications (1)

Publication Number Publication Date
CN112800919A true CN112800919A (en) 2021-05-14

Family

ID=75811120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110084414.7A Pending CN112800919A (en) 2021-01-21 2021-01-21 Method, device and equipment for detecting target type video and storage medium

Country Status (1)

Country Link
CN (1) CN112800919A (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657468A (en) * 2015-02-12 2015-05-27 中国科学院自动化研究所 Fast video classification method based on images and texts
CN106779073A (en) * 2016-12-27 2017-05-31 西安石油大学 Media information classification method and device based on deep neural network
WO2017113691A1 (en) * 2015-12-29 2017-07-06 乐视控股(北京)有限公司 Method and device for identifying video characteristics
CN107040795A (en) * 2017-04-27 2017-08-11 北京奇虎科技有限公司 Live video monitoring method and device
CN107241617A (en) * 2016-03-29 2017-10-10 北京新媒传信科技有限公司 Video file recognition method and device
CN108124191A (en) * 2017-12-22 2018-06-05 北京百度网讯科技有限公司 Video review method, device and server
CN108921002A (en) * 2018-04-23 2018-11-30 中国科学院自动化研究所 Violent and terrorist audio/video recognition method and device based on multi-cue fusion
CN108985244A (en) * 2018-07-24 2018-12-11 海信集团有限公司 Television program type recognition method and device
CN109862391A (en) * 2019-03-18 2019-06-07 网易(杭州)网络有限公司 Video classification method, medium, device and computing equipment
CN110222649A (en) * 2019-06-10 2019-09-10 北京达佳互联信息技术有限公司 Video classification method, device, electronic equipment and storage medium
CN110418161A (en) * 2019-08-02 2019-11-05 广州虎牙科技有限公司 Video review method and device, electronic equipment and readable storage medium
CN110493615A (en) * 2018-05-15 2019-11-22 武汉斗鱼网络科技有限公司 Live video monitoring method and electronic equipment
CN110798703A (en) * 2019-11-04 2020-02-14 云目未来科技(北京)有限公司 Method and device for detecting illegal video content and storage medium
CN110852231A (en) * 2019-11-04 2020-02-28 云目未来科技(北京)有限公司 Illegal video detection method and device and storage medium
CN110969066A (en) * 2018-09-30 2020-04-07 北京金山云网络技术有限公司 Live video identification method and device and electronic equipment
CN111310026A (en) * 2020-01-17 2020-06-19 南京邮电大学 Artificial intelligence-based monitoring method for pornography-related and terrorism-related content
CN112163566A (en) * 2020-10-28 2021-01-01 中国铁路兰州局集团有限公司 Video image monitoring method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
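任栋; 宋伟; 于京; 姜薇: "A survey of detection algorithms for special video content", 信息网络安全 (Netinfo Security), no. 09, pages 184-191 *
毛安寅; 张灵; 陈思平; 江少锋: "Pornographic video detection based on skin color and behavior", 计算机工程与应用 (Computer Engineering and Applications), no. 17, pages 169-173 *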

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657230A (en) * 2021-08-06 2021-11-16 北京百度网讯科技有限公司 Method for training news video recognition model, method for detecting video and device thereof
CN113657230B (en) * 2021-08-06 2024-04-23 北京百度网讯科技有限公司 Method for training news video recognition model, method for detecting video and device thereof
CN113673427A (en) * 2021-08-20 2021-11-19 北京达佳互联信息技术有限公司 Video identification determination method and device, electronic equipment and storage medium
CN113673427B (en) * 2021-08-20 2024-03-22 北京达佳互联信息技术有限公司 Video identification method, device, electronic equipment and storage medium
CN113965803A (en) * 2021-09-08 2022-01-21 北京达佳互联信息技术有限公司 Video data processing method and device, electronic equipment and storage medium
CN113965803B (en) * 2021-09-08 2024-02-06 北京达佳互联信息技术有限公司 Video data processing method, device, electronic equipment and storage medium
CN115131825A (en) * 2022-07-14 2022-09-30 北京百度网讯科技有限公司 Human body attribute identification method and device, electronic equipment and storage medium
CN116524394A (en) * 2023-03-30 2023-08-01 北京百度网讯科技有限公司 Video detection method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113326764B (en) Method and device for training image recognition model and image recognition
US11062089B2 (en) Method and apparatus for generating information
CN108108743B (en) Abnormal user identification method and device for identifying abnormal user
CN112800919A (en) Method, device and equipment for detecting target type video and storage medium
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
CN107944032B (en) Method and apparatus for generating information
CN114118287A (en) Sample generation method, sample generation device, electronic device and storage medium
US20230096921A1 (en) Image recognition method and apparatus, electronic device and readable storage medium
CN115130581A (en) Sample generation method, training method, data processing method and electronic device
CN112650885A (en) Video classification method, device, equipment and medium
CN114898266A (en) Training method, image processing method, device, electronic device and storage medium
CN113239807B (en) Method and device for training bill identification model and bill identification
CN114494747A (en) Model training method, image processing method, device, electronic device and medium
CN113947701A (en) Training method, object recognition method, device, electronic device and storage medium
CN114037059A (en) Pre-training model, model generation method, data processing method and data processing device
CN113902899A (en) Training method, target detection method, device, electronic device and storage medium
CN114882334B (en) Method for generating pre-training model, model training method and device
CN114724144B (en) Text recognition method, training device, training equipment and training medium for model
CN116383382A (en) Sensitive information identification method and device, electronic equipment and storage medium
CN113792876B (en) Backbone network generation method, device, equipment and storage medium
CN113204665A (en) Image retrieval method, image retrieval device, electronic equipment and computer-readable storage medium
CN115131709B (en) Video category prediction method, training method and device for video category prediction model
CN113642495B (en) Training method, apparatus, and program product for evaluating model for time series nomination
US20220222941A1 (en) Method for recognizing action, electronic device and storage medium
CN113657248A (en) Training method and device for face recognition model and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination