CN110798703A - Method and device for detecting illegal video content and storage medium

Info

Publication number: CN110798703A
Application number: CN201911067905.XA
Authority: CN
Inventors: 刘洋; 杨文鲜; 王新然; 李云飞; 傅景楠
Original assignee: Yunmu Future Technology Beijing Co Ltd
Current assignee: Yunmu Future Technology Beijing Co Ltd
Priority date: 2019-11-04
Filing date: 2019-11-04
Publication date: 2020-02-14

Abstract

The application discloses a method and a device for detecting illegal video content, and a storage medium. The method for detecting illegal video content comprises the following steps: acquiring a video to be detected; extracting video frames, video clips and audio from the video to be detected according to a preset video processing method; performing illegal content identification on the images of the video frames, and determining a first illegal content detection result of the video to be detected; performing illegal content identification on the video clips, and determining a second illegal content detection result of the video to be detected; performing illegal content identification on the audio, and determining a third illegal content detection result of the video to be detected; and determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result and the third illegal content detection result.

Description

Method and device for detecting illegal video content and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting content violations in a video, and a storage medium.

Background

With the development of computer network technology, more and more internet service providers offer video uploading and sharing services to users, so the number of videos on the internet has grown explosively. This places higher demands on the monitoring of video content, and manual review is far from able to meet them. In recent years, automatic video content monitoring solutions have appeared. Existing methods for detecting illegal video content extract some of the video frames and the audio from a video, detect each separately, and regard the video as illegal as soon as any violation is found. As a result, the continuous information of the video is not considered, the individual illegal content detection results are not processed comprehensively, and robustness is insufficient.

No effective solution has yet been proposed for the above technical problems in the prior art, namely that existing methods for detecting illegal video content extract some of the video frames and the audio from a video, detect each separately, and regard the video as illegal as soon as any violation is found, so that the continuous information of the video is not considered, the individual illegal content detection results are not processed comprehensively, and robustness is insufficient.

Disclosure of Invention

The embodiments of the present disclosure provide a method, a device and a storage medium for detecting illegal video content, so as to at least solve the technical problems that, in the prior art, methods for detecting illegal video content extract some of the video frames and the audio from a video, detect each separately, and regard the video as illegal as soon as any violation is found, so that the continuous information of the video is not considered, the individual illegal content detection results are not processed comprehensively, and robustness is insufficient.

According to an aspect of the embodiments of the present disclosure, there is provided a method for detecting illegal video content, including: acquiring a video to be detected; extracting video frames, video clips and audio from the video to be detected according to a preset video processing method; performing illegal content identification on the images of the video frames, and determining a first illegal content detection result of the video to be detected, wherein the first illegal content detection result is used for determining a target object in the video to be detected; performing illegal content identification on the video clips, and determining a second illegal content detection result of the video to be detected, wherein the second illegal content detection result is used for determining a behavior label of the video to be detected; performing illegal content identification on the audio, and determining a third illegal content detection result of the video to be detected, wherein the third illegal content detection result is used for determining a category label of the audio of the video to be detected; and determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result and the third illegal content detection result, wherein the fourth illegal content detection result is used for determining a final illegal content identification result of the video to be detected.

According to another aspect of the embodiments of the present disclosure, there is also provided a storage medium including a stored program, wherein the method of any one of the above is performed by a processor when the program is executed.

According to another aspect of the embodiments of the present disclosure, there is also provided an apparatus for detecting illegal video content, including: a to-be-detected video acquisition module, configured to acquire a video to be detected; an extraction module, configured to extract video frames, video clips and audio from the video to be detected according to a preset video processing method; a first illegal content detection result determining module, configured to perform illegal content identification on the images of the video frames and determine a first illegal content detection result of the video to be detected, wherein the first illegal content detection result is used for determining a target object in the video to be detected; a second illegal content detection result determining module, configured to perform illegal content identification on the video clips and determine a second illegal content detection result of the video to be detected, wherein the second illegal content detection result is used for determining a behavior label of the video to be detected; a third illegal content detection result determining module, configured to perform illegal content identification on the audio and determine a third illegal content detection result of the video to be detected, wherein the third illegal content detection result is used for determining a category label of the audio of the video to be detected; and a fourth illegal content detection result determining module, configured to determine a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result and the third illegal content detection result, wherein the fourth illegal content detection result is used for determining a final illegal content identification result of the video to be detected.

According to another aspect of the embodiments of the present disclosure, there is also provided an apparatus for detecting illegal video content, including: a processor; and a memory coupled to the processor and configured to provide the processor with instructions for performing the following processing steps: acquiring a video to be detected; extracting video frames, video clips and audio from the video to be detected according to a preset video processing method; performing illegal content identification on the images of the video frames, and determining a first illegal content detection result of the video to be detected, wherein the first illegal content detection result is used for determining a target object in the video to be detected; performing illegal content identification on the video clips, and determining a second illegal content detection result of the video to be detected, wherein the second illegal content detection result is used for determining a behavior label of the video to be detected; performing illegal content identification on the audio, and determining a third illegal content detection result of the video to be detected, wherein the third illegal content detection result is used for determining a category label of the audio of the video to be detected; and determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result and the third illegal content detection result, wherein the fourth illegal content detection result is used for determining a final illegal content identification result of the video to be detected.

Therefore, according to the technical solution of this embodiment, the computing device extracts the video frames, the video clips and the audio from the video to be detected, extracts the first text information from the images of the video frames and the second text information from the audio, and obtains the first, second, third and fifth illegal content detection results of the video to be detected respectively, from which it determines the final detection result of the video to be detected, namely the fourth illegal content detection result. In this way, the illegal content of the video to be detected is identified and analyzed from multiple angles, such as images, video clips, faces, objects, speech and text, so that the illegal content of the video is detected comprehensively. Meanwhile, the fourth illegal content detection result covers the time-domain positions of the video frames and the video clips in the video to be detected, so that the illegal content of the video to be detected can be located more accurately. This solves the technical problems that, in the prior art, methods for detecting illegal video content extract some of the video frames and the audio from a video, detect each separately, and regard the video as illegal as soon as any violation is found, so that the continuous information of the video is not considered, the individual illegal content detection results are not processed comprehensively, and robustness is insufficient.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:

FIG. 1 is a hardware block diagram of a computing device for implementing the method according to embodiment 1 of the present disclosure;

FIG. 2 is a schematic flowchart of a method for detecting illegal video content according to embodiment 1 of the present disclosure;

FIG. 3A is a further schematic flowchart of the method for detecting illegal video content according to embodiment 1 of the present disclosure;

FIG. 3B is a schematic diagram of the multi-modal fusion results according to embodiment 1 of the present disclosure;

FIG. 3C is a schematic diagram of the time-domain positions of the video frames and the video clips in the video to be detected according to embodiment 1 of the present disclosure;

FIG. 4 is a schematic diagram of an apparatus for detecting illegal video content according to embodiment 2 of the present disclosure; and

FIG. 5 is a schematic diagram of an apparatus for detecting illegal video content according to embodiment 3 of the present disclosure.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. It is to be understood that the described embodiments are merely exemplary of some, and not all, of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Embodiment 1

According to the present embodiment, an embodiment of a method for detecting illegal video content is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one shown here.

The method embodiments provided by the present embodiment may be executed in a mobile terminal, a computer terminal, a server or a similar computing device. FIG. 1 illustrates a block diagram of a hardware architecture of a computing device for implementing the detection of illegal video content. As shown in FIG. 1, the computing device may include one or more processors (which may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory for storing data, and a transmission device for communication functions. In addition, the computing device may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computing device may also include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.

It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single, stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computing device. As referred to in the disclosed embodiments, the data processing circuit acts as a processor control (e.g., selection of a variable resistance termination path connected to the interface).

The memory may be configured to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the video violation content detection method in the embodiments of the disclosure, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, the method for detecting video violation content of the application software is implemented. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include memory located remotely from the processor, which may be connected to the computing device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device is used for receiving or transmitting data via a network. Specific examples of such networks may include wireless networks provided by the communication provider of the computing device. In one example, the transmission device includes a network interface controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device may be a radio frequency (RF) module, which is used for communicating with the internet wirelessly.

The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computing device.

It should be noted here that in some alternative embodiments, the computing device shown in FIG. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that FIG. 1 is only one specific example and is intended to illustrate the types of components that may be present in a computing device as described above.

In the above operating environment, according to a first aspect of the present embodiment, a method for video violation content detection is provided, which may be, for example, run in the computing device described above. Fig. 2 shows a flow diagram of the method, which, with reference to fig. 2, comprises:

S202: acquiring a video to be detected;

S204: extracting video frames, video clips and audio from the video to be detected according to a preset video processing method;

S206: performing illegal content identification on the images of the video frames, and determining a first illegal content detection result of the video to be detected, wherein the first illegal content detection result is used for determining a target object in the video to be detected;

S208: performing illegal content identification on the video clips, and determining a second illegal content detection result of the video to be detected, wherein the second illegal content detection result is used for determining a behavior label of the video to be detected;

S210: performing illegal content identification on the audio, and determining a third illegal content detection result of the video to be detected, wherein the third illegal content detection result is used for determining a category label of the audio of the video to be detected; and

S212: determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result and the third illegal content detection result, wherein the fourth illegal content detection result is used for determining a final illegal content identification result of the video to be detected.

As described in the background art, with the development of computer network technology, more and more internet service providers offer video uploading and sharing services to users, so the number of videos on the internet has grown explosively. This places higher demands on the monitoring of video content, and manual review is far from able to meet them. In recent years, a number of semi-automatic video content monitoring solutions have appeared. Existing methods for detecting illegal video content extract some of the video frames and the audio from a video, detect each separately, and regard the video as illegal as soon as any violation is found. As a result, the continuous information of the video is not considered, the individual illegal content detection results are not processed comprehensively, and robustness is insufficient.

In view of this, the technical solution of this embodiment provides a method for detecting illegal video content. Specifically, as shown in FIG. 2, after the computing device acquires a video to be detected (S202), the computing device extracts video frames, video clips and audio from the video to be detected according to a preset video processing method (S204). FIG. 3A shows a further flowchart that explains the method of this embodiment by way of example. As shown in FIG. 3A, the computing device extracts video frames from the video to be detected as the input for illegal content detection on images, extracts video clips as the input for illegal content detection on video clips, and extracts the audio as the input for illegal content detection on audio. Further, the video processing method may include equal-interval video frame extraction, key frame extraction, video slicing, audio extraction and the like.
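As a non-limiting illustration of the equal-interval frame extraction and video slicing mentioned above, the following Python sketch computes only the time positions at which frames and clips would be taken. The function names, the millisecond units and the interval, clip-length and stride parameters are assumptions made for illustration; the disclosure does not specify them, and the actual decoding of frames and audio is outside the sketch.

```python
from typing import List, Tuple


def frame_timestamps_ms(duration_ms: int, interval_ms: int) -> List[int]:
    """Return the timestamps of frames sampled at equal intervals."""
    if duration_ms < 0 or interval_ms <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    return list(range(0, duration_ms, interval_ms))


def clip_windows_ms(duration_ms: int, clip_ms: int, stride_ms: int) -> List[Tuple[int, int]]:
    """Slice the video into (start, end) windows; the last window is truncated."""
    if duration_ms < 0 or clip_ms <= 0 or stride_ms <= 0:
        raise ValueError("duration must be >= 0; clip and stride must be > 0")
    windows = []
    for start in range(0, duration_ms, stride_ms):
        end = min(start + clip_ms, duration_ms)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # A hypothetical 5-second video sampled every 2 seconds.
    print(frame_timestamps_ms(5000, 2000))
    print(clip_windows_ms(5000, 2000, 2000))
```

Keeping the timestamps alongside each extracted frame and clip is what later allows a detection result to be mapped back to a position in the video to be detected.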

Further, the computing device performs illegal content identification on the images of the video frames and determines a first illegal content detection result of the video to be detected, where the first illegal content detection result is used to determine a target object in the video to be detected (S206). The computing device may use multiple deep learning models to perform illegal content detection on the images of the video frames.

Further, the computing device performs illegal content identification on the video clips and determines a second illegal content detection result of the video to be detected, where the second illegal content detection result is used to determine a behavior label of the video to be detected (S208). The behavior label may be, for example, "fighting".

Further, the computing device performs illegal content identification on the audio and determines a third illegal content detection result of the video to be detected, where the third illegal content detection result is used to determine a category label of the audio of the video to be detected (S210). The category label of the audio may be a sound category label such as "explosion sound".

Further, FIG. 3B shows a schematic diagram of the multi-modal fusion results. Referring to FIG. 3B, the computing device determines a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result and the third illegal content detection result, where the fourth illegal content detection result is used to determine a final illegal content identification result of the video to be detected (S212). The fourth illegal content detection result may be, for example, "pornographic video" or "riot video".

Therefore, according to the technical solution of this embodiment, the computing device extracts the video frames, the video clips and the audio from the video to be detected, extracts the first text information from the images of the video frames and the second text information from the audio, and obtains the first, second, third and fifth illegal content detection results of the video to be detected respectively, from which it determines the final detection result of the video to be detected, namely the fourth illegal content detection result. In this way, the illegal content of the video to be detected is identified and analyzed from multiple angles, such as images, video clips, faces, objects, speech and text, so that the illegal content of the video is detected comprehensively. This solves the technical problems that, in the prior art, methods for detecting illegal video content extract some of the video frames and the audio from a video, detect each separately, and regard the video as illegal as soon as any violation is found, so that the continuous information of the video is not considered, the individual illegal content detection results are not processed comprehensively, and robustness is insufficient.

Optionally, the operation of performing illegal content identification on the image of the video frame includes: classifying the image according to a preset image classification model and determining the category of the image, wherein the category of the image is used for indicating the type of violation of the image; detecting objects in the image according to a preset object detection model and determining information of an illegal object in the image, wherein the information of the illegal object is used for indicating position information of the object in the image and type information of the illegal object; recognizing first text information in the image according to a preset text recognition model and outputting the first text information of the image; recognizing a face in the image according to a preset face recognition model and determining face information in the image, wherein the face information is used for indicating position information of the face and identity information of the person to whom the face belongs; and determining the first illegal content detection result according to the category of the image, the information of the illegal object in the image and the face information in the image.

Specifically, referring to FIG. 3A, the operation of the computing device performing illegal content identification on the image of the video frame may proceed, for example, as follows. The computing device classifies the image according to a preset image classification model and determines the category of the image, where the category of the image is used to indicate the type of violation of the image and may be a violation type such as "pornography". The computing device detects objects in the image according to a preset object detection model and determines information of an illegal object in the image, where the information of the illegal object is used to indicate position information of the object in the image and type information of the illegal object; the position of the object may be marked, for example, with a rectangular frame. The computing device recognizes first text information in the image according to a preset text recognition model and outputs the first text information of the image, where the first text information may be the characters in the image. The computing device recognizes a face in the image according to a preset face recognition model and determines face information in the image, where the face information is used to indicate position information of the face and identity information of the person to whom the face belongs; the position of the face may also be marked with a rectangular frame. The computing device then determines the first illegal content detection result according to the category of the image, the information of the illegal object in the image and the face information in the image, so that the computing device can detect the violation type, the illegal objects, the offending persons and the like present in the image of the video frame.
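By way of a hedged example, the per-frame outputs described above could be gathered in a record such as the following Python sketch. All class and field names are hypothetical, the rectangular frame is assumed to be stored as (x, y, width, height), and the sketch assumes that the object and face lists already contain only flagged items; none of this is prescribed by the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: x, y, width, height


@dataclass
class DetectedObject:
    box: Box          # position of the illegal object in the image
    object_type: str  # type information of the illegal object


@dataclass
class DetectedFace:
    box: Box       # position of the face in the image
    identity: str  # identity information of the person


@dataclass
class FrameResult:
    timestamp_ms: int
    category: Optional[str] = None  # image violation type, None if clean
    objects: List[DetectedObject] = field(default_factory=list)
    faces: List[DetectedFace] = field(default_factory=list)
    text: str = ""  # first text information, classified separately

    def labels(self) -> List[str]:
        """Collect the labels that make up the first detection result."""
        found = [self.category] if self.category else []
        found += [obj.object_type for obj in self.objects]
        found += [face.identity for face in self.faces]
        return found


if __name__ == "__main__":
    frame = FrameResult(
        timestamp_ms=1000,
        category="pornography",
        objects=[DetectedObject((10, 20, 30, 40), "knife")],
    )
    print(frame.labels())
```

The recognized text is kept on the record but left out of `labels()`, since in the described method the first text information is evaluated later by the text information classification model.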

Optionally, the operation of performing illegal content identification on the video clip includes: determining a behavior tag of the video clip, wherein the behavior tag is used for indicating the violation type of the video clip; and determining the second illegal content detection result according to the behavior tag.

Specifically, referring to FIG. 3A, the operation of the computing device performing illegal content identification on the video clip includes determining a behavior tag of the video clip, where the behavior tag is used to indicate the violation type of the video clip and may be a behavior-class tag such as "fighting". The computing device then determines the second illegal content detection result according to the behavior tag; for example, when the behavior tag is "fighting", the computing device determines that fighting behavior appears in the video clip.

Optionally, the operation of performing illegal content identification on the audio includes: classifying the audio according to a preset sound classification model, and outputting a class label of the audio, wherein the class label is used for indicating the violation class of the sound of the audio; determining a third violation content detection result according to the class label of the audio; and recognizing the voice in the audio according to a preset voice recognition model, and outputting second text information in the audio.

Specifically, referring to FIG. 3A, the operation of the computing device performing illegal content identification on the audio may proceed, for example, as follows. The computing device classifies the audio according to a preset sound classification model and outputs a category label of the audio, where the category label is used to indicate the violation category of the sound of the audio and may be, for example, "explosion sound". The computing device determines the third illegal content detection result according to the category label of the audio, that is, it determines the category label of the audio of the video to be detected, which here may be, for example, "explosion sound". The computing device also recognizes the speech in the audio according to a preset speech recognition model and outputs the second text information in the audio. In this way, the computing device can both detect the category label of the audio of the video to be detected and convert the speech in the audio into text.
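The two audio outputs can be illustrated with a short Python sketch. The sound classification model and the speech recognition model are represented here by caller-supplied functions, which is an assumption made so that the example stays self-contained; `AudioResult` and `analyze_audio` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AudioResult:
    category_label: Optional[str]  # basis of the third detection result
    transcript: str                # second text information


def analyze_audio(
    audio: bytes,
    classify_sound: Callable[[bytes], Optional[str]],
    recognize_speech: Callable[[bytes], str],
) -> AudioResult:
    """Run both preset models on the same audio and keep the outputs apart."""
    return AudioResult(
        category_label=classify_sound(audio),
        transcript=recognize_speech(audio),
    )


if __name__ == "__main__":
    # Stand-in models; real models would be loaded elsewhere.
    result = analyze_audio(
        b"\x00\x01",
        classify_sound=lambda _audio: "explosion sound",
        recognize_speech=lambda _audio: "example transcript",
    )
    print(result)
```

Keeping the category label and the transcript in separate fields mirrors the description: only the category label determines the third illegal content detection result, while the transcript is passed on for text classification.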

Optionally, determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result, and the third illegal content detection result, where the determining includes: detecting violation contents of the first text information and the second text information according to a preset text information classification model, and determining violation labels of the first text information and the second text information, wherein the violation labels are used for indicating the type of violation of the first text information and the second text information; determining a fifth illegal content detection result of the video to be detected according to the illegal labels of the first text information and the second text information, wherein the fifth illegal content detection result is used for determining the illegal label of the character information of the video to be detected; and determining a fourth illegal content detection result according to the first illegal content detection result, the second illegal content over-detection result, the third illegal content over-detection result and the fifth illegal content detection result.

Specifically, referring to fig. 3A, to determine the fourth illegal content detection result of the video to be detected according to the first, second, and third illegal content detection results, the computing device may, for example, perform violation content detection on the first text information and the second text information according to a preset text information classification model. That is, for the text recognized in the image and the text recognized in the audio, it determines violation labels of the first text information and the second text information, where the violation labels indicate the type of violation of the first text information and the second text information. The computing device then determines a fifth illegal content detection result of the video to be detected according to these violation labels, where the fifth illegal content detection result is used for determining the violation label of the text information of the video to be detected. Finally, the computing device determines the fourth illegal content detection result according to the first illegal content detection result, the second illegal content detection result, the third illegal content detection result and the fifth illegal content detection result. The computing device can therefore analyze the video to be detected from multiple angles by fusing the image violation content, the audio violation content, the video clip violation content and the text violation content, and thereby obtain a final illegal content detection result of the video to be detected.
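A minimal sketch of the text branch and the fusion step might look like the following. The keyword classifier and the "at least two modalities" rule are assumptions chosen only for illustration; the text does not disclose how the preset text information classification model works or how the four results are weighed, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def classify_text_stub(text: str) -> Optional[str]:
    """Hypothetical keyword lookup standing in for the preset
    text information classification model."""
    keywords = {"gamble": "gambling", "weapon": "weapons"}
    lowered = text.lower()
    for word, label in keywords.items():
        if word in lowered:
            return label
    return None


def fifth_result(first_text: str, second_text: str,
                 classify: Callable[[str], Optional[str]]) -> List[str]:
    """Violation labels of the text found in the image and in the audio."""
    labels: List[str] = []
    for text in (first_text, second_text):
        label = classify(text)
        if label is not None and label not in labels:
            labels.append(label)
    return labels


def fourth_result(first: List[str], second: List[str],
                  third: List[str], fifth: List[str]) -> Dict[str, object]:
    """Fuse the image, clip, audio and text results into a final result."""
    evidence = {"image": first, "clip": second, "audio": third, "text": fifth}
    flagged = {name: labels for name, labels in evidence.items() if labels}
    # Assumed rule: flag the video only when two or more modalities agree.
    return {"violation": len(flagged) >= 2, "evidence": flagged}


text_labels = fifth_result("Buy a WEAPON here", "hello everyone", classify_text_stub)
final = fourth_result(["knife"], [], [], text_labels)
print(final)
```

Requiring agreement between modalities is one way to avoid treating a video as a violation on the strength of a single isolated hit, which is the weakness the background section attributes to existing methods.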

Optionally, the position of the fourth illegal content detection result in the video to be detected is determined according to the time sequence information of the video to be detected and the time domain positions of the video segments and the video frames in the video to be detected.

Specifically, fig. 3B shows a schematic diagram of a multi-modal fusion result, and fig. 3C shows a schematic diagram of the time domain positions of the video frames and video clips in the video to be detected. Referring to fig. 3B and fig. 3C, the computing device determines the position of the fourth illegal content detection result in the video to be detected according to the time sequence information of the video to be detected and the time domain positions of the video clips and the video frames in the video to be detected. The computing device uses a multi-modal fusion model to fuse the image violation content, the audio violation content, the video clip violation content and the text violation content, and combines the result with the time sequence information of the video to be detected, namely the positions of the video frames and the video clips in the video to be detected. In this way, it determines the final violation content identification result of the video to be detected and can also determine the time domain position of the violation content in the video to be detected.
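One way to picture the time domain positioning is to convert every per-frame and per-clip hit to a time interval and merge intervals that lie close together. The sketch below assumes a constant frame rate and a merge gap of 0.5 seconds; both, like the names `Hit`, `frame_hit` and `merge_hits`, are hypothetical and not taken from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    start: float  # seconds from the start of the video
    end: float
    label: str


def frame_hit(frame_index: int, fps: float, label: str) -> Hit:
    """Map a frame index to the time interval that frame covers."""
    start = frame_index / fps
    return Hit(start, start + 1.0 / fps, label)


def merge_hits(hits: List[Hit], gap: float = 0.5) -> List[Tuple[float, float, List[str]]]:
    """Merge hits whose intervals overlap or lie within `gap` seconds."""
    merged: List[Tuple[float, float, List[str]]] = []
    for hit in sorted(hits, key=lambda h: h.start):
        if merged and hit.start - merged[-1][1] <= gap:
            start, end, labels = merged[-1]
            if hit.label not in labels:
                labels = labels + [hit.label]
            merged[-1] = (start, max(end, hit.end), labels)
        else:
            merged.append((hit.start, hit.end, [hit.label]))
    return merged


hits = [
    frame_hit(50, 25.0, "knife"),        # frame 50 at 25 fps starts at 2.0 s
    Hit(1.5, 4.0, "fighting"),           # a video clip hit
    Hit(10.0, 12.0, "explosion sound"),  # an audio hit
]
segments = merge_hits(hits)
print(segments)
```

Here the frame hit falls inside the clip hit, so the two are reported as one segment from 1.5 s to 4.0 s carrying both labels, while the audio hit remains a separate segment.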

Further, referring to fig. 1, according to a second aspect of the present embodiment, there is provided a storage medium. The storage medium comprises a stored program, wherein the method of any of the above is performed by a processor when the program is run.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

Fig. 4 shows an apparatus 400 for video violation content detection according to the present embodiment, where the apparatus 400 corresponds to the method according to embodiment 1. Referring to fig. 4, the apparatus 400 includes: a to-be-detected video acquisition module 410, configured to acquire a video to be detected; an extraction module 420, configured to extract video frames, video clips, and audio in the video to be detected according to a preset video processing method; a first illegal content detection result determining module 430, configured to perform illegal content identification on an image of a video frame and determine a first illegal content detection result of the video to be detected, where the first illegal content detection result is used to determine a target object in the video to be detected; a second illegal content detection result determining module 440, configured to perform illegal content identification on the video clip and determine a second illegal content detection result of the video to be detected, where the second illegal content detection result is used to determine a behavior tag of the video to be detected; a third illegal content detection result determining module 450, configured to perform illegal content identification on the audio and determine a third illegal content detection result of the video to be detected, where the third illegal content detection result is used to determine a category label of the audio of the video to be detected; and a fourth illegal content detection result determining module 460, configured to determine a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result, and the third illegal content detection result, where the fourth illegal content detection result is used to determine a final illegal content identification result of the video to be detected.
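The module structure of the apparatus 400 can be pictured as a small pipeline in which each module is a replaceable callable. The following is a sketch under that assumption only; the class name `ViolationDetector`, its field names and the lambda stubs are hypothetical and do not reflect any disclosed implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ViolationDetector:
    acquire: Callable[[str], Any]                    # module 410
    extract: Callable[[Any], Tuple[Any, Any, Any]]   # module 420: frames, clips, audio
    detect_frames: Callable[[Any], Any]              # module 430: first result
    detect_clips: Callable[[Any], Any]               # module 440: second result
    detect_audio: Callable[[Any], Any]               # module 450: third result
    fuse: Callable[[Any, Any, Any], Any]             # module 460: fourth result

    def run(self, source: str) -> Any:
        """Chain the modules in the order given in the description."""
        video = self.acquire(source)
        frames, clips, audio = self.extract(video)
        first = self.detect_frames(frames)
        second = self.detect_clips(clips)
        third = self.detect_audio(audio)
        return self.fuse(first, second, third)


# Stub modules (assumptions) that return fixed values for illustration.
detector = ViolationDetector(
    acquire=lambda src: {"frames": ["f0", "f1"], "clips": ["c0"], "audio": "a0"},
    extract=lambda video: (video["frames"], video["clips"], video["audio"]),
    detect_frames=lambda frames: ["knife"],
    detect_clips=lambda clips: ["fighting"],
    detect_audio=lambda audio: [],
    fuse=lambda first, second, third: {"image": first, "clip": second, "audio": third},
)
outcome = detector.run("example.mp4")
print(outcome)
```

Because the three detection branches depend only on the output of the extraction module, they could also be run concurrently, which is consistent with the remark that the steps need not occur in the order described.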

Optionally, the first violation content detection result determining module 430 includes: an image category determining submodule, configured to classify the image according to a preset image classification model and determine the category of the image, wherein the category of the image is used for indicating the type of violation of the image; a violation object determining submodule, configured to detect an object in the image according to a preset object detection model and determine information of the violation object in the image, wherein the information of the violation object is used for indicating position information of the object in the image and type information of the violation object; a first text information determining submodule, configured to recognize first text information in the image according to a preset text recognition model and output the first text information of the image; a face information determining submodule, configured to recognize a face in the image according to a preset face recognition model and determine information of the face in the image, wherein the information of the face is used for indicating the position information of the face and the identity information of the object to which the face belongs; and a first violation content detection result determining submodule, configured to determine the first violation content detection result according to the category of the image, the information of the violation object in the image and the information of the face in the image.
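The outputs of these submodules can be gathered into one record per image, as in the sketch below. The field layout, the `(x, y, width, height)` box convention, the "normal" category name and the identity watchlist are all assumptions made for illustration; the text states only which pieces of information the first violation content detection result depends on.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: x, y, width, height


@dataclass
class ViolationObject:
    box: Box    # position information of the object in the image
    kind: str   # type information of the violation object


@dataclass
class FaceInfo:
    box: Box       # position information of the face
    identity: str  # identity information of the object of the face


@dataclass
class FirstResult:
    image_category: str
    objects: List[ViolationObject] = field(default_factory=list)
    faces: List[FaceInfo] = field(default_factory=list)
    first_text: str = ""  # kept for the later text classification step

    def has_violation(self, normal_category: str = "normal",
                      watchlist: FrozenSet[str] = frozenset()) -> bool:
        """Assumed rule: any abnormal category, violation object,
        or watchlisted identity marks the image."""
        return (
            self.image_category != normal_category
            or bool(self.objects)
            or any(face.identity in watchlist for face in self.faces)
        )


knife_frame = FirstResult("normal", [ViolationObject((10, 20, 30, 40), "knife")])
clean_frame = FirstResult("normal")
print(knife_frame.has_violation(), clean_frame.has_violation())
```

The first text information is stored but deliberately left out of `has_violation`, since the description routes recognized text to the separate text classification step rather than to the first violation content detection result.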

Optionally, the second violation content detection result determining module 440 includes: a behavior tag determining submodule for determining a behavior tag of the video segment, wherein the behavior tag is used for indicating the violation type of the video segment; and the second violation content detection result determining submodule is used for determining a second violation content detection result according to the behavior tag.

Optionally, the third violation content detection result determining module 450 includes: the class label determining submodule is used for classifying the audio according to a preset sound classification model and outputting a class label of the audio, wherein the class label is used for indicating the violation class of the sound of the audio; the third illegal content detection result determining submodule is used for determining a third illegal content detection result according to the class label of the audio; and the second text information determining submodule is used for recognizing the voice in the audio according to a preset voice recognition model and outputting the second text information in the audio.

Optionally, the fourth violation content detection result determining module 460 includes the following, wherein the operation of performing violation content identification on the image of the video frame comprises: classifying the image according to a preset image classification model and determining the category of the image, wherein the category of the image is used for indicating the type of violation of the image; detecting an object in the image according to a preset object detection model, and determining information of a violation object in the image, wherein the information of the violation object is used for indicating position information of the object in the image and type information of the violation object; recognizing first text information in the image according to a preset text recognition model and outputting the first text information of the image; recognizing a face in the image according to a preset face recognition model, and determining face information in the image, wherein the face information is used for indicating position information of the face and identity information of the object to which the face belongs; and determining the first violation content detection result according to the category of the image, the information of the violation object in the image and the information of the face in the image, and wherein the operation of performing violation content identification on the audio comprises: classifying the audio according to a preset sound classification model, and outputting a class label of the audio, wherein the class label is used for indicating the violation class of the sound of the audio; and determining the third violation content detection result according to the class label of the audio; a violation label determining submodule, configured to perform violation content detection on the first text information and the second text information according to a preset text information classification model and determine violation labels of the first text information and the second text information, wherein the violation labels are used for indicating the type of violation of the first text information and the second text information; a fifth violation content detection result determining submodule, configured to determine a fifth violation content detection result of the video to be detected according to the violation labels of the first text information and the second text information, where the fifth violation content detection result is used to determine the violation label of the text information of the video to be detected; and a fourth violation content detection result determining submodule, configured to determine the fourth violation content detection result according to the first violation content detection result, the second violation content detection result, the third violation content detection result, and the fifth violation content detection result.

Optionally, the apparatus 400 further comprises: a position determining submodule, configured to determine the position of the fourth illegal content detection result in the video to be detected according to the time sequence information of the video to be detected and the time domain positions of the video clips and the video frames in the video to be detected.

Example 3

Fig. 5 shows an apparatus 500 for video violation content detection according to the present embodiment, where the apparatus 500 corresponds to the method according to embodiment 1. Referring to fig. 5, the apparatus 500 includes: a processor 510; and a memory 520 coupled to the processor 510 for providing the processor 510 with instructions to perform the following processing steps: acquiring a video to be detected; extracting video frames, video clips and audio in the video to be detected according to a preset video processing method; identifying illegal contents of images of the video frames, and determining a first illegal content detection result of the video to be detected, wherein the first illegal content detection result is used for determining a target object in the video to be detected; identifying illegal contents of the video clip, and determining a second illegal content detection result of the video to be detected, wherein the second illegal content detection result is used for determining a behavior label of the video to be detected; identifying illegal contents of the audio, and determining a third illegal content detection result of the video to be detected, wherein the third illegal content detection result is used for determining a category label of the audio of the video to be detected; and determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result and the third illegal content detection result, wherein the fourth illegal content detection result is used for determining a final illegal content identification result of the video to be detected.

Optionally, determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result, and the third illegal content detection result includes the following, wherein the operation of identifying the illegal content of the image of the video frame comprises: classifying the image according to a preset image classification model and determining the category of the image, wherein the category of the image is used for indicating the type of violation of the image; detecting an object in the image according to a preset object detection model, and determining information of an illegal object in the image, wherein the information of the illegal object is used for indicating position information of the object in the image and type information of the illegal object; recognizing first text information in the image according to a preset text recognition model and outputting the first text information of the image; recognizing a face in the image according to a preset face recognition model, and determining face information in the image, wherein the face information is used for indicating position information of the face and identity information of the object to which the face belongs; and determining the first violation content detection result according to the category of the image, the information of the violation object in the image and the information of the face in the image, and wherein the operation of performing violation content identification on the audio comprises: classifying the audio according to a preset sound classification model, and outputting a class label of the audio, wherein the class label is used for indicating the violation class of the sound of the audio; determining the third violation content detection result according to the class label of the audio; and recognizing the voice in the audio according to a preset voice recognition model, and outputting second text information in the audio; performing violation content detection on the first text information and the second text information according to a preset text information classification model, and determining violation labels of the first text information and the second text information, wherein the violation labels are used for indicating the types of violation of the first text information and the second text information; determining a fifth illegal content detection result of the video to be detected according to the violation labels of the first text information and the second text information, wherein the fifth illegal content detection result is used for determining the violation label of the text information of the video to be detected; and determining the fourth illegal content detection result according to the first illegal content detection result, the second illegal content detection result, the third illegal content detection result and the fifth illegal content detection result.

Optionally, the processing steps further comprise: determining the position of the fourth illegal content detection result in the video to be detected according to the time sequence information of the video to be detected and the time domain positions of the video clips and the video frames in the video to be detected.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, which can store program codes.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. A method for video violation content detection, comprising:

acquiring a video to be detected;

extracting video frames, video clips and audio in the video to be detected according to a preset video processing method;

identifying illegal contents of the image of the video frame, and determining a first illegal content detection result of the video to be detected, wherein the first illegal content detection result is used for determining a target object in the video to be detected;

identifying illegal contents of the video clip, and determining a second illegal content detection result of the video to be detected, wherein the second illegal content detection result is used for determining a behavior label of the video to be detected;

identifying illegal contents of the audio, and determining a third illegal content detection result of the video to be detected, wherein the third illegal content detection result is used for determining a category label of the audio of the video to be detected; and

determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result and the third illegal content detection result, wherein the fourth illegal content detection result is used for determining a final illegal content identification result of the video to be detected.

2. The method of claim 1, wherein identifying the illegal content of the image of the video frame comprises:

classifying the image according to a preset image classification model, and determining the category of the image, wherein the category of the image is used for indicating the type of violation of the image;

detecting an object in the image according to a preset object detection model, and determining information of an illegal object in the image, wherein the information of the illegal object is used for indicating position information and type information of the illegal object in the image;

according to a preset text recognition model, recognizing first text information in the image and outputting the first text information of the image;

according to a preset face recognition model, recognizing a face in the image, and determining face information in the image, wherein the face information is used for indicating position information of the face and identity information of an object of the face; and

determining the first violation content detection result according to the category of the image, the information of the violation object in the image and the information of the face in the image.

3. The method of claim 1, wherein the act of performing illegal content identification on the video segment comprises:

determining a behavior tag of the video segment, wherein the behavior tag is used for indicating the violation type of the video segment; and

determining the second violation content detection result according to the behavior tag.

4. The method of claim 1, wherein the operation of performing illegal content identification on the audio comprises:

classifying the audio according to a preset sound classification model, and outputting a class label of the audio, wherein the class label is used for indicating an illegal class of the sound of the audio;

determining the third violation content detection result according to the category label of the audio; and

recognizing the voice in the audio according to a preset voice recognition model, and outputting second text information in the audio.

5. The method according to claim 1, wherein determining a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result, and the third illegal content detection result comprises:

the operation of identifying the illegal content of the image of the video frame comprises the following operations: classifying the image according to a preset image classification model, and determining the category of the image, wherein the category of the image is used for indicating the type of violation of the image; detecting an object in the image according to a preset object detection model, and determining information of an illegal object in the image, wherein the information of the illegal object is used for indicating position information and type information of the illegal object in the image; according to a preset text recognition model, recognizing first text information in the image and outputting the first text information of the image; according to a preset face recognition model, recognizing a face in the image, and determining face information in the image, wherein the face information is used for indicating position information of the face and identity information of the object to which the face belongs; and determining the first violation content detection result according to the category of the image, the information of the violation object in the image and the information of the face in the image,

the operation of violation content identification on the audio comprises the following operations: classifying the audio according to a preset sound classification model, and outputting a class label of the audio, wherein the class label is used for indicating an illegal class of the sound of the audio; determining the third violation content detection result according to the category label of the audio; and recognizing the voice in the audio according to a preset voice recognition model, outputting second text information in the audio, and

performing violation content detection on the first text information and the second text information according to a preset text information classification model, and determining violation labels of the first text information and the second text information, wherein the violation labels are used for indicating the type of violation of the first text information and the second text information;

determining a fifth illegal content detection result of the video to be detected according to the illegal labels of the first text information and the second text information, wherein the fifth illegal content detection result is used for determining the illegal label of the character information of the video to be detected; and

determining the fourth illegal content detection result according to the first illegal content detection result, the second illegal content detection result, the third illegal content detection result and the fifth illegal content detection result.

6. The method of claim 1, further comprising:

determining the position of the fourth illegal content detection result in the video to be detected according to the time sequence information of the video to be detected and the time domain positions of the video clips and the video frames in the video to be detected.

7. A storage medium comprising a stored program, wherein the method of any one of claims 1 to 6 is performed by a processor when the program is run.

8. An apparatus for video violation content detection, comprising:

the to-be-detected video acquisition module is used for acquiring a to-be-detected video;

the extraction module is used for extracting video frames, video clips and audio in the video to be detected according to a preset video processing method;

the first illegal content detection result determining module is used for identifying illegal content of the image of the video frame and determining a first illegal content detection result of the video to be detected, wherein the first illegal content detection result is used for determining a target object in the video to be detected;

a second illegal content detection result determining module, configured to perform illegal content identification on the video segment and determine a second illegal content detection result of the video to be detected, where the second illegal content detection result is used to determine a behavior tag of the video to be detected;

a third illegal content detection result determining module, configured to perform illegal content identification on the audio and determine a third illegal content detection result of the video to be detected, where the third illegal content detection result is used to determine a category label of the audio of the video to be detected; and

a fourth illegal content detection result determining module, configured to determine a fourth illegal content detection result of the video to be detected according to the first illegal content detection result, the second illegal content detection result, and the third illegal content detection result, where the fourth illegal content detection result is used to determine a final illegal content identification result of the video to be detected.

9. The apparatus of claim 8, wherein the first illegal content detection result determining module comprises:

the image category determining submodule is used for classifying the image according to a preset image classification model and determining the category of the image, wherein the category of the image is used for indicating the type of violation of the image;

the illegal object determining submodule is used for detecting an object in the image according to a preset object detection model and determining information of the illegal object in the image, wherein the information of the illegal object is used for indicating position information of the object in the image and type information of the illegal object;

the first text information determining submodule is used for identifying first text information in the image according to a preset text identification model and determining the first text information of the image;

the face information determining submodule is used for identifying the face in the image according to a preset face identification model and determining the information of the face in the image, wherein the information of the face is used for indicating the position information of the face and the identity information of an object of the face; and

the first violation content detection result determining submodule is used for determining the first violation content detection result according to the category of the image, the information of the violation object in the image and the information of the face in the image.

10. An apparatus for video violation content detection, comprising:

a processor; and

a memory coupled to the processor for providing instructions to the processor for processing the following processing steps:

acquiring a video to be detected;