CN110852231A - Illegal video detection method and device and storage medium

Info

Publication number: CN110852231A
Application number: CN201911067231.3A
Authority: CN
Inventors: 刘洋; 杨文鲜; 王新然; 李云飞; 傅景楠
Original assignee: Yunmu Future Technology Beijing Co Ltd
Current assignee: Yunmu Future Technology Beijing Co Ltd
Priority date: 2019-11-04
Filing date: 2019-11-04
Publication date: 2020-02-28

Abstract

The application discloses an illegal video detection method and device and a storage medium. The illegal video detection method comprises the following steps: acquiring a video to be detected; and performing the following detection operations on the video to be detected to determine whether the video to be detected is an illegal video: detecting illegal content in the images in key frames of the video to be detected; and/or detecting illegal content in the audio information, the title information and/or the description information of the video to be detected, and/or in the text information extracted from key frames of the video to be detected.

Description

Illegal video detection method and device and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting an illegal video, and a storage medium.

Background

With the development of computer network technology, more and more internet service providers offer video uploading and sharing services to users, so that the number of videos on the internet has grown explosively. This places higher requirements on the monitoring of video content, and manual auditing is far from meeting these requirements. Many semi-automated video content monitoring solutions have emerged in recent years. However, the existing semi-automatic video content monitoring methods have the following problems:

(1) In the process of monitoring the video content, only the image content in the video is extracted for violation judgment, so the result is not accurate enough.

(2) In the process of monitoring video content, violation judgment is carried out on all video frames of a video, which puts high requirements on the computing capacity of a machine and is difficult to popularize.

No effective solution has yet been proposed for the above technical problems in the prior art, namely that the result is not accurate enough when violation judgment is carried out by extracting only the image content in a video, and that performing violation judgment on all video frames of a video places excessive demands on computing power and is difficult to popularize.

Disclosure of Invention

The embodiments of the disclosure provide a method, a device and a storage medium for detecting an illegal video, which are used for at least solving the problems in the prior art that the result is not accurate enough when only the image content in a video is extracted for violation judgment, and that performing violation judgment on all video frames of a video places excessive demands on computing power and is difficult to popularize.

According to an aspect of the embodiments of the present disclosure, there is provided a method for detecting an illegal video, including: acquiring a video to be detected; and executing the following detection operations on the video to be detected to determine whether the video to be detected is an illegal video: detecting illegal contents of images in a key frame of a video to be detected; and/or detecting the illegal content of the audio information, the title information, the description information and/or the text information extracted from the key frame of the video to be detected in the video to be detected.

According to another aspect of the embodiments of the present disclosure, there is also provided a storage medium including a stored program, wherein the method of any one of the above is performed by a processor when the program is executed.

According to another aspect of the embodiments of the present disclosure, there is also provided an illegal video detection device, including: the to-be-detected video acquisition module is used for acquiring a to-be-detected video; and the video violation detection module to be detected is used for executing the following detection operations on the video to be detected and determining whether the video to be detected is a violation video: detecting illegal contents of images in a key frame of a video to be detected; and/or detecting the illegal content of the audio information, the title information, the description information and/or the text information extracted from the key frame of the video to be detected in the video to be detected.

According to another aspect of the embodiments of the present disclosure, there is also provided an illegal video detection device, including: a processor; and a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring a video to be detected; and executing the following detection operations on the video to be detected to determine whether the video to be detected is an illegal video: detecting illegal contents of images in a key frame of a video to be detected; and/or detecting the illegal content of the audio information, the title information, the description information and/or the text information extracted from the key frame of the video to be detected in the video to be detected.

Therefore, according to the embodiment, whether the video to be detected is an illegal video is determined by performing the following operations on the video to be detected: detecting illegal content in the images in key frames of the video to be detected; and/or detecting illegal content in the audio information, the title information and/or the description information of the video to be detected, and/or in the text information extracted from key frames of the video to be detected. Because only the key frames are detected rather than all images in the video to be detected, a large amount of computing resources is saved for the system and the overall performance is improved. In addition, according to the technical scheme of the embodiment, not only the image content in the video to be detected but also the text information extracted from the video to be detected is detected, which improves the accuracy of the detection result. This solves the problems in the prior art that the result is not accurate enough when only the image content in a video is extracted for violation judgment, and that performing violation judgment on all video frames of a video places excessive demands on computing power and is difficult to popularize.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:

Fig. 1 is a hardware block diagram of a computing device for implementing the method according to embodiment 1 of the present disclosure;

Fig. 2 is a schematic flowchart of an illegal video detection method according to embodiment 1 of the present disclosure;

Fig. 3 is a schematic flowchart of image violation content detection according to embodiment 1 of the present disclosure;

Fig. 4 is a schematic flowchart of an illegal content detection model according to embodiment 1 of the present disclosure;

Fig. 5 is a schematic flowchart of text information violation content detection according to embodiment 1 of the present disclosure;

Fig. 6 is a schematic flowchart of determining a violation content detection result according to embodiment 1 of the present disclosure;

Fig. 7 is a schematic diagram of an illegal video detection device according to embodiment 2 of the present disclosure; and

Fig. 8 is a schematic diagram of an illegal video detection device according to embodiment 3 of the present disclosure.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. It is to be understood that the described embodiments are merely exemplary of some, and not all, of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

According to the present embodiment, there is provided an embodiment of an illegal video detection method. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one shown here.

The method provided by the embodiment can be executed in a mobile terminal, a computer terminal or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the illegal video detection method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal 10 may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.

It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the disclosed embodiments, the data processing circuit acts as a processor control (e.g., selection of a variable resistance termination path connected to the interface).

The memory 104 may be configured to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the illegal video detection method in the embodiment of the present disclosure, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the illegal video detection method of the application software. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.

The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).

It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.

In the foregoing operating environment, according to a first aspect of this embodiment, there is provided a violation video detection method, including: acquiring a video to be detected; and executing the following detection operations on the video to be detected to determine whether the video to be detected is an illegal video: detecting illegal contents of images in a key frame of a video to be detected; and/or detecting the illegal content of the audio information, the title information, the description information and/or the text information extracted from the key frame of the video to be detected in the video to be detected. Fig. 2 shows a flow diagram of the method, which, with reference to fig. 2, comprises:

S202: acquiring a video to be detected; and

S204: executing the following detection operations on the video to be detected, and determining whether the video to be detected is an illegal video: detecting illegal contents of images in a key frame of a video to be detected; and/or detecting the illegal content of the audio information, the title information, the description information and/or the text information extracted from the key frame of the video to be detected in the video to be detected.

As described in the background art, with the development of computer network technology, more and more internet service providers offer video uploading and sharing services to users, so that the number of videos on the internet has grown explosively. This places higher requirements on the monitoring of video content, and manual review is far from meeting these requirements. Many semi-automated video content monitoring solutions have emerged in recent years. However, when only the image content in a video is extracted for violation judgment, the result is not accurate enough; and when violation judgment is carried out on all video frames of a video, the demands on the computing capacity of the machine are high and the solution is difficult to popularize.

For the technical problem in the background art, in the detection method for the illegal video provided in the technical solution of the present embodiment, specifically, as shown in the method flowchart of fig. 2, after a video to be detected is acquired (S202), the following operations are performed on the video to be detected (S204): removing background frames without meaningful contents in the video to be detected, extracting key frames, and detecting illegal contents of images in the key frames of the video to be detected; and/or extracting audio information, title information and description information in the video to be detected and/or text information in key frames of the video to be detected, and detecting illegal contents of the text information of the video to be detected.

Therefore, according to the technical scheme of the embodiment, the violation detection result of the video to be detected can be obtained by combining the detection result of the violation content of the image in the key frame of the video to be detected and the detection result of the violation content of the text information of the video to be detected. According to the method, the key frames in the video to be detected are detected instead of all images in the video to be detected, so that a large amount of computing resources are saved for the system, and the overall performance is improved. In addition, the method not only detects the image content in the video to be detected, but also detects the text information extracted from the video to be detected, thereby improving the accuracy of the detection result.
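The embodiment does not specify how background frames are removed or how key frames are chosen. The following is only a minimal sketch under stated assumptions: frames are flattened grayscale pixel lists, a frame with very low pixel variance is treated as a background frame without meaningful content, and a frame becomes a key frame when it differs enough from the previous key frame. The names `select_key_frames`, `min_variance` and `min_change`, and both threshold values, are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Optional, Sequence

# Assumption: one grayscale frame flattened to pixel intensities in 0..255.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return mean(abs(x - y) for x, y in zip(a, b))


def select_key_frames(
    frames: List[Frame],
    min_variance: float = 25.0,  # hypothetical: below this a frame counts as background
    min_change: float = 10.0,    # hypothetical: required change versus the last key frame
) -> List[int]:
    """Return the indices of the frames kept as key frames."""
    key_indices: List[int] = []
    last_key: Optional[Frame] = None
    for index, frame in enumerate(frames):
        # Drop near-uniform frames, treated here as background without meaningful content.
        if pvariance(frame) < min_variance:
            continue
        # Keep the frame only if it differs enough from the previous key frame.
        if last_key is None or mean_abs_diff(frame, last_key) >= min_change:
            key_indices.append(index)
            last_key = frame
    return key_indices
```

Only the frames at the returned indices would then be passed to the image detection models, which is where the saving in computing resources described above comes from.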

Optionally, the operation of detecting the illegal content of the image in the key frame of the video to be detected includes: detecting whether a key frame of a video comprises an image of an illegal person or not by using a preset face recognition model; detecting whether an image in a key frame of a video is first-type violation content and a corresponding violation category by using a preset violation content classification model; and detecting whether the image in the key frame of the video contains the second type violation content, the position of the second type violation content and the corresponding violation category by using a preset violation content detection model.

Specifically, fig. 3 shows a schematic diagram of an operation flow for detecting illegal contents of an image in a key frame of a video to be detected, after a video key frame of the video to be detected is extracted, the illegal contents are detected by using a preset face recognition model, an illegal content classification model and an illegal content detection model, and the specific contents of the three models are as follows:

The face recognition model first recognizes the faces appearing in the input video key frame, then compares each face with the faces of specific sensitive persons, and finally judges whether the detected face belongs to an illegal person.

The illegal content classification model firstly classifies images in a video key frame to be detected, and judges whether the images in the video key frame to be detected are first type illegal contents and corresponding illegal categories, wherein the first type illegal contents mainly refer to scenes such as illegal tourism, violent action and the like.

The illegal content detection model first extracts a suspected illegal content position area from an image in a video key frame to be detected, and then judges whether the image in the area is second-type illegal content, the position of the second-type illegal content and the corresponding illegal category, wherein the second-type illegal content mainly refers to specific articles such as controlled knives, guns and ammunition.

Therefore, the technical scheme of the embodiment can detect the video to be detected in parallel according to the multiple categories of the illegal contents, so that the detection accuracy is improved.
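As an illustration of running the three image models side by side on one key frame, the following sketch uses a thread pool. It assumes each model is exposed as a plain callable that takes a key frame and returns a list of findings; the function name `run_image_detectors` and the dictionary-based key frame are hypothetical and are not taken from the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Assumption: a detector takes one key frame and returns a list of findings.
Detector = Callable[[Any], List[dict]]


def run_image_detectors(key_frame: Any, detectors: Dict[str, Detector]) -> Dict[str, List[dict]]:
    """Run every detector on the same key frame concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(detectors))) as pool:
        futures = {name: pool.submit(fn, key_frame) for name, fn in detectors.items()}
        # result() blocks until the corresponding detector has finished.
        return {name: future.result() for name, future in futures.items()}
```

In practice the three callables would wrap the face recognition model, the illegal content classification model and the illegal content detection model; whether threads, processes or separate services are used is a deployment choice that the embodiment leaves open.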

Optionally, the operation of detecting whether an image in a key frame of the video includes the second type of violation content, the position of the second type of violation content, and a corresponding violation category by using a preset violation content detection model includes: detecting an area in which an image in a key frame of the video may include second-type illegal content by using an illegal content sub-detection model of the illegal content detection model; and determining whether the image within the region includes second-type illegal content by using an illegal content sub-classification model of the illegal content detection model.

Specifically, fig. 4 is a schematic diagram illustrating an operation flow of detecting whether an image in a key frame of a video contains second-type violation content, a location of the second-type violation content, and a corresponding violation category using a preset violation content detection model. Referring to fig. 4, after an image of a video key frame of a video to be detected is extracted, second-type illegal content appearing in the picture is detected by using a preset illegal content sub-detection model, the position area of the content in the picture is located, and a corresponding illegal category is given. However, the predictions given by the illegal content sub-detection model usually have a relatively high false detection rate (non-illegal content is predicted as illegal content).

Therefore, after the illegal content sub-detection model is used, the detected position area of the content in the picture needs to be further examined by the illegal content sub-classification model. Specifically, a sub-image is cropped from the picture in the corresponding video key frame to be detected according to the position of the region predicted by the illegal content sub-detection model and is used as the input of the illegal content sub-classification model, which further determines whether the image in the region includes second-type illegal content.

Therefore, by the mode, the false detection rate of the video to be detected is reduced, and the detection accuracy is improved.
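The two-stage flow described above (region proposal by the sub-detection model, then confirmation of each cropped region by the sub-classification model) can be sketched as follows. This is an illustrative sketch only: the two models are passed in as callables, the image is a list of pixel rows, and the names `Candidate`, `crop` and `detect_second_type` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # assumption: rows of grayscale pixels
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


@dataclass
class Candidate:
    box: Box       # position area predicted by the sub-detection model
    category: str  # violation category predicted by the sub-detection model


def crop(image: Image, box: Box) -> Image:
    """Cut the sub-image covered by the box out of the full key frame image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def detect_second_type(
    image: Image,
    propose: Callable[[Image], List[Candidate]],  # stands in for the sub-detection model
    confirm: Callable[[Image, str], bool],        # stands in for the sub-classification model
) -> List[Candidate]:
    """Keep only the candidates that the second stage confirms on the cropped region."""
    confirmed: List[Candidate] = []
    for candidate in propose(image):
        if confirm(crop(image, candidate.box), candidate.category):
            confirmed.append(candidate)
    return confirmed
```

Because the second stage only sees the cropped region, a false positive from the first stage is dropped whenever the sub-classification model rejects it, which is how the false detection rate is reduced.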

Optionally, the following detection operations are performed on the video to be detected to determine whether the video to be detected is an illegal video, and the method further includes determining a probability that the video to be detected meets a regulation according to the following detection results: detecting whether a key frame of a video comprises an image of an illegal person; detecting whether an image in a key frame of a video is first type violation content and a corresponding violation category; and detecting whether the images in the key frames of the video contain the second type violation content, the position of the second type violation content, and the corresponding violation category.

Specifically, when the violation content is detected by using the preset face recognition model, violation content classification model and violation content detection model, the three models not only predict whether the images in a key frame are in violation, the position of the violation content and the category of the violation content, but also give a probability that each prediction is correct. According to this probability value, prediction results are divided into two categories: high-confidence predictions and general predictions. If a video is given a violation prediction with high confidence, the video has a high probability of being a violation video, and an early warning can be generated directly. Conversely, if a video is given only a general (lower-confidence) prediction, the video is not necessarily a violation video, and it can be classified into the correct archive after manual review.
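The routing by confidence can be sketched as below. The threshold value of 0.9, the `Prediction` record and the three returned labels are assumptions made for illustration; the embodiment only states that high-confidence violation predictions trigger an early warning directly and that general predictions go to manual review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    source: str         # e.g. "face", "classification" or "detection"
    is_violation: bool  # whether the model predicts a violation
    confidence: float   # probability that the prediction is correct, in 0..1


def triage(predictions: List[Prediction], high_confidence: float = 0.9) -> str:
    """Route a video based on the violation predictions of the image models."""
    violations = [p for p in predictions if p.is_violation]
    # Any high-confidence violation prediction raises the early warning directly.
    if any(p.confidence >= high_confidence for p in violations):
        return "early_warning"
    # General (lower-confidence) violation predictions are sent to manual review.
    if violations:
        return "manual_review"
    # Assumption: videos without any violation prediction need no further action.
    return "no_violation_predicted"
```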

Optionally, the operation of detecting the illegal content of the audio information in the video to be detected includes: generating corresponding first text information according to the audio frequency in the video to be detected by using a preset voice recognition model; and detecting the illegal content of the first text information by using a preset text classification model and a preset keyword matching model.

Specifically, fig. 5 is a flowchart illustrating an operation of detecting illegal content of audio information in a video to be detected, where the audio information in the video to be detected is converted into first text information through a speech recognition model, and the illegal content is then detected through a text classification model and a keyword matching model. The speech recognition model may be, but is not limited to, a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM). The text classification model passes the text information through a text classifier constructed based on BERT to judge whether the input text is in violation and, if so, its violation category. The keyword matching model matches the text information against an established violation keyword library, and predicts whether the text information is violation text information, and its violation category, according to the matching result.
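A keyword matching step of the kind described above could look like the following minimal sketch. The structure of the keyword library (violation category mapped to a set of keywords), the case-insensitive substring matching and the name `match_keywords` are assumptions; the embodiment does not describe the matching algorithm.

```python
from typing import Dict, List, Set


def match_keywords(text: str, library: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per violation category, the library keywords found in the text."""
    lowered = text.lower()
    hits: Dict[str, List[str]] = {}
    for category, keywords in library.items():
        # Plain substring matching; sorted so the output order is deterministic.
        found = sorted(k for k in keywords if k.lower() in lowered)
        if found:
            hits[category] = found
    return hits
```

An empty result means no keyword matched. Substring matching is deliberately simple and will also match keywords embedded in longer words, so a production system would likely need tokenization suited to the language of the text.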

Optionally, the operation of detecting illegal contents of the title information and the description information in the video to be detected includes: acquiring second text information corresponding to the title information and the description information; and detecting the illegal content of the second text information by using a preset text classification model and a preset keyword matching model.

Specifically, referring to fig. 5, a video text acquisition module identifies and extracts the title information and the description information of the video to be detected, which serve as the second text information, and the illegal content of the second text information is then detected by using the text classification model and the keyword matching model. The text classification model and the keyword matching model are as defined in the preceding paragraphs.

Optionally, the operation of detecting the content of violation of the text information extracted from the video key frame to be detected includes: extracting third text information from the key frame of the video by using a preset OCR recognition model; and detecting the illegal content of the third text information by using a preset text classification model and a preset keyword matching model.

Specifically, referring to fig. 5, an image of a key frame of the video to be detected is extracted, the text region in the image is located by using an OCR recognition model, and the text content corresponding to the text region is recognized as the third text information. The region where the text is located can be detected in the OCR recognition model by a method based on connected domains, a method based on a sliding window, or a method based on deep learning. An OCR recognition model based on deep learning can adopt a text detection method based on Rotation-RPN, a text detection method based on a connectionist text proposal network (CTPN), or a text detection method based on a fully convolutional network. The text classification model and the keyword matching model are as defined in the preceding paragraphs.
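The three text sources (first text from speech recognition, second text from title and description, third text from OCR on key frames) all feed the same text checks. The sketch below shows that wiring only; the speech recognition and OCR models are injected as callables, and every function and key name here is hypothetical.

```python
from typing import Any, Callable, Dict, List


def collect_text_sources(
    audio: Any,
    title: str,
    description: str,
    key_frames: List[Any],
    transcribe: Callable[[Any], str],  # stands in for the speech recognition model
    read_text: Callable[[Any], str],   # stands in for the OCR recognition model
) -> Dict[str, str]:
    """Gather the first, second and third text information of one video."""
    return {
        "first_text": transcribe(audio),
        "second_text": f"{title}\n{description}",
        "third_text": "\n".join(read_text(frame) for frame in key_frames),
    }


def detect_text_violations(
    texts: Dict[str, str],
    is_violation: Callable[[str], bool],  # stands in for classifier plus keyword matching
) -> Dict[str, bool]:
    """Apply the same text check to every text source."""
    return {name: is_violation(text) for name, text in texts.items()}
```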

Optionally, the operation of determining whether the video to be detected is an illegal video includes: determining a first weight of a first detection result for detecting violation content of an image in a key frame of a video to be detected; determining a second weight of a second detection result for detecting the illegal content of the audio information, the title information, the description information and/or the text information extracted from the key frame of the video to be detected in the video to be detected; and determining whether the video to be detected is the violation video or not according to the first detection result, the second detection result, the first weight and the second weight.

Specifically, referring to the flowchart shown in fig. 6, values of weights, i.e., a first weight and a second weight, of a first detection result of detecting the illegal content of the image in the key frame of the video to be detected and a second detection result of detecting the illegal content of the text information in the video to be detected are determined according to actual requirements, and then whether the video to be detected contains the illegal content is determined according to the first detection result, the second detection result and the weights thereof through the multi-modal predictive fusion module.

Therefore, the technical scheme of the embodiment can coordinate the influence of each detection result by using the weight value, so that the video to be detected can be detected more accurately.
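The embodiment does not give the fusion formula used by the multi-modal predictive fusion module. One simple possibility, shown here purely as an assumption, is a weighted average of the two detection results followed by a threshold; the function name `fuse`, the score range of 0 to 1 and the default threshold of 0.5 are all hypothetical.

```python
from typing import Tuple


def fuse(
    first_result: float,   # image-based violation score in 0..1
    second_result: float,  # text-based violation score in 0..1
    first_weight: float,
    second_weight: float,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Tuple[float, bool]:
    """Combine both detection results by weighted average and apply a threshold."""
    total = first_weight + second_weight
    if total <= 0:
        raise ValueError("the weights must sum to a positive value")
    fused = (first_weight * first_result + second_weight * second_result) / total
    return fused, fused >= threshold
```

For example, with weights 0.75 and 0.25, an image score of 0.9 and a text score of 0.2 give a fused score of about 0.725, which exceeds the assumed threshold. Raising the second weight lets text evidence dominate when the image models are known to be less reliable for a given content type.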

Further, referring to fig. 1, according to a second aspect of the present embodiment, there is provided a storage medium. The storage medium comprises a stored program, wherein the method of any of the above is performed by a processor when the program is run.

Therefore, according to the embodiment, whether the video to be detected is an illegal video is determined by performing the following operations on the video to be detected: detecting illegal contents of images in a key frame of a video to be detected; and/or detecting the illegal content of the audio information, the title information, the description information and/or the text information extracted from the key frame of the video to be detected in the video to be detected. Therefore, by detecting the key frame in the video to be detected instead of detecting all images in the video to be detected, a large amount of computing resources are saved for the system, and the overall performance is improved. In addition, according to the technical scheme of the embodiment, the image content in the video to be detected is detected, and the text information extracted from the video to be detected is also detected, so that the accuracy of the detection result is improved.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

Fig. 7 shows an illegal video detection device 700 according to the present embodiment, which corresponds to the method according to embodiment 1. Referring to Fig. 7, the device 700 includes: a to-be-detected video acquisition module 710, configured to acquire a video to be detected; and a video violation detection module 720, configured to perform the following detection operations on the video to be detected and determine whether the video to be detected is an illegal video: detecting illegal content in the images of the key frames of the video to be detected; and/or detecting illegal content in the audio information, the title information and the description information of the video to be detected and/or in the text information extracted from the key frames of the video to be detected.
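The two-module division of the device 700 can be sketched as follows. This is an illustrative assumption about how the modules might be arranged in software, not the structure of the device itself: the class names, the dictionary used as a video source, and the detector callables are all hypothetical.

```python
from typing import Callable, Dict, Iterable


class VideoAcquisitionModule:
    """Stands in for module 710: acquires the video to be detected."""

    def __init__(self, source: Dict[str, dict]):
        # Hypothetical in-memory source mapping a video id to its data.
        self._source = source

    def acquire(self, video_id: str) -> dict:
        return self._source[video_id]


class ViolationDetectionModule:
    """Stands in for module 720: runs the detection operations."""

    def __init__(self, detectors: Iterable[Callable[[dict], bool]]):
        self._detectors = list(detectors)

    def is_violation(self, video: dict) -> bool:
        # The video is flagged if any configured detection operation flags it.
        return any(detector(video) for detector in self._detectors)


class ViolationVideoDetectionDevice:
    """Stands in for device 700: wires the two modules together."""

    def __init__(self, acquisition: VideoAcquisitionModule,
                 detection: ViolationDetectionModule):
        self.acquisition = acquisition
        self.detection = detection

    def check(self, video_id: str) -> bool:
        return self.detection.is_violation(self.acquisition.acquire(video_id))
```

Keeping acquisition and detection behind separate interfaces means either module can be replaced or combined with other units without changing the other, in line with the statement that the division of units is only a division of logical functions.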

Example 3

Fig. 8 shows an illegal video detection device 800 according to the present embodiment, which corresponds to the method according to embodiment 1. Referring to Fig. 8, the device 800 includes: a processor; and a memory coupled to the processor and configured to provide the processor with instructions for performing the following processing steps: acquiring a video to be detected; and executing the following detection operations on the video to be detected to determine whether the video to be detected is an illegal video: detecting illegal content in the images of the key frames of the video to be detected; and/or detecting illegal content in the audio information, the title information and the description information of the video to be detected and/or in the text information extracted from the key frames of the video to be detected.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A violation video detection method, comprising:

acquiring a video to be detected; and

executing the following detection operations on the video to be detected, and determining whether the video to be detected is an illegal video:

detecting violation contents of the images in the key frames of the video to be detected; and/or

detecting the illegal contents of the audio information, the title information and the description information in the video to be detected and/or of the text information extracted from the key frame of the video to be detected.

2. The method according to claim 1, wherein the operation of detecting the illegal contents of the images in the key frames of the video to be detected comprises:

detecting whether a key frame of the video comprises an image of an illegal person or not by using a preset face recognition model;

detecting, by using a preset violation content classification model, whether an image in a key frame of the video is first-type violation content, and the corresponding violation category; and

detecting, by using a preset violation content detection model, whether the image in the key frame of the video contains second-type violation content, the position of the second-type violation content, and the corresponding violation category.

3. The method according to claim 2, wherein the operation of detecting whether an image in a key frame of the video contains a second type of violation content, a location of the second type of violation content, and a corresponding violation category using a preset violation content detection model includes:

detecting, using a violation content sub-detection model of the violation content detection model, a region of an image in a key frame of the video that is likely to contain the second type of violation content; and

determining, using a violation content sub-classification model of the violation content detection model, whether the image within the region contains the second type of violation content.

4. The method according to claim 3, wherein the operation of executing the following detection operations on the video to be detected and determining whether the video to be detected is an illegal video further comprises determining a probability that the video to be detected meets a regulation according to the results of the following detections:

detecting whether an image of an illegal person is included in a key frame of the video;

detecting whether an image in a key frame of the video is first type violation content and a corresponding violation category; and

detecting whether an image in a keyframe of the video contains second-type violation content, a location of the second-type violation content, and a corresponding violation category.

5. The method according to claim 1, wherein the operation of detecting the illegal content of the audio information in the video to be detected comprises:

generating corresponding first text information from the audio in the video to be detected by using a preset voice recognition model; and

detecting the illegal content of the first text information by utilizing a preset text classification model and a preset keyword matching model.

6. The method according to claim 1, wherein the operation of detecting illegal contents of the title information and the description information in the video to be detected comprises:

acquiring second text information corresponding to the title information and the description information; and

detecting the illegal content of the second text information by utilizing a preset text classification model and a preset keyword matching model.

7. The method according to claim 1, wherein the operation of detecting the illegal content of the text information extracted from the key frame of the video to be detected comprises:

extracting third text information from the key frame of the video by using a preset OCR recognition model; and

detecting the illegal content of the third text information by utilizing a preset text classification model and a preset keyword matching model.

8. The method according to claim 1, wherein the operation of determining whether the video to be detected is an illegal video comprises:

determining a first weight of a first detection result obtained by detecting violation content of the images in the key frames of the video to be detected;

determining a second weight of a second detection result obtained by detecting the illegal content of the audio information, the title information and the description information in the video to be detected and/or of the text information extracted from the key frame of the video to be detected; and

determining whether the video to be detected is an illegal video according to the first detection result, the second detection result, the first weight and the second weight.

9. A storage medium comprising a stored program, wherein the method of any one of claims 1 to 8 is performed by a processor when the program is run.

10. An illegal video detection device, comprising:

a to-be-detected video acquisition module, configured to acquire a video to be detected; and

a video violation detection module, configured to execute the following detection operations on the video to be detected and determine whether the video to be detected is a violation video:

11. An illegal video detection device, comprising:

a processor; and

a memory coupled to the processor and configured to provide the processor with instructions for performing the following processing steps:

acquiring a video to be detected; and