CN106973305B

CN106973305B - Method and device for detecting bad content in video

Info

Publication number: CN106973305B
Application number: CN201710166928.0A
Authority: CN
Inventors: 李应斌
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2017-03-20
Filing date: 2017-03-20
Publication date: 2020-02-07
Anticipated expiration: 2037-03-20
Also published as: CN106973305A

Abstract

The embodiment of the invention relates to the technical field of video processing and discloses a method and a device for detecting bad content in a video. The method comprises the following steps: acquiring a video file to be detected; performing video and audio separation on the video file to be detected to obtain audio information and image information; converting the audio information into first text content and converting the image information into second text content; merging and de-duplicating the first text content and the second text content to obtain target text content; comparing the target text content with a sensitive vocabulary list, finding the sensitive vocabulary in the target text content, and obtaining the total word number of the sensitive vocabulary; obtaining a bad content proportion value of the video file to be detected according to the total word number of the sensitive vocabulary and the total word number of the target text content; and processing the video file to be detected according to the bad content proportion value. Implementing the embodiment of the invention improves the identification accuracy of bad content in a video and reduces the misjudgment rate of bad videos.

Description

Method and device for detecting bad content in video

Technical Field

The invention relates to the technical field of video processing, in particular to a method and a device for detecting bad content in a video.

Background

Network video has become part of people's daily life and a means of acquiring knowledge and entertainment. Network video covers a wide range of content of uneven quality, and bad content such as violence, reactionary material or fraud is often mixed in. The spread of videos containing bad content disturbs social order, damages the social atmosphere, and has a strongly negative influence on the healthy growth of people, particularly teenagers. The content of network video therefore often needs to be examined so that videos with objectionable content can be filtered out. However, a video usually carries a large amount of information, and existing filtering methods cannot find bad videos quickly and are prone to misjudging them.

Disclosure of Invention

The embodiment of the invention discloses a method and a device for detecting bad contents in a video, which are used for improving the identification accuracy of the bad contents in the video and reducing the misjudgment rate of the bad video.

The invention discloses a method for detecting bad content in a video in a first aspect, which comprises the following steps:

acquiring a video file to be detected;

carrying out video and audio separation on the video file to be detected to obtain audio information and image information;

converting the audio information into first text content and converting the image information into second text content;

merging and de-duplicating the first text content and the second text content to obtain target text content;

comparing the target text content with the sensitive vocabulary list, finding out the sensitive vocabulary in the target text content and obtaining the total word number of the sensitive vocabulary;

obtaining a bad content proportion value of the video file to be detected according to the total word number of the sensitive words and the total word number of the target text content;

and processing the video file to be detected according to the bad content proportion value.

As an optional implementation manner, in the first aspect of the present invention, processing the video file to be detected according to the bad content proportion value includes:

when the bad content proportion value is smaller than or equal to a preset threshold value, determining that the video file to be detected is a video file with healthy content; and when the bad content proportion value is larger than the preset threshold value, starting a deleting program to delete the video file to be detected.

As an optional implementation manner, in the first aspect of the present invention, after the obtaining the video file to be detected, and before the performing video and audio separation on the video file to be detected to obtain the audio information and the image information, the method further includes:

acquiring the file name of the video file to be detected;

comparing the file name with the sensitive vocabulary list;

when the file name contains the sensitive words in the sensitive word list and the number of the contained sensitive words reaches a preset number, starting a deleting program to delete the video file to be detected;

and when the number of the sensitive words contained in the file name does not reach the preset number, executing the step of performing video and audio separation on the video file to be detected to obtain audio information and image information.

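As another optional implementation manner, in the first aspect of the present invention, after the obtaining the video file to be detected, and before the performing video and audio separation on the video file to be detected to obtain the audio information and the image information, the method further includes:

acquiring source information of the video file to be detected;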

judging whether the source address indicated by the source information is matched with one illegal source address in a preset illegal source address list or not;

if matched, starting a deleting program to delete the video file to be detected;

and if not, executing the step of carrying out video and audio separation on the video file to be detected to obtain audio information and image information.

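As another optional implementation manner, in the first aspect of the present invention, processing the video file to be detected according to the bad content proportion value includes:

when the bad content proportion value is larger than the preset threshold value, starting a deleting program to delete the video file to be detected;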

when the bad content proportion value is smaller than or equal to a preset threshold value, extracting a plurality of continuous key frames from the video file to be detected, wherein the plurality of continuous key frames present a certain key scene in the video file to be detected;

acquiring the average motion intensity of the shots in the certain key scene;

judging whether the average motion intensity is greater than a preset intensity value;

if the motion intensity is larger than the preset intensity value, extracting image characteristic data and audio characteristic data from the continuous key frames;

when the image characteristic data is in a preset range of objectionable image characteristic data and the audio characteristic data is in a preset range of objectionable audio characteristic data, starting a deleting program to delete the video file to be detected;

and when the image characteristic data is not in a preset range of objectionable image characteristic data and the audio characteristic data is not in a preset range of objectionable audio characteristic data, determining that the video file to be detected is a video file with healthy content.

The second aspect of the present invention discloses an apparatus for detecting objectionable content in a video, which may include:

the acquisition unit is used for acquiring a video file to be detected;

the separation unit is used for carrying out video and audio separation on the video file to be detected to obtain audio information and image information;

a text conversion unit for converting the audio information into a first text content and converting the image information into a second text content;

a merging and deduplication unit, configured to merge and deduplicate the first text content and the second text content to obtain a target text content;

the searching unit is used for comparing the target text content with the sensitive vocabulary list, searching the sensitive vocabulary in the target text content and obtaining the total word number of the sensitive vocabulary;

the calculating unit is used for obtaining a bad content proportion value of the video file to be detected according to the total word number of the sensitive vocabulary and the total word number of the target text content;

and the processing unit is used for processing the video file to be detected according to the bad content proportion value.

As an optional implementation manner, in the second aspect of the present invention, the processing unit processes the video file to be detected according to the bad content proportion value specifically as follows:

the processing unit is used for determining that the video file to be detected is a video file with healthy content when the calculating unit determines that the bad content proportion value is smaller than or equal to a preset threshold value; and when the calculating unit determines that the bad content proportion value is greater than the preset threshold value, starting a deleting program to delete the video file to be detected.

As an alternative embodiment, in the second aspect of the present invention, the apparatus further comprises:

the name detection unit is used for acquiring the file name of the video file to be detected and comparing the file name with the sensitive vocabulary list after the acquisition unit acquires the video file to be detected and before the separation unit performs video and audio separation on the video file to be detected and acquires audio information and image information;

the processing unit is further used for starting a deleting program to delete the video file to be detected when the name detection unit determines that the file name contains the sensitive words in the sensitive word list and the number of the contained sensitive words reaches a preset number;

the separation unit is further used for executing video and audio separation on the video file to be detected to obtain audio information and image information when the name detection unit determines that the number of the sensitive words contained in the file name does not reach the preset number.

As another alternative embodiment, in the second aspect of the present invention, the apparatus further comprises:

the source detection unit is used for acquiring the source information of the video file to be detected and judging whether a source address indicated by the source information matches an illegal source address in a preset illegal source address list, after the acquisition unit acquires the video file to be detected and before the separation unit performs video and audio separation on the video file to be detected to obtain audio information and image information;

the processing unit is further used for starting a deleting program to delete the video file to be detected when the judgment result of the source detection unit is matched;

and the separation unit is also used for executing the video and audio separation of the video file to be detected to obtain audio information and image information when the judgment result of the source detection unit is not matched.

As another optional implementation manner, in the second aspect of the present invention, the processing unit processes the video file to be detected according to the bad content proportion value specifically as follows:

the processing unit is used for starting a deleting program to delete the video file to be detected when the calculating unit determines that the bad content proportion value is greater than the preset threshold value;

and when the bad content proportion value is smaller than or equal to the preset threshold value, extracting a plurality of continuous key frames from the video file to be detected, wherein the plurality of continuous key frames present a certain key scene in the video file to be detected; acquiring the average motion intensity of the shots in the key scene; judging whether the average motion intensity is greater than a preset intensity value; if the motion intensity is greater than the preset intensity value, extracting image characteristic data and audio characteristic data from the continuous key frames; when the image characteristic data is in a preset range of objectionable image characteristic data and the audio characteristic data is in a preset range of objectionable audio characteristic data, starting a deleting program to delete the video file to be detected; and when the image characteristic data is not in the preset range of objectionable image characteristic data and the audio characteristic data is not in the preset range of objectionable audio characteristic data, determining that the video file to be detected is a video file with healthy content.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

In the embodiment of the invention, a video file to be detected is obtained, and the audio and the video images in it are separated to obtain audio information and image information. The audio information is then converted into first text content and the image information into second text content, and the first text content and the second text content are merged and de-duplicated to obtain target text content. The target text content is then compared one by one with the sensitive vocabulary in the sensitive vocabulary list to find the sensitive vocabulary in the target text content and obtain the total word number of all sensitive vocabulary found there. The bad content proportion value of the video file to be detected is obtained from the total word number of the sensitive vocabulary and the total word number of the target text content, and the video file to be detected is processed according to that value. By merging and de-duplicating the first text content and the second text content, the embodiment of the invention ensures that the content of the target text content is unique, which improves the speed and accuracy of comparing the text content with the sensitive vocabulary list, improves the identification accuracy of bad content in the video, and reduces the misjudgment rate of bad videos.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic flowchart of a method for detecting objectionable content in a video according to an embodiment of the present invention;

Fig. 2 is another schematic flowchart of a method for detecting objectionable content in a video according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an apparatus for detecting objectionable content in a video according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an apparatus for detecting objectionable content in a video according to an embodiment of the present invention;

Fig. 5 is another schematic structural diagram of an apparatus for detecting objectionable content in a video according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "comprises" and "comprising," and any variations thereof, of embodiments of the present invention are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The embodiment of the invention discloses a method for detecting bad content in a video, which is used for improving the identification accuracy of the bad content in the video and reducing the misjudgment rate of the bad video. The embodiment of the invention also discloses a device for detecting the bad content in the video.

The technical solution of the present invention will be described in detail with reference to the following embodiments.

Example one

Referring to Fig. 1, Fig. 1 is a schematic flowchart of a method for detecting objectionable content in a video according to an embodiment of the present invention. As shown in Fig. 1, the method for detecting objectionable content in a video may include:

101. Acquiring the video file to be detected.

It will be appreciated that the video content of the video file to be detected is composed of audio and video images. In some embodiments, the video file to be detected may be a video file that a user wants to play, and content detection is performed on it before playing. Specifically: a playing instruction input by the user for the video file to be detected is received, a video detection interface is called based on the playing instruction to enable the device for detecting bad content in a video, and the step of acquiring the video file to be detected is executed, wherein the video detection interface is implicitly associated with the device for detecting bad content in a video.

Further, after receiving a playing instruction input by a user for the video file to be detected and before calling a video detection interface to enable a detection device of bad content in a video based on the playing instruction, detecting whether the user allows the video detection interface to be called or not, and if the user does not allow the video detection interface to be called, prompting the user to enable a function of calling the video detection interface; after prompting the user to enable the function of calling the video detection interface, judging whether an enabling operation of the user for calling the function of the video detection interface is received, if so, executing a step of calling the video detection interface based on the playing instruction to enable a detection device of bad content in the video; if not, the video file to be detected is refused to be played.

In other embodiments, the video file to be detected is a video file that a user specifies for detection, and step 101 specifically includes: receiving a file name, input by the user, corresponding to the video file to be detected, and searching a video library for the video file corresponding to the file name to obtain the video file to be detected.

Further, after receiving a file name corresponding to the video file to be detected input by a user and before searching the video file to be detected corresponding to the file name in a video library to obtain the video file to be detected, detecting whether the user allows to call a video detection interface, wherein the video detection interface is implicitly associated with a detection device for bad content in a video, and if the user does not allow to call the video detection interface, prompting the user to enable the function of calling the video detection interface; after prompting the user to enable the function of calling the video detection interface, judging whether an enabling operation of the user for calling the function of the video detection interface is received, if so, executing a step of searching the video library for the video file to be detected corresponding to the file name to obtain the video file to be detected; if not, the flow ends.

102. Carrying out video and audio separation on the video file to be detected to obtain audio information and image information.

The audio information and the image information in the video file to be detected can be separated with video editing software. For example, the video file to be detected is imported into a video track (time axis) and the audio is split off, that is, the audio and the video images are separated; the audio is then saved as a file in an audio format to obtain the audio information, and the video images are saved as image files to obtain the image information.
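The separation in step 102 can also be scripted instead of being done by hand in an editor. The following is a minimal sketch, assuming the ffmpeg command-line tool is installed. The source names only "video editing software", so ffmpeg, the function names, the output file names and the one-frame-per-second sampling rate are all illustrative assumptions.

```python
import subprocess
from pathlib import Path


def build_separation_commands(video_path, out_dir, frames_per_second=1):
    """Build ffmpeg argument lists that split a video into audio and images.

    Assumption: ffmpeg stands in for the unspecified video editing software.
    """
    out = Path(out_dir)
    audio_cmd = [
        "ffmpeg", "-i", str(video_path),
        "-vn",                               # drop the video stream, keep audio
        str(out / "audio.wav"),
    ]
    frames_cmd = [
        "ffmpeg", "-i", str(video_path),
        "-an",                               # drop the audio stream, keep images
        "-vf", f"fps={frames_per_second}",   # sample frames at a fixed rate
        str(out / "frame_%04d.png"),
    ]
    return audio_cmd, frames_cmd


def separate(video_path, out_dir):
    """Run both commands; raises CalledProcessError if ffmpeg fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_separation_commands(video_path, out_dir):
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them keeps the command construction easy to inspect without ffmpeg being present.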

103. The audio information is converted into a first text content and the image information is converted into a second text content.

As an optional implementation, converting the audio information into the first text content specifically includes:

Converting the speech contained in the audio information into text according to the time axis sequence of the audio information. Specifically, speech is sequentially extracted from the audio information according to its time axis sequence and converted into text by a speech-to-text (STT) function or algorithm; the text is then split into sentences and laid out according to the pauses of the speech in the audio information to obtain the first text content.

As an optional implementation, converting the image information into the second text content specifically includes:

Recognizing the image information with an image recognition tool to convert it into text content, thereby obtaining the second text content.

104. Merging and de-duplicating the first text content and the second text content to obtain the target text content.

The first text content and the second text content are merged and then de-duplicated, that is, repeated content is removed, so that the target text content contains no repeated content.
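As an illustration of step 104, the sketch below merges two lists of text segments and removes repeats while keeping first-seen order. The source does not state the granularity of de-duplication, so treating each segment (for example a sentence) as the unit is an assumption, as is the function name.

```python
def merge_and_deduplicate(first_text, second_text):
    """Merge two lists of text segments and drop repeats, keeping order.

    Assumption: de-duplication works on whole segments (for example
    sentences); the source does not specify the granularity.
    """
    seen = set()
    target = []
    for segment in list(first_text) + list(second_text):
        key = segment.strip()
        if key and key not in seen:   # skip blanks and segments already kept
            seen.add(key)
            target.append(key)
    return target


speech = ["hello world", "buy now"]                 # first text content (audio)
on_screen = ["buy now ", "limited offer", ""]       # second text content (image)
print(merge_and_deduplicate(speech, on_screen))
# ['hello world', 'buy now', 'limited offer']
```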

105. Comparing the target text content with the sensitive vocabulary list, finding the sensitive vocabulary in the target text content, and obtaining the total word number of the sensitive vocabulary.

The sensitive vocabulary list is established in advance. Specifically, a basic database of sensitive vocabulary covering bad content such as violence, reactionary material and fraud can be established: sensitive words related to such content are automatically captured from the network and stored in the basic database, then words similar or close in meaning to those sensitive words are captured and stored in the basic database, or unhealthy words fed back by users are obtained and stored in the basic database as sensitive words. Finally, the sensitive words in the basic database can be reviewed manually, and the sensitive vocabulary list is built from the sensitive words finally confirmed.

106. Obtaining a bad content proportion value of the video file to be detected according to the total word number of the sensitive words and the total word number of the target text content.

The calculation formula of the bad content proportion value K is as follows:

K = N / M;

wherein, N is the total word number of all sensitive words in the sensitive word list included in the target text content, and M is the total word number of the target text content.

For example, if comparison with the sensitive vocabulary list shows that the target text content includes 5 sensitive words, of which 2 each have a word number of 2 and the other 3 each have a word number of 3, then the total word number of all sensitive words in the target text content is 2 × 2 + 3 × 3 = 13.
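The calculation of N and K can be sketched as follows. This is a hedged illustration only: it assumes that "word number" is measured as character count, that every non-overlapping occurrence of a sensitive word is counted, and that no sensitive word is a substring of another. The function names and the placeholder vocabulary are hypothetical.

```python
def sensitive_word_total(text, sensitive_words):
    """Return N: the summed length of every sensitive-word occurrence in text.

    Assumptions: "word number" is taken as character count, and each
    non-overlapping occurrence is counted.
    """
    return sum(text.count(word) * len(word) for word in sensitive_words if word)


def bad_content_ratio(text, sensitive_words):
    """Return K = N / M, where M is the total length of the target text."""
    total = len(text)
    if total == 0:
        return 0.0                      # empty text: nothing to flag
    return sensitive_word_total(text, sensitive_words) / total


words = ["ab", "cd", "efg", "hij", "klm"]     # placeholder vocabulary
text = "ab--cd--efg--hij--klm"                # M = 21 characters
print(sensitive_word_total(text, words))      # 2*2 + 3*3 = 13
```

With this placeholder text the ratio is K = 13/21, since the text is 21 characters long.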

107. Processing the video file to be detected according to the bad content proportion value.

As an optional implementation manner, processing the video file to be detected according to the objectionable content ratio specifically includes:

When the bad content proportion value is smaller than or equal to a preset threshold value, the video file to be detected is determined to be a video file with healthy content; when the bad content proportion value is larger than the preset threshold value, a deleting program is started to delete the video file to be detected. In this embodiment, a bad content proportion value greater than the preset threshold value indicates that the objectionable content in the video file to be detected exceeds the preset acceptable range, so the video file is prohibited from being played and a deleting program is started to delete it, which prevents video files with objectionable content from spreading over the network.
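The threshold rule of step 107 reduces to a single comparison. In the sketch below the function name, the return labels and the example threshold of 0.10 are assumptions, and `os.remove` merely stands in for the unspecified "deleting program".

```python
import os


def process_video(path, ratio, threshold, delete=os.remove):
    """Delete the file when ratio > threshold; otherwise report it healthy."""
    if ratio > threshold:
        delete(path)            # stand-in for the "deleting program"
        return "deleted"
    return "healthy"            # ratio <= threshold


# Dry run with a recording callback instead of a real deletion.
removed = []
print(process_video("a.mp4", 0.30, 0.10, delete=removed.append))  # deleted
print(process_video("b.mp4", 0.10, 0.10, delete=removed.append))  # healthy
```

Passing the delete action as a parameter lets the rule be exercised without touching the file system.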

Example two

Referring to Fig. 2, Fig. 2 is another schematic flowchart of a method for detecting objectionable content in a video according to an embodiment of the present invention. As shown in Fig. 2, the method for detecting objectionable content in a video may include:

201. Acquiring the video file to be detected.

Reference may be made to the detailed description in step 101, which is not described herein again.

202. Carrying out video and audio separation on the video file to be detected to obtain audio information and image information.

As an optional implementation manner, after the video file to be detected is acquired in step 201, and before the video and audio separation is performed on the video file to be detected in step 202 to obtain the audio information and the image information, the embodiment of the present invention further includes:

acquiring a file name of a video file to be detected;

comparing the file name with a sensitive vocabulary list;

and when the number of the sensitive words contained in the file name does not reach the preset number, performing video and audio separation on the video file to be detected to obtain audio information and image information.

In the above embodiment, after the video file to be detected is obtained, the file name of the video file to be detected is further obtained, and if it is determined that the file name includes a certain number of sensitive words and the sensitive words are sensitive words in the sensitive word list, the video file to be detected is directly deleted.

It can also be understood that the file name is compared with the sensitive vocabulary list to obtain the sensitive vocabulary of the file name, then the total word number of all the sensitive vocabularies in the file name is obtained, the total word number is compared with the total word number of the file name to obtain a proportional value, and if the proportional value exceeds the specified value, the video file to be detected is deleted. If the value is less than or equal to the specified value, the step of performing video and audio separation on the video file to be detected to obtain audio information and image information can be further executed.
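The count-based file name check described above can be sketched as below. The case-insensitive substring matching, the stripping of the file extension and the function names are assumptions made for illustration; "reaches a preset number" is read as greater than or equal to.

```python
from pathlib import Path


def sensitive_words_in_name(file_name, sensitive_words):
    """Return the sensitive words that occur in the file name (extension ignored)."""
    stem = Path(file_name).stem.lower()
    return [w for w in sensitive_words if w and w.lower() in stem]


def name_check(file_name, sensitive_words, preset_number):
    """Return "delete" when enough sensitive words appear, else "continue"."""
    hits = sensitive_words_in_name(file_name, sensitive_words)
    return "delete" if len(hits) >= preset_number else "continue"


vocabulary = ["scam", "fraud"]                 # placeholder sensitive words
print(name_check("Easy_SCAM_guide.mp4", vocabulary, 1))   # delete
print(name_check("holiday.mp4", vocabulary, 1))           # continue
```

A result of "continue" corresponds to proceeding to the video and audio separation step.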

As another optional implementation manner, after the video file to be detected is acquired in step 201, and before the video and audio separation is performed on the video file to be detected in step 202 to obtain the audio information and the image information, the embodiment of the present invention further includes:

acquiring source information of a video file to be detected;

judging whether a source address indicated by the source information is matched with one illegal source address in a preset illegal source address list or not;

The source information includes a source Internet Protocol (IP) address, a source gateway, and the like. In this embodiment, the source information of the video file to be detected is used to make a preliminary judgment on whether its source is legal. If the source is legal, the step of performing video and audio separation on the video file to be detected to obtain audio information and image information is executed. If the source is illegal, the video file to be detected is a video prohibited from being played and is deleted directly to prevent illegal videos from spreading over the network.
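A source address check of this kind might look like the following sketch, which uses Python's standard `ipaddress` module. Allowing list entries to be either single addresses or CIDR networks is an assumption, since the source only speaks of a list of illegal source addresses; the addresses shown are reserved documentation addresses.

```python
import ipaddress


def is_illegal_source(source_ip, illegal_sources):
    """Return True when source_ip matches an entry in the illegal source list.

    Entries may be single addresses or CIDR networks (an assumption; the
    source only mentions a list of illegal source addresses).
    """
    address = ipaddress.ip_address(source_ip)
    for entry in illegal_sources:
        if address in ipaddress.ip_network(entry, strict=False):
            return True
    return False


blocked = ["203.0.113.7", "198.51.100.0/24"]      # hypothetical list
print(is_illegal_source("198.51.100.42", blocked))  # True
print(is_illegal_source("192.0.2.1", blocked))      # False
```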

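As yet another optional implementation manner, the source check and the file name check may be combined:

acquiring source information of a video file to be detected;

judging whether a source address indicated by the source information is matched with one illegal source address in a preset illegal source address list;

if not, acquiring the file name of the video file to be detected;

comparing the file name with a sensitive vocabulary list;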

In this embodiment, the source and the file name of the video file to be detected are combined to perform preliminary filtering on the video file to be detected, and this multi-layer identification can reduce the misjudgment rate of bad videos.

203. The audio information is converted into a first text content and the image information is converted into a second text content.

204. Merging and de-duplicating the first text content and the second text content to obtain the target text content.

205. Comparing the target text content with the sensitive vocabulary list, finding the sensitive vocabulary in the target text content, and obtaining the total word number of the sensitive vocabulary.

206. Obtaining a bad content proportion value of the video file to be detected according to the total word number of the sensitive words and the total word number of the target text content.

207. When the bad content proportion value is larger than a preset threshold value, starting a deleting program to delete the video file to be detected.

208. When the bad content proportion value is smaller than or equal to the preset threshold value, extracting a plurality of continuous key frames from the video file to be detected, wherein the plurality of continuous key frames present a certain key scene in the video file to be detected.

In the embodiment of the invention, when the bad content proportion value is less than or equal to the preset threshold value, a key scene of the video is additionally examined to decide how to process the video file to be detected, which improves the identification accuracy of objectionable content in the video.

209. Acquiring the average motion intensity of the shots in the key scene.

The average motion intensity of the shots is equal to the ratio of the sum of the motion intensities of all the shots in the scene to the number of the shots in the scene, and the specific calculation method is the prior art and is not described herein again.
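The averaging itself is a plain mean over the shots of the scene, as the short sketch below shows. How each shot's motion intensity is measured is left to the prior art in the source, so the input values here are placeholders and the function name is hypothetical.

```python
def average_motion_intensity(shot_intensities):
    """Mean motion intensity over the shots of one key scene."""
    if not shot_intensities:
        raise ValueError("a scene needs at least one shot")
    return sum(shot_intensities) / len(shot_intensities)


print(average_motion_intensity([2.0, 4.0, 9.0]))   # 5.0
```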

210. Judging whether the average motion intensity is greater than a preset intensity value.

If the motion intensity is less than or equal to the preset intensity value, the process ends.

211. If the motion intensity is greater than a preset intensity value, image feature data and audio feature data are extracted from a plurality of consecutive key frames.

Wherein the image feature data comprises image feature data for each key frame and the audio feature data comprises audio feature data for the scene.

Specifically, the image feature data of each key frame comprises a color histogram of each key frame, and extracting the image feature data from a plurality of consecutive key frames comprises: extracting a color histogram of each frame of image from the plurality of consecutive key frames.
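A color histogram for one key frame can be sketched in plain Python as below. The quantization into 4 levels per channel, the representation of a frame as a list of 8-bit RGB tuples and the function name are assumptions; the source does not specify how the histogram is binned.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Count pixels per quantized RGB bin.

    pixels: iterable of (r, g, b) tuples with 8-bit channels (0..255).
    levels: bins per channel, giving levels**3 possible bins in total.
    """
    return Counter(
        (r * levels // 256, g * levels // 256, b * levels // 256)
        for r, g, b in pixels
    )


frame = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]   # tiny placeholder frame
hist = color_histogram(frame)
print(hist[(3, 0, 0)])   # 2 pixels fall in the strongest red bin
```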

Specifically, the audio feature data includes a sample vector and a covariance matrix of the audio data. Further, the audio feature data may also include an energy entropy of the audio data.

212. When the image characteristic data is in the preset range of the objectionable image characteristic data and the audio characteristic data is in the preset range of the objectionable audio characteristic data, starting a deleting program to delete the video file to be detected.

When the image feature data of each key frame comprises the color histogram of each key frame, determining that the image feature data is within the preset range of objectionable image feature data comprises:

when the counts of a preset number of colors in the color histogram of the key frame are determined to fall within the count ranges of the corresponding colors in the color histogram of video frames extracted in advance from a specific scene, determining that the image feature data is within the preset range of objectionable image feature data.
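The range test on the color histogram might look like the sketch below. It assumes that the reference data is stored as an inclusive (low, high) count range per monitored histogram bin, derived in advance from frames of the specific scene; this storage format and the function name are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def histogram_in_reference_range(
    hist: Sequence[int],
    reference_ranges: Dict[int, Tuple[int, int]],
) -> bool:
    """True if every monitored bin count lies within its reference range.

    reference_ranges maps a bin index to an inclusive (low, high) range.
    """
    if not reference_ranges:
        raise ValueError("at least one color must be monitored")
    return all(low <= hist[i] <= high for i, (low, high) in reference_ranges.items())
```

Only the preset colors are checked; bins that do not appear in `reference_ranges` are ignored.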

Determining that the audio feature data is within the preset range of objectionable audio feature data comprises: calculating a sample vector and a covariance matrix of the audio data in the scene, and, when the similarity between the sample vector and covariance matrix of the audio data in the scene and the sample vector and covariance matrix of audio data extracted in advance from the specific scene is greater than a third preset threshold value, determining that the audio feature data is within the preset range of objectionable audio feature data.
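The description does not name the similarity measure. The sketch below is therefore only one possible reading: it interprets the sample vector as the mean of per-frame audio feature vectors, flattens the mean vector and covariance matrix into a single vector, and uses cosine similarity. All three choices, and every function name, are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_and_covariance(samples: Sequence[Vector]) -> Tuple[List[float], List[List[float]]]:
    """Mean vector and population covariance matrix of the feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [
        [sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def audio_matches_reference(samples, ref_mean, ref_cov, third_threshold: float) -> bool:
    """Compare the scene statistics with the reference statistics."""
    mean, cov = mean_and_covariance(samples)
    flat = list(mean) + [v for row in cov for v in row]
    ref_flat = list(ref_mean) + [v for row in ref_cov for v in row]
    return cosine_similarity(flat, ref_flat) > third_threshold
```

Other measures, such as a distance between Gaussian models, would fit the same text equally well; the cosine form is used here only because it is short.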

When the audio feature data further includes energy entropy of the audio data, determining that the audio feature data is within a preset objectionable audio feature data range includes: dividing the audio data in the scene into multiple sections, calculating the energy entropy of each section of audio data, and determining that the audio characteristic data is in a preset range of bad audio characteristic data when the energy entropy of at least one section of audio data in the energy entropies of the multiple sections of audio data is smaller than a fourth preset threshold value.
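The energy entropy test can be sketched as follows. The description gives no formula, so this sketch assumes a common definition: each segment is split into sub-frames, the sub-frame energies are normalized to sum to one, and their Shannon entropy is taken. The sub-frame count, the handling of silent segments, and the function names are assumptions.

```python
import math
from typing import Sequence


def energy_entropy(segment: Sequence[float], sub_frames: int = 4) -> float:
    """Shannon entropy (base 2) of the normalized sub-frame energies."""
    size = len(segment) // sub_frames
    if size == 0:
        raise ValueError("segment is too short for the requested sub-frames")
    energies = [
        sum(x * x for x in segment[k * size:(k + 1) * size])
        for k in range(sub_frames)
    ]
    total = sum(energies)
    if total == 0:
        # Silent segment: no energy burst, treated as maximally uniform here.
        return math.log2(sub_frames)
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0)


def has_low_entropy_segment(audio: Sequence[float], segment_len: int,
                            fourth_threshold: float) -> bool:
    """True if at least one full segment has entropy below the threshold."""
    starts = range(0, len(audio) - segment_len + 1, segment_len)
    return any(energy_entropy(audio[i:i + segment_len]) < fourth_threshold
               for i in starts)
```

Under this definition a segment whose energy is concentrated in a short burst has low entropy, while a segment with evenly spread energy has entropy close to log2 of the sub-frame count.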

213. When the image feature data is not within the preset range of objectionable image feature data and the audio feature data is not within the preset range of objectionable audio feature data, determining that the video file to be detected is a file with healthy content.
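The branching in steps 210 to 213 can be summarized by the sketch below. The steps as written do not state what happens when only one of the image and audio checks falls within its objectionable range, so that case is reported separately rather than guessed; the enumeration and function names are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    ENDED = "ended"              # step 210: motion intensity not above the preset value
    DELETE = "delete"            # step 212: both checks within the objectionable ranges
    HEALTHY = "healthy"          # step 213: neither check within the objectionable ranges
    UNSPECIFIED = "unspecified"  # mixed result, not covered by steps 210-213


def second_stage_verdict(avg_motion: float, preset_intensity: float,
                         image_in_range: bool, audio_in_range: bool) -> Verdict:
    """Apply the decisions of steps 210-213 to precomputed check results."""
    if avg_motion <= preset_intensity:
        return Verdict.ENDED
    if image_in_range and audio_in_range:
        return Verdict.DELETE
    if not image_in_range and not audio_in_range:
        return Verdict.HEALTHY
    return Verdict.UNSPECIFIED
```

An implementation would need to choose a policy for the mixed case, for example treating it as healthy or escalating it for review.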

It can be seen that, in the embodiment of the present invention, when the proportion value of the objectionable content is less than or equal to the preset threshold, the audio information and the image information in the video file to be detected are further analyzed to assess the proportion of objectionable content they contain, thereby improving the accuracy of identifying objectionable videos.

Example three

Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus for detecting objectionable content in a video according to an embodiment of the present invention; as shown in fig. 3, an apparatus for detecting objectionable content in a video may comprise:

an obtaining unit 310, configured to obtain a video file to be detected;

the separation unit 320 is configured to perform video and audio separation on the video file to be detected, so as to obtain audio information and image information;

a text conversion unit 330 for converting the audio information into first text content and converting the image information into second text content;

a merging and deduplication unit 340, configured to merge and deduplicate the first text content and the second text content to obtain a target text content;

the searching unit 350 is configured to compare the target text content with the sensitive vocabulary list, search for a sensitive vocabulary in the target text content, and obtain a total word number of the sensitive vocabulary;

the calculating unit 360 is used for obtaining a bad content proportion value of the video file to be detected according to the total word number of the sensitive vocabulary and the total word number of the target text content;

and the processing unit 370 is configured to process the video file to be detected according to the objectionable content ratio value.

In the embodiment of the present invention, the obtaining unit 310 obtains a video file to be detected; the separating unit 320 separates the audio from the video images in the video file to be detected to obtain audio information and image information; the text converting unit 330 converts the audio information into a first text content and the image information into a second text content; and the merging and deduplication unit 340 merges the first text content and the second text content and then deduplicates the result to obtain a target text content. Then, the searching unit 350 compares the target text content with the sensitive words in the sensitive vocabulary list one by one, finds the sensitive words in the target text content, and obtains the total word count of all sensitive words found; the calculating unit 360 obtains the bad content proportion value of the video file to be detected according to the total word count of the sensitive words and the total word count of the target text content; and the processing unit 370 processes the video file to be detected according to the bad content proportion value. By merging and de-duplicating the first text content and the second text content, the embodiment of the invention ensures that the content of the target text content is unique, which improves the speed and accuracy of the comparison between the text content and the sensitive vocabulary list, improves the identification accuracy of bad content in the video, and reduces the misjudgment rate for bad videos.
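The work of units 340 to 360 can be illustrated with the sketch below. It makes several assumptions that the text does not confirm: text content is held as a list of sentences, deduplication removes repeated sentences, words are separated by whitespace (Chinese text would need a word segmenter instead), and the proportion value is the count of sensitive words divided by the total word count. All names are hypothetical.

```python
from typing import Iterable, List, Set


def merge_and_deduplicate(first_text: Iterable[str], second_text: Iterable[str]) -> List[str]:
    """Concatenate two sentence lists and drop repeats, keeping first-seen order."""
    seen: Set[str] = set()
    merged: List[str] = []
    for sentence in list(first_text) + list(second_text):
        if sentence not in seen:
            seen.add(sentence)
            merged.append(sentence)
    return merged


def bad_content_ratio(target_text: List[str], sensitive_words: Set[str]) -> float:
    """Share of words in the target text that appear in the sensitive list."""
    words = [w for sentence in target_text for w in sentence.lower().split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in sensitive_words)
    return hits / len(words)
```

For example, merging ["buy guns now", "hello world"] with ["hello world", "guns here"] keeps three sentences with seven words in total; with "guns" as the only sensitive word, the proportion value is 2/7.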

As an optional implementation manner, the video file to be detected may be a video file to be played by a user, content detection is performed on the video file before playing, and the manner for acquiring the video file to be detected by the acquiring unit 310 is specifically: the obtaining unit 310 is configured to receive a play instruction input by a user for the video file to be detected, and based on the play instruction, invoke a video detection interface to enable a detection device for detecting undesirable content in a video, and execute obtaining of the video file to be detected, where the video detection interface is implicitly associated with the detection device for detecting undesirable content in the video.

Further, after receiving a play instruction input by a user for the video file to be detected, and before invoking the video detection interface based on the play instruction to enable the detection device for objectionable content in a video, the obtaining unit 310 detects whether the user allows the video detection interface to be invoked. If the user does not allow it, the obtaining unit 310 prompts the user to enable the function of invoking the video detection interface and then judges whether an enabling operation by the user for that function is received. If so, the obtaining unit 310 invokes the video detection interface based on the play instruction to enable the detection device for objectionable content in the video; if not, playback of the video file to be detected is refused.

In other embodiments, the video file to be detected is a video file that needs to be detected and is specified by a user, and the obtaining unit 310 is specifically configured to receive a file name corresponding to the video file to be detected and input by the user, and search the video file to be detected and corresponding to the file name in a video library to obtain the video file to be detected.

Further, after receiving a file name corresponding to the video file to be detected input by the user and before searching the video file to be detected corresponding to the file name in the video library to obtain the video file to be detected, the obtaining unit 310 detects whether the user allows to invoke a video detection interface, where the video detection interface is implicitly associated with a detection device for undesirable content in the video, and if the user does not allow to invoke the video detection interface, prompts the user to enable a function of invoking the video detection interface; after prompting the user to enable the function of calling the video detection interface, judging whether an enabling operation of the user for calling the function of the video detection interface is received, if so, searching the video library for the video file to be detected corresponding to the file name to obtain the video file to be detected; if not, the flow ends.

As an optional implementation manner, the processing unit 370 is configured to process the video file to be detected according to the objectionable content ratio value in a specific manner:

the processing unit 370 is configured to determine that the video file to be detected is a video file with healthy content when the calculating unit 360 determines that the ratio of the objectionable content is smaller than or equal to the preset threshold; when the calculating unit 360 determines that the ratio of the bad content is greater than the preset threshold, a deleting program is started to delete the video file to be detected.

In the above embodiment, a bad content proportion value greater than the preset threshold indicates that the video file to be detected contains a large proportion of objectionable content, exceeding the preset acceptable range (the preset threshold). Playback of the video file to be detected is therefore prohibited, and a deleting program is started to delete the video file to be detected, so as to prevent video files with objectionable content from spreading over the network.
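The threshold decision of this embodiment reduces to a single comparison, sketched below. The return values and the function name are hypothetical; the point of the sketch is the boundary behavior, since a proportion value exactly equal to the threshold is treated as healthy.

```python
def process_by_ratio(ratio: float, preset_threshold: float) -> str:
    """Return 'delete' if the ratio exceeds the threshold, else 'healthy'."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("a proportion value must lie in [0, 1]")
    return "delete" if ratio > preset_threshold else "healthy"
```

With a threshold of 0.2, a proportion value of 0.3 leads to deletion, while 0.2 and 0.1 both lead to the healthy result.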

As an optional implementation manner, the manner for converting the audio information into the first text content by the merging and deduplication unit 340 is specifically: the merging and deduplication unit 340 is configured to convert the speech contained in the audio information into text according to the time axis sequence of the audio information. Specifically, the merging and deduplication unit 340 is configured to sequentially extract voices from the audio information according to a time axis sequence of the audio information, convert the voices into texts through a Speech-to-text (STT) function or algorithm, and perform text sentence-breaking and typesetting according to pauses of the voices in the audio information to obtain the first text content.

As an optional implementation manner, the manner for converting the image information into the second text content by the merging and deduplication unit 340 is specifically: the merging and deduplication unit 340 is configured to recognize the image information through an image recognition tool to convert the image information into text content, so as to obtain the second text content.

Example four

Referring to fig. 4, fig. 4 is another schematic structural diagram of an apparatus for detecting objectionable content in a video according to an embodiment of the present invention; the apparatus for detecting objectionable content in a video shown in fig. 4 is an optimization of the apparatus shown in fig. 3, and as shown in fig. 4, the apparatus for detecting objectionable content in a video further includes:

the name detection unit 410 is configured to, after the acquisition unit 310 acquires the video file to be detected, and before the separation unit 320 performs video-audio separation on the video file to be detected to obtain audio information and image information, acquire a file name of the video file to be detected, and compare the file name with the sensitive vocabulary list;

the processing unit 370 is further configured to, when the name detecting unit 410 determines that the file name includes a sensitive vocabulary in the sensitive vocabulary list and the number of the included sensitive vocabularies reaches a preset number, start a deleting program to delete the video file to be detected;

the separating unit 320 is further configured to, when the name detecting unit 410 determines that the number of the sensitive words included in the file name does not reach the preset number, perform video and audio separation on the video file to be detected to obtain audio information and image information.
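The file name pre-check performed by the name detection unit 410 might be sketched as follows. The sketch assumes case-insensitive substring matching and counts each sensitive word at most once; neither detail is stated in the text, and the function names are hypothetical.

```python
from typing import Iterable


def filename_hits(file_name: str, sensitive_words: Iterable[str]) -> int:
    """Number of distinct sensitive words that occur in the file name."""
    name = file_name.lower()
    return sum(1 for word in set(sensitive_words) if word.lower() in name)


def should_delete_by_name(file_name: str, sensitive_words: Iterable[str],
                          preset_number: int) -> bool:
    """True if the hit count reaches the preset number (delete without further analysis)."""
    return filename_hits(file_name, sensitive_words) >= preset_number
```

When `should_delete_by_name` returns False, processing would continue with the video and audio separation described above.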

The separating unit 320 performs video and audio separation on the video file to be detected to obtain the audio information and the image information specifically as follows: the separating unit 320 imports the video file to be detected into a video track (time axis) of video editing software, then splits off the audio, i.e., separates the audio from the video images, then saves the audio as a file in a corresponding audio format to obtain the audio information, and saves the video images as image files to obtain the image information.

Example five

Referring to fig. 5, fig. 5 is another schematic structural diagram of an apparatus for detecting objectionable content in a video according to an embodiment of the present invention; the apparatus for detecting objectionable content in a video shown in fig. 5 is an optimization of the apparatus shown in fig. 3, and as shown in fig. 5, the apparatus for detecting objectionable content in a video further includes:

a source detecting unit 510, configured to, after the obtaining unit 310 obtains the video file to be detected, and before the separating unit 320 performs video and audio separation on the video file to be detected, and obtains audio information and image information, obtain source information of the video file to be detected, and determine whether a source address indicated by the source information matches with an illegal source address in a preset illegal source address list;

the processing unit 370 is further configured to, when the determination result of the source detecting unit 510 is a match, start a deleting program to delete the video file to be detected;

the separating unit 320 is further configured to, when the determination result of the source detecting unit 510 is not matching, perform video and audio separation on the video file to be detected to obtain audio information and image information.
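The source check performed by the source detecting unit 510 could be sketched as below. It assumes that the source information is a URL and that the illegal source address list holds host names; the text does not specify either format, so both are assumptions, as is the function name.

```python
from typing import Iterable
from urllib.parse import urlparse


def source_is_illegal(source_url: str, illegal_hosts: Iterable[str]) -> bool:
    """True if the host of source_url appears in the illegal host list.

    A source without a recognizable host (for example, a bare file path)
    is treated as not matching.
    """
    host = (urlparse(source_url).hostname or "").lower()
    return bool(host) and host in {h.lower() for h in illegal_hosts}
```

A match would lead to deletion of the video file to be detected, while a non-match would let the separating unit 320 proceed.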

the processing unit 370 is configured to, when the calculating unit 360 determines that the ratio of the objectionable content is greater than the preset threshold, start a deleting program to delete the video file to be detected;

when the calculating unit 360 determines that the bad content proportion value is less than or equal to the preset threshold value, extracting video key frames from the video file to be detected; extracting motion feature information of the video key frames, wherein the motion feature information represents the motion intensity of the shots to which the video key frames belong; judging whether the motion intensity is greater than a preset intensity value; if the motion intensity is greater than the preset intensity value, extracting image feature data and audio feature data from the video key frames; when the image feature data is within a preset range of objectionable image feature data and the audio feature data is within a preset range of objectionable audio feature data, starting a deleting program to delete the video file to be detected; and when the image feature data is not within the preset range of objectionable image feature data and the audio feature data is not within the preset range of objectionable audio feature data, determining that the video file to be detected is a file with healthy content.

When the image feature data of each key frame includes the color histogram of each key frame, the processing unit 370 determines that the image feature data is within the preset range of objectionable image feature data specifically as follows:

the processing unit 370 determines that the image feature data is within the preset range of objectionable image feature data when it determines that the counts of a preset number of colors in the color histogram of the key frame fall within the count ranges of the corresponding colors in the color histogram of video frames extracted in advance from a specific scene.

The processing unit 370 determines that the audio feature data is within the preset range of objectionable audio feature data specifically as follows: the processing unit 370 calculates a sample vector and a covariance matrix of the audio data in the scene, and, when the similarity between the sample vector and covariance matrix of the audio data in the scene and the sample vector and covariance matrix of audio data extracted in advance from the specific scene is greater than a third preset threshold, determines that the audio feature data is within the preset range of objectionable audio feature data.

When the audio feature data further includes the energy entropy of the audio data, the processing unit 370 is configured to determine that the audio feature data is within the preset range of the objectionable audio feature data by: the processing unit 370 is configured to divide the audio data in the scene into multiple segments, calculate an energy entropy of each segment of the audio data, and determine that the audio feature data is within a preset range of objectionable audio feature data when the energy entropy of at least one segment of the audio data in the multiple segments of the audio data is smaller than a fourth preset threshold.

By implementing the device, merging and de-duplicating the first text content and the second text content ensures that the content of the target text content is unique, which improves the speed and accuracy of the comparison between the text content and the sensitive vocabulary list, improves the identification accuracy of bad content in the video, and reduces the misjudgment rate for bad videos. In addition, when the proportion value of the objectionable content is less than or equal to the preset threshold value, the audio information and the image information in the video file to be detected are further analyzed to assess the proportion of objectionable content they contain, which improves the accuracy of identifying objectionable videos.

It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other memory, magnetic disk, magnetic tape, or any other computer-readable medium that can be used to carry or store data.

The method and the device for detecting the bad content in the video disclosed by the embodiment of the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method for detecting objectionable content in a video, comprising:

acquiring a video file to be detected;

converting voice contained in the audio information into text according to a time axis sequence of the audio information, converting the audio information into first text content, identifying image information to convert the image information into text content, and converting the image information into second text content;

2. The method according to claim 1, wherein said processing the video file to be detected according to the bad content ratio value comprises:

3. The method according to claim 1 or 2, wherein after the acquiring the video file to be detected, and before the performing video-audio separation on the video file to be detected to obtain audio information and image information, the method further comprises:

acquiring the file name of the video file to be detected;

comparing the file name with the sensitive vocabulary list;

4. The method according to claim 1 or 2, wherein after the acquiring the video file to be detected, and before the performing video-audio separation on the video file to be detected to obtain audio information and image information, the method further comprises:

acquiring source information of the video file to be detected;

5. The method according to claim 1, wherein said processing the video file to be detected according to the bad content ratio value comprises:

when the bad content proportion value is larger than a preset threshold value, starting a deleting program to delete the video file to be detected;

acquiring the average motion intensity of the shots in the certain key scene;

6. An apparatus for detecting objectionable content in a video, comprising:

the acquisition unit is used for acquiring a video file to be detected;

the merging and de-duplicating unit is used for converting the voice contained in the audio information into first text content according to the time shaft sequence of the audio information, identifying the image information to convert the image information into second text content, and merging and de-duplicating the first text content and the second text content to obtain target text content;

7. The apparatus according to claim 6, wherein the processing unit is configured to process the video file to be detected according to the bad content ratio value by:

8. The apparatus of claim 6 or 7, further comprising:

9. The apparatus of claim 6 or 7, further comprising:

10. The apparatus according to claim 6, wherein the processing unit is configured to process the video file to be detected according to the bad content ratio value by:

the processing unit is used for starting a deleting program to delete the video file to be detected when the calculating unit determines that the ratio value of the bad content is greater than a preset threshold value;