CN116823726A - Video detection method, device, equipment and storage medium - Google Patents


Info

Publication number: CN116823726A
Application number: CN202310542674.3A
Authority: CN (China)
Prior art keywords: video, detected, preset, materials, determining
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 许力强
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310542674.3A
Publication of CN116823726A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a video detection method, apparatus, device, and storage medium, and relates to the field of computer technology, in particular to the fields of artificial intelligence and video detection. The specific implementation scheme is as follows: obtain a video to be detected; parse the video to be detected to obtain constituent elements of the video to be detected; and detect the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected, wherein the detection result is used for reflecting whether the video to be detected is a qualified video. In this way, the quality of the video to be detected can be assessed and unqualified videos can be found in time, reducing low-quality videos online and improving the user experience.

Description

Video detection method, device, equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, in particular to the fields of artificial intelligence and video detection, and specifically relates to a video detection method, apparatus, device, and storage medium.
Background
With the development of artificial intelligence, AIGC (AI-generated content, also called generative AI), i.e., content generated by artificial intelligence, is increasing, such as TTV video (text-to-video, also called article-to-video or document video). A TTV system can simulate the human authoring process and generate a video based on user-entered text or graphic information, such as an article, a search request, or a slide show.
Disclosure of Invention
The embodiment of the disclosure provides a video detection method, a device, equipment and a storage medium.
According to an aspect of the embodiments of the present disclosure, there is provided a video detection method, including: obtaining a to-be-detected video; analyzing the video to be detected to obtain constituent elements of the video to be detected; detecting the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected, wherein the detection result is used for reflecting whether the video to be detected is a qualified video.
According to another aspect of the embodiments of the present disclosure, there is provided a video detection apparatus including: the acquisition unit is used for acquiring the to-be-detected video; the analysis unit is used for analyzing the video to be detected to obtain constituent elements of the video to be detected; the video detection unit is used for detecting the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected, and the detection result is used for reflecting whether the video to be detected is a qualified video or not.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the disclosed embodiments, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method according to any of the disclosed embodiments.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
The video detection method, apparatus, device, and storage medium provided by the embodiments of the disclosure obtain a video to be detected; parse the video to be detected to obtain constituent elements of the video to be detected; and detect the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected, wherein the detection result is used for reflecting whether the video to be detected is a qualified video. In this way, the quality of the video to be detected can be assessed, unqualified videos can be found in time, low-quality videos online are reduced, and the user experience is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of the generation of a video in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a TTV system in an embodiment of the present disclosure;
FIGS. 3a-3c are schematic diagrams of problems that may exist with TTV video;
FIG. 4 is a schematic diagram of a system for applying a video detection method of an embodiment of the present disclosure;
FIG. 5 is a flow chart of a video detection method provided in accordance with an embodiment of the present disclosure;
FIG. 6 is a block diagram of a video detection method provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a flow chart of a video detection method provided in accordance with an embodiment of the present disclosure;
FIG. 8 is a flowchart of determining a detection result in a video detection method provided according to an embodiment of the present disclosure;
FIG. 9 is a flow chart of unqualified-video processing for a video detection method provided in accordance with an embodiment of the present disclosure;
FIG. 10 is a block diagram of a video detection apparatus provided according to an embodiment of the present disclosure;
FIG. 11 is a block diagram of an electronic device for implementing a video detection method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiment of the disclosure provides a video detection method, a video detection apparatus, an electronic device, and a storage medium. Specifically, the video detection method of the embodiment of the disclosure may be executed by an electronic device, where the electronic device may be a terminal or a server. The terminal may be a smart phone, tablet computer, notebook computer, intelligent voice interaction device, smart home appliance, wearable smart device, aircraft, intelligent vehicle-mounted terminal, or other device, and may also run a client, such as an audio client, video client, browser client, instant messaging client, or applet. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms.
FIG. 1 is a schematic diagram of the generation of a video in an embodiment of the present disclosure. A representative capability of TTV is generating a video from an article. As shown in fig. 1, the user inputs text, possibly an article, a search request, or even a slide deck (PPT), and the TTV system generates a video.
FIG. 2 is a schematic diagram of a TTV system according to an embodiment of the present disclosure. Referring to FIG. 2, the TTV system logically simulates the human authoring process. First, document understanding and organization are performed; for example, a document is generated from the articles or topics input by the user, and its content is understood. The material library may then be used to find material; the library may include a whole-network material library and a dedicated material library. Some materials in the material library are real materials obtained from the internet, and some may be generated by AI. Materials are also generated, analyzed, and processed according to the material library so as to expand the available material. Then the materials are arranged and aligned with the document text, ensuring that the finally generated subtitle audio and the displayed visual pictures are aligned, and the video is rendered to obtain the TTV video. The process may be implemented using a large model.
However, the generated TTV video may have problems in its video effect, and at present there is no detection system for TTV video, resulting in poor quality of online videos. Even if manual inspection is adopted, there may be missed detections, and some videos with a poor experience may still be distributed online.
Fig. 3a to 3c are schematic diagrams of problems that may exist in TTV video. For example, in fig. 3a there is a subtitle overflow problem, i.e., the subtitle of the document extends beyond the screen; in fig. 3b there is a blank subtitle problem, i.e., the picture or video material has no corresponding subtitle; and in fig. 3c there is a material repetition problem, i.e., the content at 4 min and 5 min in the video track (the two segments of material marked with grey boxes) is the same.
The above gives examples of unqualified videos from the subtitle, material analysis, and scene arrangement aspects, respectively, and these unqualified videos may be distributed to client consumer phones. In addition, the machine auditing strategies in the related art are mainly aimed at manually produced videos, in which the problems of TTV videos do not usually occur, so current machine auditing has no customized AIGC video strategy, cannot support TTV video detection, and easily leads to a poor user experience of the final videos. Moreover, manual auditing can cause missed detections, and some unqualified videos may be distributed online. The precision and recall of the multiple AI models across the whole TTV video production pipeline vary, so it is necessary to comprehensively detect TTV videos, that is, to effectively find unqualified videos from the output results while better monitoring the precision/recall metrics of the strategies at each stage, so as to avoid a great impact when a strategy goes online.
In order to solve at least one of the above problems, embodiments of the present disclosure provide a video detection method, apparatus, device, and storage medium: obtain a video to be detected; parse the video to be detected to obtain constituent elements of the video to be detected; and detect the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected, wherein the detection result is used for reflecting whether the video to be detected is a qualified video. In this way, the quality of the video to be detected can be assessed, unqualified videos can be found in time, low-quality videos online are reduced, and the user experience is improved.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. Fig. 4 is a schematic structural diagram of a system to which the video detection method of the embodiment of the present disclosure is applied. Referring to fig. 4, the system includes a terminal 410, a server 420, and the like; the terminal 410 and the server 420 are connected to each other through a network, for example, a wired or wireless network connection.
The terminal 410 may be used to display a graphical user interface. The terminal interacts with a user through the graphical user interface; for example, the terminal downloads, installs, and runs a corresponding client, invokes a corresponding applet, or presents a corresponding graphical user interface through a login website. In the embodiment of the present disclosure, the terminal 410 may be installed with a TTV video generating application, and the user may generate the video to be detected through the terminal 410. The server 420 can acquire the video to be detected in real time or periodically; parse the video to be detected to obtain its constituent elements; and detect the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected, wherein the detection result is used for reflecting whether the video to be detected is a qualified video.
The application may be an application installed on a desktop, an application installed on a mobile terminal, an applet embedded in an application, or the like.
It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principles of the present disclosure, and embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
The following is a detailed description. It should be noted that the following description order of embodiments is not a limitation of the priority order of embodiments.
FIG. 5 is a flow chart of a video detection method provided in accordance with an embodiment of the present disclosure; referring to fig. 5, an embodiment of the disclosure provides a video detection method 500, which includes the following steps S501 to S503.
Step S501, a video to be detected is obtained.
Step S502, the video to be detected is parsed to obtain the constituent elements of the video to be detected.
Step S503, the constituent elements are detected according to a preset detection rule to obtain a detection result of the video to be detected, wherein the detection result is used for reflecting whether the video to be detected is a qualified video.
The video to be detected refers to a TTV video generated by using a TTV system.
The video to be detected is parsed to obtain its constituent elements. The constituent elements may be the basic elements constituting the TTV video.
Fig. 6 is a block diagram of a video detection method according to an embodiment of the present disclosure. Referring to fig. 6, the constituent elements may be of various kinds, such as subtitles, audio, materials, virtual persons, special effects, and the like. The subtitles may be cut from the document text. The audio may include BGM (background music), TTS (text-to-speech) audio, or the like. The materials may include dynamic picture materials, which may include videos, moving pictures, short videos, etc., and static picture materials, which may include images. The virtual person may be a virtual animated character. Special effects may include fonts, backgrounds, and the like.
After the to-be-inspected video is analyzed into the constituent elements, the constituent elements can be detected according to a preset detection rule, so that whether the to-be-inspected video is qualified or not is judged.
The preset detection rules may include various types, as shown in fig. 6, which may cover links such as document organization, material analysis, scene arrangement, and the like. The text organizing link can cover a subtitle strategy, and the subtitle strategy can comprise a segmented line number rule, a segmented end-of-text quality rule or the like. The material analysis link may cover a material policy, which may include a dynamic material duty rule and/or a material repetition rule, etc. The scene arrangement link may include an arrangement policy, which may include a blank caption rule, a node front-to-back alignment rule, and/or a document and material correspondence rule, etc.
Through one or more of the preset detection rules, a detection result, for example, whether the video to be detected is a qualified video or a disqualified video, can be obtained.
By the method 500 provided in this embodiment, the quality of the video to be detected can be assessed and unqualified videos can be found in time, thereby reducing low-quality videos online and improving the user experience.
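As a rough illustration, steps S501 to S503 can be sketched as a small pipeline. The function names, rule format, and toy data below are hypothetical and only mirror the structure described above; they are not the actual implementation of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class DetectionResult:
    qualified: bool
    failed_rules: list = field(default_factory=list)

def detect_video(video_id, parse, rules):
    """Hypothetical sketch of steps S501-S503: obtain/parse the video
    to be detected, then run every preset detection rule over its
    constituent elements."""
    elements = parse(video_id)                                       # S501 + S502
    failed = [name for name, check in rules if not check(elements)]  # S503
    return DetectionResult(qualified=not failed, failed_rules=failed)

# Toy parser and a single toy rule for illustration.
def parse_stub(video_id):
    return {"subtitles": ["hello world"], "materials": ["m1.mp4"]}

rules = [("has_subtitles", lambda e: len(e["subtitles"]) > 0)]
result = detect_video("ttv-0001", parse_stub, rules)
```

A real implementation would plug in one parser per constituent element and one check per preset rule (document organization, material analysis, scene arrangement).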
Fig. 7 is a flowchart of a video detection method according to an embodiment of the present disclosure. Referring to fig. 7, in some embodiments, parsing the video to be detected in step S502 to obtain its constituent elements may include the following steps: searching a production database of the video to be detected according to the first identifier of the video to be detected to obtain a mining library and a material table of the video to be detected, wherein the mining library includes the second identifiers of the constituent elements; searching the material table according to the second identifiers of the constituent elements to obtain material information of the constituent elements; and parsing the video to be detected according to the material information to obtain the constituent elements of the video to be detected.
It is understood that the to-be-inspected video may have a first identifier, and the first identifier of the to-be-inspected video may be obtained simultaneously with the to-be-inspected video. The first identifier may be an id (Identity document, identification number) of the video to be inspected, for example, a generation order number or a task number of the video to be inspected, etc. The second identification may be an id of the constituent element, such as a number or the like.
By looking up the production database of the TTV production system with the id of the video to be detected, the second identifiers of constituent elements such as the document text and the materials are obtained, thereby obtaining the mining library. The mining library may further include TTV video production details, such as the splicing rules of the constituent materials. A splicing rule of the constituent elements may be a rule for generating the TTV video from the constituent materials, and may include, for example, the arrangement order of a plurality of materials and the start time and end time of each material on the time track.
The material table can also be obtained by looking up the production database of the TTV production system. The material table may record the second identifier of each constituent element and the material information corresponding to it, where the material information may include information such as the composition form and presentation form of the constituent element, for example, whether the text of the TTV video has a voice broadcast, or the font and font size of a subtitle. The material information of a constituent element can be queried in the material table through its second identifier, and parsing can be performed using the material information, so as to obtain constituent elements that can be used for detection.
As shown in fig. 7, according to the material information of the constituent elements recorded in the material table, the respective constituent elements, such as subtitles, materials, and the like, can be separated.
The first identification of the video to be detected can be used for determining the mining library of the video to be detected, and further, the second identification of the constituent elements recorded in the mining library and the material table are used for obtaining the constituent elements which can be used for detection, and the method is rapid and convenient.
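The two-stage lookup described above (first identifier to mining library and material table, then second identifier to material information) can be sketched with toy in-memory tables; all schemas and field names below are assumptions for illustration only.

```python
# Toy stand-ins for the production database of the TTV production
# system; the structure and field names here are assumptions.
production_db = {
    "ttv-0001": {                                     # first identifier
        "mining_library": {
            "element_ids": ["sub-1", "mat-7"],        # second identifiers
            "splice_rule": [("mat-7", 0.0, 5.0)],     # material, start, end
        },
        "material_table": {
            "sub-1": {"type": "subtitle", "font": "SimHei", "size": 24},
            "mat-7": {"type": "dynamic", "url": "https://example.com/7.mp4"},
        },
    }
}

def parse_video(first_id):
    """Resolve the first identifier to the mining library and material
    table, then resolve each second identifier to its material
    information (the two-stage lookup described above)."""
    record = production_db[first_id]
    mining = record["mining_library"]
    table = record["material_table"]
    return [table[eid] for eid in mining["element_ids"]]

elements = parse_video("ttv-0001")
```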
In some embodiments, the constituent elements include a plurality of subtitles, and the detecting the constituent elements according to the preset detection rule in step S503 to obtain a detection result of the video to be detected includes: and detecting the subtitles according to a preset document organization rule to obtain a detection result of the video to be detected.
In this embodiment, the composition element may include a plurality of subtitles, and it may be understood that the mining library includes a splicing rule of the composition element, where the splicing rule of the composition element may record a splicing rule of a TTV video production process, and according to the rule, the text may be split to obtain a plurality of subtitles. The subtitles can be sequentially arranged on a time axis, and each picture or each section of picture can correspondingly display one subtitle.
The preset document organization rule may be a rule for detecting subtitles.
In some embodiments, the preset document organization rules may include a document beginning-to-end quality detection rule, or the like. For example, whether the beginning word and ending word of the subtitle meet the word forming logic is detected for each subtitle.
For example, the full subtitle should be "weather-friendly today", but during TTV video production it was cut into "weather-friendly", i.e., the cut subtitle does not conform to word-forming logic, and in that case the subtitle does not satisfy the preset document organization rule. The detection results of the individual subtitles are combined to obtain the detection result of the video to be detected.
The captions can be detected through a preset document organization rule, so that whether the video to be inspected is qualified or not is judged, and unqualified videos are screened out.
In some embodiments, detecting a plurality of subtitles according to a preset document organization rule to obtain a detection result of a video to be detected may include the following steps: determining a first total number of subtitles which do not accord with a preset document organization rule in a plurality of subtitles; determining the accuracy of a plurality of subtitles according to the first total number and the total number of subtitles of the video to be detected; and determining the video to be detected as the disqualified video under the condition that the accuracy rate is smaller than a preset accuracy rate threshold value.
It can be understood that, since the video to be inspected includes a plurality of subtitles, whether the video is qualified can be determined by determining the accuracy of the subtitles.
Accuracy T_p can be calculated by means of the following formula:

T_p = (total detection number - first total number) / total detection number

where the total detection number is the total number of subtitles, and the first total number is the number of subtitles that do not conform to the preset document organization rule.
The accuracy can be compared with a preset accuracy threshold, if the accuracy is smaller than the threshold, the video to be inspected is not qualified, otherwise, the video to be inspected is qualified. For example, when the preset document organization rule includes detecting the quality of the beginning and the end of the document, the accuracy of the subtitles can be calculated by the above formula, and assuming that the preset accuracy threshold is 99%, the total number of the subtitles is 100, which means that the quality of the beginning and the end of the 1 subtitle is allowed to be problematic at most for the 100 cut subtitles.
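A minimal sketch of this accuracy check, using the formula and the worked example above (100 subtitles, 99% threshold); the helper names are hypothetical.

```python
def subtitle_accuracy(total_detected, first_total):
    """T_p = (total detection number - first total number) / total,
    where first_total counts the rule-violating subtitles."""
    return (total_detected - first_total) / total_detected

def subtitles_qualified(total_detected, first_total, threshold=0.99):
    # Hypothetical helper: the video fails when subtitle accuracy
    # drops below the preset accuracy threshold (99% here).
    return subtitle_accuracy(total_detected, first_total) >= threshold

# Worked example above: 100 subtitles, threshold 99%, so at most one
# subtitle may violate the document organization rule.
one_bad = subtitles_qualified(100, 1)    # tolerated
two_bad = subtitles_qualified(100, 2)    # one too many
```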
Because the precision/recall metrics of the strategies relied upon at different stages of the TTV production pipeline differ, some stages have higher accuracy requirements and some lower. Therefore, by setting the preset accuracy threshold accordingly, the video detection method can be adapted to various scenarios with different accuracy requirements.
In some embodiments, determining the first total number of subtitles in the plurality of subtitles that do not meet the preset document organization rule may include: determining the number of lines of the subtitles and the number of words of a single-line subtitle for each subtitle in the plurality of subtitles; and determining that the subtitle does not accord with the preset document organization rule under the condition that the line number of the subtitle is larger than a preset line number threshold or the word number of the single-line subtitle is larger than a preset single-line word number threshold, so as to obtain a first total number.
It is understood that the preset document organization rule may include a subtitle line number rule and a one-line word number rule. The caption line number rule may include that the line number of the caption does not exceed a preset line number threshold, and the single line number rule may include that the number of words of the single line caption does not exceed a preset single line number threshold.
During detection, the number of words in each line and the number of lines of each subtitle can be traversed in a loop to judge the rationality of subtitle segmentation. For example, if the maximum allowed number of lines is 2 and the maximum number of words in a single line is 30, i.e., the preset line number threshold is 2 and the preset single-line word number threshold is 30, then a subtitle can be determined not to conform to the preset document organization rule when its number of lines is greater than or equal to 3 (text.rowLength >= 3) or its number of words in a single line is greater than or equal to 31 (text.wordSize >= 31). By detecting the plurality of subtitles in this way, the first total number can be obtained.
According to this embodiment, the number of lines of each subtitle and the number of words in each single line are detected, i.e., the subtitles are detected through two rules, so that problems such as subtitle overflow can be detected, the accuracy of the detection result can be further improved, unqualified videos can be found in time, and the user experience is improved.
In other embodiments, subtitle detection may use only a subtitle line number rule or a single line number rule, and may be specifically set according to practical situations.
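A minimal sketch of the two-rule subtitle check described above, assuming a subtitle is represented as a list of its lines and using the example thresholds (2 lines, 30 words per line); counting characters as words is an assumption that fits the Chinese-text setting.

```python
def violates_document_rule(lines, max_lines=2, max_line_words=30):
    """Return True when a subtitle (given as a list of its lines)
    breaks the segmentation rule: more than max_lines lines, or any
    single line longer than max_line_words."""
    if len(lines) > max_lines:          # text.rowLength >= 3
        return True
    # text.wordSize >= 31 for any line
    return any(len(line) > max_line_words for line in lines)

subtitles = [
    ["a short line"],                   # conforms
    ["x" * 31],                         # one line of 31 words: violates
    ["one", "two", "three"],            # three lines: violates
]
first_total = sum(violates_document_rule(s) for s in subtitles)
```

The resulting `first_total` feeds directly into the accuracy formula T_p above.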
In some embodiments, the constituent elements include a plurality of materials, each of the plurality of materials including a dynamic picture material or a static picture material; in step S503, detecting the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected may include: and detecting the materials according to a preset material analysis rule to obtain a detection result of the video to be detected.
In this embodiment, the constituent elements may include a plurality of materials, and a material may be a dynamic picture material, such as a video, a moving picture, or a short video, or a static picture material, such as an image. The materials may be arranged in sequence on the time axis.
The preset material analysis rule may be a rule for detecting materials. In some embodiments, the preset material analysis rules detect the dynamic material ratio and/or detect material repetition, etc.
The materials can be detected through a preset material analysis rule, so that whether the to-be-detected video is qualified or not is judged, and unqualified videos are screened out.
In some embodiments, detecting a plurality of materials according to a preset material analysis rule to obtain a detection result of a video to be detected may include: acquiring a material list, wherein the material list records positioning identifiers of a plurality of materials; for each material in a plurality of materials, searching a first positioning identifier of the material in a material list, wherein the first positioning identifier is the positioning identifier of the material; and under the condition that a plurality of positioning identifiers which are the same as the first positioning identifier exist in the material list, determining that the video to be detected is a disqualified video.
The material list can record the positioning identifiers of all materials; it can also be obtained by querying the production database of the video to be detected, and may be the mining library or part of the contents of the material table. A positioning identifier may be a uniform resource locator (url).
By looping over each material and checking uniqueness, repeated positioning identifiers in the material list can be found. For example, if the positioning identifier of material 1 is the same as the positioning identifier of material 100, then taking the positioning identifier of material 1 as the first positioning identifier and searching the material list finds that the first positioning identifier occurs twice, so it is determined that repeated material exists. If repeated materials exist in the video to be detected, the video to be detected is determined to be unqualified.
By judging whether repeated materials exist in the material list, the video to be detected can be checked, thereby improving the quality of TTV videos released online and the user experience.
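The uniqueness check on positioning identifiers can be sketched as follows; representing each material as a dict with a `url` field is an assumption for illustration.

```python
from collections import Counter

def repeated_identifiers(material_list):
    """Return the positioning identifiers (urls here) that occur more
    than once in the material list; any hit marks the video to be
    detected as unqualified."""
    counts = Counter(m["url"] for m in material_list)
    return [url for url, n in counts.items() if n > 1]

# Mirrors the example above: material 1 and material 100 share a url.
materials = [{"url": "https://example.com/clip-1.mp4"}]
materials += [{"url": f"https://example.com/clip-{i}.mp4"} for i in range(2, 100)]
materials += [{"url": "https://example.com/clip-1.mp4"}]   # material 100

duplicates = repeated_identifiers(materials)
qualified = not duplicates
```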
In some embodiments, detecting a plurality of materials according to a preset material analysis rule to obtain a detection result of a video to be detected may include: acquiring the number of dynamic picture materials in a plurality of materials; determining the dynamic material duty ratio of the video to be inspected according to the number of the dynamic picture materials and the total number of the materials in the video to be inspected; and under the condition that the duty ratio of the dynamic material is smaller than a preset duty ratio threshold value, determining the video to be detected as the disqualified video.
Since the materials may include both dynamic picture materials and static picture materials, it can be understood that the greater the number of dynamic picture materials, the better the user experience and the higher the video quality. The dynamic material duty ratio is determined by calculating the proportion of the number of dynamic picture materials in the total number of materials. The dynamic material duty ratio is then compared with a preset duty ratio threshold: if it is below the threshold, most of the content in the video to be detected is static picture material, the user experience is poor, and the video to be detected is unqualified; otherwise, the video to be detected is qualified.
The display effect of the pictures in the video to be detected can be judged through the dynamic material duty ratio, so that TTV videos with too many static picture materials are filtered out, which helps improve the quality of the generated videos and the user experience.
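The dynamic material duty ratio check can be sketched like this. The dict representation and the default threshold value are illustrative assumptions, not specified by the patent.

```python
def is_unqualified_by_dynamic_ratio(materials, ratio_threshold=0.5):
    """materials: list of dicts with a boolean 'is_dynamic' flag.
    Returns True (unqualified) when the share of dynamic picture
    materials falls below the preset duty ratio threshold."""
    if not materials:
        return True  # assumption: a video with no materials is unqualified
    dynamic = sum(1 for m in materials if m["is_dynamic"])
    return dynamic / len(materials) < ratio_threshold
```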
In some embodiments, the detecting the constituent elements according to the preset detection rule in step S503 to obtain a detection result of the video to be detected may include: acquiring arrangement information of the constituent elements; and detecting the arrangement information according to a preset scene arrangement rule to obtain a detection result of the video to be detected.
The arrangement information of the constituent elements may include correlations between the constituent elements, which may include correspondence in time, association in content, and the like. The arrangement information can be obtained from the splicing rules of the constituent elements in the mining library.
The preset scene orchestration rules may include at least one of a blank subtitle detection rule, a node front-to-back alignment rule, and a document and material correspondence rule.
The document-and-material correspondence rule may include obtaining a document and its corresponding materials, where the document may be composed of multiple subtitles. By extracting content keywords from the document and from the materials and comparing their similarity, the degree of association between the document and the materials can be determined, and thus whether they correspond; if they do not correspond, the video to be detected is unqualified. The empty subtitle detection rule and the node front-rear alignment rule are described in the following embodiments.
According to the embodiment, the arrangement information of the constituent elements can be detected through the preset scene arrangement rules, so that whether the video to be inspected is qualified or not is judged, and unqualified videos are screened out.
In some embodiments, obtaining the layout information of the constituent elements includes: the method comprises the steps of obtaining time length information of a plurality of subtitles in a video to be inspected, wherein the time length information of each subtitle in the plurality of subtitles comprises the full time length of the subtitle and the display time length of the subtitle.
Detecting the arrangement information according to a preset scene arrangement rule to obtain a detection result of the video to be detected, wherein the detection result comprises the following steps: for each caption in the plurality of captions, determining a difference value between the full time length of the caption and the display time length of the caption; under the condition that the difference value is larger than zero, determining the difference value as the blank caption duration of the caption, and determining the sum of the blank caption durations of a plurality of captions as the blank caption total duration of the video to be detected; and under the condition that the total time length of the empty captions is larger than the preset time length threshold value of the empty captions, determining the video to be inspected as the disqualified video.
This embodiment explains the empty subtitle detection rule. The arrangement information may include duration information of each subtitle, which comprises the full duration of the subtitle and the display duration of the subtitle.
The display duration TEXT_TIME of a subtitle is the period during which the subtitle is shown in the video to be detected, and the full duration MATE_TIME of the subtitle is the display duration of the other constituent elements (such as materials or a virtual person) corresponding to the subtitle, that is, the duration the subtitle would need in order to fully cover those corresponding elements.
All subtitles are looped over, and for each subtitle the difference MATE_TIME - TEXT_TIME between its full duration and its display duration is calculated. If the difference equals 0, the subtitle exactly fills its slot and no empty subtitle exists. If the difference is greater than 0, the display duration of the subtitle is insufficient to fully cover the corresponding constituent elements, and an empty subtitle exists.
After looping over all subtitles, the total empty subtitle duration sum(MATE_TIME - TEXT_TIME) of the video to be detected is obtained. A preset empty subtitle duration threshold BADCASE_TIME can then be determined in combination with online stability and user experience: if sum(MATE_TIME - TEXT_TIME) > BADCASE_TIME, the video to be detected is determined to be an unqualified video.
By detecting the display duration of the captions and the full time of the captions, the condition of the empty captions can be detected, so that the problematic to-be-detected video is found, and the user experience is improved.
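The empty subtitle check above can be sketched in a few lines. The pair representation (full duration, display duration) and the function names are illustrative assumptions; the time unit (e.g. seconds) is whatever the production system uses.

```python
def total_blank_subtitle_time(subtitles):
    """subtitles: list of (mate_time, text_time) pairs, where mate_time
    is the full duration of the corresponding elements and text_time is
    the subtitle's display duration. Differences <= 0 contribute nothing."""
    return sum(max(mate - text, 0) for mate, text in subtitles)

def is_unqualified_by_blank_subtitles(subtitles, badcase_time):
    """True when the accumulated empty subtitle duration exceeds
    the preset threshold BADCASE_TIME."""
    return total_blank_subtitle_time(subtitles) > badcase_time
```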
In some embodiments, obtaining the layout information of the constituent elements may include: and acquiring time information of a plurality of materials in the video to be inspected, wherein the time information of each material in the plurality of materials comprises a starting time and an ending time of the material.
The method comprises the steps of detecting the arrangement information according to a preset scene arrangement rule to obtain a detection result of the video to be detected, and the method can comprise the following steps one to three.
Step one, determining that the material does not accord with a preset scene arrangement rule under the condition that the starting time of the material is unequal to the ending time of a previous material aiming at each material in the plurality of materials, and further determining the second total number of the materials which do not accord with the preset scene arrangement rule in the plurality of materials, wherein the previous material is one material positioned in front of the material in the plurality of materials.
And step two, determining the alignment rate of the materials according to the second total number and the total number of the materials of the video to be detected.
And step three, determining that the video to be inspected is a disqualified video under the condition that the alignment rate is smaller than the alignment rate threshold value.
This embodiment explains the node front-rear alignment rule. It can be understood that this rule may apply to both subtitles and materials. In this embodiment, materials are taken as an example; the implementation for subtitles is the same and is not repeated.
The arrangement information may include duration information of the materials. The duration information of a material may include the start time and end time of the material on the time track.
And circularly traversing each material, and judging whether the starting time of the current material is equal to the ending time of the last material of the current material according to each current material, if so, indicating that the current material and the last material are aligned, otherwise, indicating that the current material and the last material are not aligned. The second total number of all non-aligned material is obtained by cycling through all material.
From the second total number and the total number of materials, an alignment rate of the plurality of materials may be determined. Specifically, the alignment rate T_r can be calculated by the following formula:

T_r = (total detection number - second total number) / total detection number
the total detection number is the total number of materials, and the second total number is the number of materials which do not accord with the preset scene arrangement rule.
And comparing the alignment rate with a preset alignment rate threshold value, if the alignment rate is smaller than the threshold value, indicating that the video to be inspected is unqualified, otherwise, indicating that the video to be inspected belongs to the qualified video.
Because the precision and recall indexes relied upon by the strategies in the TTV production link differ, some scenarios have higher alignment rate requirements and some have lower ones. Therefore, by setting the preset alignment rate threshold, the video detection method can be adapted to various scenarios with different alignment rate requirements.
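Steps one to three above can be sketched as follows. The (start, end) tuple representation is an illustrative assumption; the first material has no predecessor and is counted as aligned, consistent with dividing by the total number of materials as in the formula.

```python
def alignment_rate(materials):
    """materials: list of (start_time, end_time) tuples in track order.
    A material is misaligned when its start time differs from the end
    time of the previous material. T_r = (total - misaligned) / total."""
    total = len(materials)
    if total == 0:
        return 1.0
    misaligned = sum(
        1 for prev, cur in zip(materials, materials[1:]) if cur[0] != prev[1]
    )
    return (total - misaligned) / total

def is_unqualified_by_alignment(materials, rate_threshold):
    """True when the alignment rate falls below the preset threshold."""
    return alignment_rate(materials) < rate_threshold
```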
In some embodiments, the method further comprises: and under the condition that the video to be detected is the unqualified video, performing offline processing on the video to be detected.
It can be understood that the preset detection rules may include one or more of the foregoing embodiments. If the video to be detected hits any one of the rules, that is, is determined to be an unqualified video under any rule, the video is unqualified and needs to be processed, for example deleted or taken offline, thereby improving the quality of TTV videos online.
In some embodiments, after the video to be detected is determined to be an unqualified video, a manual review may be performed to determine whether it is a false recall. If it is a false recall, the data in the mining code library of the video detection method can be repaired and the mining code optimized. If it is not a false recall, the unqualified video can be processed.
In addition, code quality detection can be performed on the TTV production system by detecting a large number of videos to be detected: code problems related to the subtitle and material strategies are identified, and the code is repaired through feedback to research and development.
In a specific embodiment, as shown in fig. 6, the overall idea of the video detection method is to split the constituent elements of TTV video production, including subtitles, audio, materials, virtual persons, special effects, and the like. The detection covers all core production links, including document organization, material analysis, and scene arrangement, and the detection strategies are extended to cover the various elements.
As shown in fig. 7, the present embodiment provides a collecting-and-editing type video (TTV) detection system that can implement the video detection method, that is, the process from distribution of the video to be detected to the final detection result. The workflow of the TTV detection system is as follows. (1) Detection mining task generation: the distributed TTV videos, as videos to be detected, are dropped into a detection library one by one; the database of the production module is reverse-queried through the first identification (ID) of the AIGC video, the relevant document and material data are fished out, and the relevant data are stored into a mining library. (2) Strategy production detection: material analysis is performed through the material table used during TTV video production to obtain data such as constituent elements, subtitles, and materials, and then the detection strategies are requested in parallel, that is, constituent element detection is performed simultaneously using strategies such as the subtitle (document) strategy, the material strategy, and the arrangement strategy. (3) Case (video) processing: the detection results under each strategy are compared with low-quality thresholds or related indexes to judge whether the TTV video is an unqualified video (badcase), and a badcase labeling platform analyzes and processes the result, taking serious cases offline or feeding back to promote strategy code upgrading and optimization.
The detection indexes may be of various types, for example direct comparison with a threshold. Taking the subtitle strategy as an example, the subtitle strategy (strategy T) may split a large section of text into multiple text subtitles (subtitles for short); if a cut single text subtitle exceeds 30 words, that subtitle is a bad subtitle (badcase).
An accuracy index may be set for evaluating the video to be detected. The precision and recall indexes relied upon by the strategies in the TTV video production link differ; assume the accuracy of strategy T in limiting single-sentence word count must reach 99%, meaning at most 1 of 100 split documents is allowed to exceed 30 words. The strategy accuracy T_p decision rule may be described by the following formula:

T_p = (total detection number - badcase number) / total detection number

where the total detection number is the total number of detected subtitles, and the badcase number is the number of unqualified subtitles. M_t is the preset accuracy threshold; by comparing T_p with M_t, it is determined whether the TTV video is an unqualified video (badcase).
If routine detection finds that, under an iterated version, the T_p of strategy T is lower than M_t, strategy T may be problematic and require optimization.
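The accuracy decision rule can be sketched directly from the formula above. The function names and the empty-sample convention are illustrative assumptions.

```python
def policy_accuracy(total_detected, badcase_count):
    """T_p = (total detection number - badcase number) / total detection
    number. An empty sample is treated as fully accurate (assumption)."""
    if total_detected == 0:
        return 1.0
    return (total_detected - badcase_count) / total_detected

def hits_accuracy_rule(total_detected, badcase_count, m_t=0.99):
    """True when T_p falls below the preset accuracy threshold M_t,
    i.e. the TTV video is judged a badcase under this index."""
    return policy_accuracy(total_detected, badcase_count) < m_t
```

With the 99% example from the text, 1 badcase in 100 subtitles is still acceptable, while 2 badcases trip the rule.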
The badcase determination process is described below for each of the preset detection rules: the document organization rule, the material analysis rule, and the scene arrangement rule.
(1) Text organization rule-subtitle segmentation strategy T
The split word count (text.wordSize) and split line count (text.rowLength) of each subtitle are checked in a loop to evaluate the rationality of subtitle splitting. For example, if the maximum allowed number of split lines (preset line count threshold) is 2 and the maximum single-line word count (preset single-line word count threshold) is 30 words, then text.rowLength >= 3 or text.wordSize >= 31 each fails the line-count or word-count requirement, and the subtitle is a badcase. Whether the TTV video is a badcase can then be determined by calculating the accuracy.
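The per-subtitle check of strategy T can be sketched as follows. The attribute names mirror the text.rowLength / text.wordSize identifiers in the description; the tuple representation and default thresholds follow the example above.

```python
def subtitle_is_badcase(row_length, word_size, max_rows=2, max_words=30):
    """True when the subtitle exceeds the preset line-count threshold
    (rowLength >= 3 in the example) or the preset single-line
    word-count threshold (wordSize >= 31 in the example)."""
    return row_length > max_rows or word_size > max_words

def count_badcase_subtitles(subtitles, max_rows=2, max_words=30):
    """subtitles: list of (row_length, word_size) pairs; returns the
    badcase count used by the accuracy index."""
    return sum(
        1 for rows, words in subtitles
        if subtitle_is_badcase(rows, words, max_rows, max_words)
    )
```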
(2) Material analysis rules-Material repetition strategy P
Each material (picture or video material) is checked in a loop, a uniqueness judgment is performed, and whether repeated urls (positioning identifiers of the materials) exist in the material list is determined. For example, if the url of material 1 is the same as the url of material 100, it is determined that repeated materials exist. If repeated materials exist, the TTV video is judged to be a badcase.
(3) Scene arrangement rules-null subtitle strategy R
All subtitles are looped over, and for each subtitle the display duration TEXT_TIME is compared with the full duration MATE_TIME; the empty material duration corresponding to the subtitle is the difference MATE_TIME - TEXT_TIME. If the difference is 0, the subtitle exactly fills its slot and no empty subtitle exists; if the difference is greater than 0, the subtitle display duration is insufficient to fully cover the corresponding elements, and an empty subtitle may exist. After looping over all subtitles, the final total empty subtitle duration sum(MATE_TIME - TEXT_TIME) is obtained. An empty subtitle decision threshold (preset empty subtitle duration threshold) BADCASE_TIME is determined in combination with online stability and user experience, and if sum(MATE_TIME - TEXT_TIME) > BADCASE_TIME, the video is determined to contain empty material.
Fig. 8 is a flowchart of determining the detection result in a video detection method according to an embodiment of the present disclosure. As shown in fig. 8, under the above three strategies, whether the TTV video hits each strategy may be determined according to the respective decision indexes, for example by computing the strategy accuracy and comparing it with the respective threshold. OR logic is then applied over the result of each strategy, and the results are summarized to decide whether the TTV video is a badcase. If any of the above strategies determines that the TTV video is an unqualified video, the video is a badcase.
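The OR aggregation across strategies amounts to a single `any` over the per-strategy decisions. The dict shape and strategy keys are illustrative.

```python
def is_badcase(strategy_results):
    """strategy_results: dict mapping strategy name -> bool, True when
    the video hits that strategy's failure condition. The overall
    decision is the OR of all individual strategy results."""
    return any(strategy_results.values())
```

For example, hitting only the material repetition strategy P already marks the video as a badcase.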
Fig. 9 is a flowchart of failed video processing of a video detection method provided according to an embodiment of the present disclosure. As shown in fig. 9, after determining that the TTV video is badcase, manual review and result processing may be performed, including the following steps S901 to S905.
Step S901, strategy review and result processing: manual review is performed on the detection results of the various TTV strategies and on the video detection result.
Step S902, false recall determination and mining optimization: whether a false recall exists is determined. If there is a false recall, the data in the mining code library needs to be repaired and the mining code optimized.
Step S903, case result processing: if there is no false recall, the badcase video needs to be processed.
Step S904, video deletion and offline: the badcase is taken offline and its distribution is stopped.
Step S905, related code repair: for the badcase, code problems related to the subtitle and material strategies of the TTV production system are identified, and feedback is given to research and development to repair the code.
The video detection method provided by this embodiment offers comprehensive TTV video detection: badcases can be effectively found from the detection results, and the precision and recall indexes (preset detection rules) of multiple links can be better controlled, avoiding the larger impact of putting a flawed strategy online. Distributed problem videos (badcases) can be recalled in time, ensuring the user experience of online AIGC videos, while online code quality detection capability is provided for the document and material strategies and the video rendering code of the TTV production system.
Fig. 10 is a block diagram of a video detection apparatus provided according to an embodiment of the present disclosure; referring to fig. 10, an embodiment of the disclosure provides a video detection apparatus 1000, which includes the following units.
The obtaining unit 1001 is configured to obtain a to-be-inspected video.
The parsing unit 1002 is configured to parse the video to be inspected to obtain constituent elements of the video to be inspected.
The video detection unit 1003 is configured to detect the constituent elements according to a preset detection rule, and obtain a detection result of the video to be detected, where the detection result is used to reflect whether the video to be detected is a qualified video.
In some embodiments, parsing unit 1002 is further to: searching a production database and a material table of the video to be detected according to the first identification of the video to be detected to obtain an excavation library of the video to be detected, wherein the excavation library comprises the second identification of the constituent elements; searching a material table according to the second identification of the constituent elements to obtain material information of the constituent elements; and analyzing the video to be inspected according to the material information to obtain the constituent elements of the video to be inspected.
In some embodiments, the constituent elements include a plurality of subtitles, and the video detection unit 1003 is further configured to: and detecting the subtitles according to a preset document organization rule to obtain a detection result of the video to be detected.
In some embodiments, the video detection unit 1003 is further configured to: determining a first total number of subtitles which do not accord with a preset document organization rule in a plurality of subtitles; determining the accuracy of a plurality of subtitles according to the first total number and the total number of subtitles of the video to be detected; and under the condition that the accuracy rate is smaller than a preset accuracy rate threshold value, determining the video to be detected as the disqualified video.
In some embodiments, the video detection unit 1003 is further configured to: determining the number of lines of the subtitles and the number of words of a single-line subtitle for each subtitle in the plurality of subtitles; and determining that the subtitle does not accord with the preset document organization rule under the condition that the line number of the subtitle is larger than a preset line number threshold or the word number of the single-line subtitle is larger than a preset single-line word number threshold, so as to obtain a first total number.
In some embodiments, the constituent elements include a plurality of materials, each of the plurality of materials including a dynamic picture material or a static picture material; the video detection unit 1003 is also configured to: and detecting the materials according to a preset material analysis rule to obtain a detection result of the video to be detected.
In some embodiments, the video detection unit 1003 is further configured to: acquiring a material list, wherein the material list records positioning identifiers of a plurality of materials; for each material in a plurality of materials, searching a first positioning identifier of the material in a material list, wherein the first positioning identifier is the positioning identifier of the material; and under the condition that a plurality of positioning identifiers which are the same as the first positioning identifier exist in the material list, determining that the video to be detected is a disqualified video.
In some embodiments, the video detection unit 1003 is further configured to: acquiring the number of dynamic picture materials in a plurality of materials; determining the dynamic material duty ratio of the video to be inspected according to the number of the dynamic picture materials and the total number of the materials in the video to be inspected; and under the condition that the duty ratio of the dynamic material is smaller than a preset duty ratio threshold value, determining the video to be detected as the disqualified video.
In some embodiments, the video detection unit 1003 is further configured to: acquiring arrangement information of the constituent elements; and detecting the arrangement information according to a preset scene arrangement rule to obtain a detection result of the video to be detected.
In some embodiments, the video detection unit 1003 is further configured to: acquiring time length information of a plurality of subtitles in a video to be inspected, wherein the time length information of each subtitle in the plurality of subtitles comprises the full time length of the subtitle and the display time length of the subtitle; for each caption in the plurality of captions, determining a difference value between the full time length of the caption and the display time length of the caption; under the condition that the difference value is larger than zero, determining the difference value as the blank caption duration of the caption, and determining the sum of the blank caption durations of a plurality of captions as the blank caption total duration of the video to be detected; and under the condition that the total time length of the empty captions is larger than the preset time length threshold value of the empty captions, determining the video to be inspected as the disqualified video.
In some embodiments, the video detection unit 1003 is further configured to: acquiring time information of a plurality of materials in a video to be inspected, wherein the time information of each material in the plurality of materials comprises a starting time and an ending time of the material; for each material in the plurality of materials, under the condition that the starting time of the material is unequal to the ending time of the previous material, determining that the material does not accord with the preset scene arrangement rule, and further determining the second total number of the materials which do not accord with the preset scene arrangement rule in the plurality of materials, wherein the previous material is one material positioned in front of the material in the plurality of materials; determining the alignment rate of a plurality of materials according to the second total number and the total number of materials of the video to be detected; and under the condition that the alignment rate is smaller than the alignment rate threshold value, determining the video to be detected as the disqualified video.
In some embodiments, the apparatus 1000 further comprises: and the video processing unit is used for carrying out offline processing on the video to be detected under the condition that the video to be detected is the unqualified video.
For descriptions of specific functions and examples of each module and sub-module of the apparatus in the embodiments of the present disclosure, reference may be made to the related descriptions of corresponding steps in the foregoing method embodiments, which are not repeated herein.
In the technical solution of the present disclosure, the acquisition, storage, application, and the like of the user personal information involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
The embodiment of the disclosure also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments described above.
The disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the above embodiments.
The disclosed embodiments also provide a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments described above.
Fig. 11 illustrates a schematic block diagram of an example electronic device 1100 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the apparatus 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
Various components in device 1100 are connected to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the respective methods and processes described above, such as a video detection method. For example, in some embodiments, the video detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1108. In some embodiments, some or all of the computer programs may be loaded and/or installed onto device 1100 via ROM 1102 and/or communication unit 1109. When a computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the video detection method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the video detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions, improvements, etc. that are within the principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (27)

1. A video detection method, comprising:
obtaining a video to be detected;
analyzing the video to be detected to obtain constituent elements of the video to be detected;
and detecting the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected, wherein the detection result is used for reflecting whether the video to be detected is a qualified video.
2. The method of claim 1, wherein analyzing the video to be detected to obtain the constituent elements of the video to be detected comprises:
searching a production database of the video to be detected according to the first identification of the video to be detected to obtain a mining library and a material table of the video to be detected, wherein the mining library comprises the second identification of the constituent elements;
searching the material table according to the second identification of the constituent elements to obtain material information of the constituent elements;
and analyzing the video to be detected according to the material information to obtain the constituent elements of the video to be detected.
3. The method according to claim 1 or 2, wherein the constituent elements comprise a plurality of subtitles, and detecting the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected comprises:
detecting the plurality of subtitles according to a preset document organization rule to obtain the detection result of the video to be detected.
4. The method of claim 3, wherein detecting the plurality of subtitles according to the preset document organization rule to obtain the detection result of the video to be detected comprises:
determining a first total number of subtitles that do not conform to the preset document organization rule among the plurality of subtitles;
determining an accuracy rate of the plurality of subtitles according to the first total number and the total number of subtitles of the video to be detected;
and under the condition that the accuracy rate is less than a preset accuracy rate threshold, determining that the video to be detected is an unqualified video.
5. The method of claim 4, wherein the determining a first total number of subtitles in the plurality of subtitles that do not meet a preset document organization rule comprises:
determining, for each subtitle in the plurality of subtitles, the number of lines of the subtitle and the number of words in a single line of the subtitle;
and under the condition that the number of lines of the subtitle is greater than a preset line number threshold or the number of words in a single line is greater than a preset single-line word count threshold, determining that the subtitle does not conform to the preset document organization rule, so as to obtain the first total number.
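As an illustrative, non-limiting sketch of the check described in claims 4 and 5, assuming each subtitle is represented as a list of its text lines, and using hypothetical threshold values that are not specified in the claims:

```python
def subtitle_accuracy(subtitles, max_lines=2, max_chars_per_line=20,
                      accuracy_threshold=0.8):
    """Count subtitles violating the line-count / line-length rule,
    derive an accuracy rate, and compare it against a threshold.

    Each element of `subtitles` is a list of text lines. All numeric
    thresholds here are illustrative assumptions.
    """
    # The "first total number": subtitles with too many lines or an
    # over-long single line.
    violations = sum(
        1 for sub in subtitles
        if len(sub) > max_lines
        or any(len(line) > max_chars_per_line for line in sub)
    )
    accuracy = 1 - violations / len(subtitles)
    qualified = accuracy >= accuracy_threshold
    return accuracy, qualified
```

A video would be flagged as unqualified when `qualified` is `False`, i.e. when too large a share of its subtitles break the preset rule.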
6. The method of any of claims 1-5, wherein the constituent elements comprise a plurality of materials, each of the plurality of materials comprising a dynamic picture material or a static picture material; and
detecting the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected comprises:
detecting the plurality of materials according to a preset material analysis rule to obtain the detection result of the video to be detected.
7. The method of claim 6, wherein detecting the plurality of materials according to a preset material analysis rule to obtain a detection result of the video to be detected comprises:
acquiring a material list, wherein the material list records positioning identifiers of the plurality of materials;
for each material in the plurality of materials, searching the material list for a first positioning identifier of the material, wherein the first positioning identifier is the positioning identifier of the material;
and under the condition that a plurality of positioning identifiers identical to the first positioning identifier exist in the material list, determining that the video to be detected is an unqualified video.
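A minimal sketch of the duplicate-identifier check in claim 7, assuming the material list is a flat sequence of positioning identifiers (the data shape is an assumption, not part of the claim):

```python
from collections import Counter


def has_duplicate_material(positioning_ids):
    """Return True when any positioning identifier occurs more than
    once in the material list, i.e. the same material is reused and
    the video would be judged unqualified under claim 7."""
    counts = Counter(positioning_ids)
    return any(n > 1 for n in counts.values())
```

`Counter` tallies each identifier in one pass, so the check runs in linear time over the material list.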
8. The method of claim 6, wherein detecting the plurality of materials according to a preset material analysis rule to obtain a detection result of the video to be detected comprises:
acquiring the number of dynamic picture materials in the plurality of materials;
determining a dynamic material proportion of the video to be detected according to the number of dynamic picture materials and the total number of materials in the video to be detected;
and under the condition that the dynamic material proportion is less than a preset proportion threshold, determining that the video to be detected is an unqualified video.
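The dynamic-material proportion check of claim 8 can be sketched as follows; the `"dynamic"`/`"static"` type tags and the threshold value are illustrative assumptions:

```python
def dynamic_material_check(material_types, proportion_threshold=0.5):
    """Compute the share of dynamic picture materials among all
    materials and compare it against a preset threshold.

    `material_types` is a list of type tags, one per material.
    Returns (proportion, qualified)."""
    dynamic = sum(1 for t in material_types if t == "dynamic")
    proportion = dynamic / len(material_types)
    return proportion, proportion >= proportion_threshold
```

A video whose `proportion` falls below the threshold would be judged unqualified.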
9. The method according to any one of claims 1-8, wherein detecting the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected comprises:
acquiring the arrangement information of the constituent elements;
and detecting the arrangement information according to a preset scene arrangement rule to obtain the detection result of the video to be detected.
10. The method of claim 9, wherein acquiring the arrangement information of the constituent elements comprises:
acquiring duration information of a plurality of subtitles in the video to be detected, wherein the duration information of each subtitle in the plurality of subtitles comprises a full duration of the subtitle and a display duration of the subtitle; and
detecting the arrangement information according to the preset scene arrangement rule to obtain the detection result of the video to be detected comprises:
for each subtitle in the plurality of subtitles, determining a difference value between the full duration of the subtitle and the display duration of the subtitle;
under the condition that the difference value is greater than zero, determining the difference value as the blank subtitle duration of the subtitle, and determining the sum of the blank subtitle durations of the plurality of subtitles as the total blank subtitle duration of the video to be detected;
and under the condition that the total blank subtitle duration is greater than a preset blank subtitle duration threshold, determining that the video to be detected is an unqualified video.
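The blank-duration accumulation in claim 10 reduces to summing the positive part of (full duration − display duration) per subtitle. A sketch, assuming durations arrive as `(full, display)` pairs in seconds and a hypothetical threshold:

```python
def blank_subtitle_check(durations, total_threshold=3.0):
    """Sum the per-subtitle blank time (positive part of the
    difference between full duration and display duration) and
    compare the total against a preset threshold.

    Returns (total_blank, qualified)."""
    total_blank = sum(max(full - shown, 0.0) for full, shown in durations)
    return total_blank, total_blank <= total_threshold
```

Using `max(..., 0.0)` implements the "difference value is greater than zero" condition: subtitles whose display time covers their full time contribute nothing.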
11. The method of claim 9, wherein acquiring the arrangement information of the constituent elements comprises:
acquiring time information of a plurality of materials in the video to be detected, wherein the time information of each material in the plurality of materials comprises a starting time and an ending time of the material; and
detecting the arrangement information according to the preset scene arrangement rule to obtain the detection result of the video to be detected comprises:
for each material in the plurality of materials, under the condition that the starting time of the material is not equal to the ending time of the previous material, determining that the material does not conform to a preset scene arrangement rule, and further determining a second total number of materials that do not conform to the preset scene arrangement rule among the plurality of materials, wherein the previous material is the material located immediately before the material in the plurality of materials;
determining an alignment rate of the plurality of materials according to the second total number and the total number of materials of the video to be detected;
and under the condition that the alignment rate is less than an alignment rate threshold, determining that the video to be detected is an unqualified video.
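The alignment-rate check in claim 11 can be sketched as below, assuming materials arrive as chronologically ordered `(start, end)` time pairs and an illustrative threshold:

```python
def material_alignment_check(intervals, alignment_threshold=0.9):
    """Count materials whose start time does not equal the previous
    material's end time (the "second total number"), derive an
    alignment rate over all materials, and compare it against a
    preset threshold. Returns (alignment_rate, qualified)."""
    misaligned = sum(
        1 for prev, cur in zip(intervals, intervals[1:])
        if cur[0] != prev[1]
    )
    alignment_rate = 1 - misaligned / len(intervals)
    return alignment_rate, alignment_rate >= alignment_threshold
```

The first material has no predecessor and so can never be misaligned; `zip(intervals, intervals[1:])` walks each material alongside the one before it.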
12. The method of any of claims 1-11, further comprising:
and under the condition that the video to be detected is an unqualified video, performing offline processing on the video to be detected.
13. A video detection apparatus comprising:
the acquisition unit is used for acquiring the video to be detected;
the analysis unit is used for analyzing the video to be detected to obtain the constituent elements of the video to be detected;
the video detection unit is used for detecting the constituent elements according to a preset detection rule to obtain a detection result of the video to be detected, and the detection result is used for reflecting whether the video to be detected is a qualified video or not.
14. The apparatus of claim 13, wherein the parsing unit is further configured to:
searching a production database of the video to be detected according to the first identification of the video to be detected to obtain a mining library and a material table of the video to be detected, wherein the mining library comprises the second identification of the constituent elements;
searching the material table according to the second identification of the constituent elements to obtain material information of the constituent elements;
and analyzing the video to be detected according to the material information to obtain the constituent elements of the video to be detected.
15. The apparatus of claim 13 or 14, wherein the constituent elements comprise a plurality of subtitles, the video detection unit further configured to:
and detecting the subtitles according to a preset document organization rule to obtain a detection result of the video to be detected.
16. The apparatus of claim 15, wherein the video detection unit is further configured to:
determining a first total number of subtitles that do not conform to the preset document organization rule among the plurality of subtitles;
determining an accuracy rate of the plurality of subtitles according to the first total number and the total number of subtitles of the video to be detected;
and under the condition that the accuracy rate is less than a preset accuracy rate threshold, determining that the video to be detected is an unqualified video.
17. The apparatus of claim 16, wherein the video detection unit is further configured to:
determining, for each subtitle in the plurality of subtitles, the number of lines of the subtitle and the number of words in a single line of the subtitle;
and under the condition that the number of lines of the subtitle is greater than a preset line number threshold or the number of words in a single line is greater than a preset single-line word count threshold, determining that the subtitle does not conform to the preset document organization rule, so as to obtain the first total number.
18. The apparatus of any of claims 13-17, wherein the constituent elements comprise a plurality of materials, each of the plurality of materials comprising a dynamic picture material or a static picture material; and
the video detection unit is further configured to: detect the plurality of materials according to a preset material analysis rule to obtain a detection result of the video to be detected.
19. The apparatus of claim 18, wherein the video detection unit is further configured to:
acquiring a material list, wherein the material list records positioning identifiers of the plurality of materials;
for each material in the plurality of materials, searching the material list for a first positioning identifier of the material, wherein the first positioning identifier is the positioning identifier of the material;
and under the condition that a plurality of positioning identifiers identical to the first positioning identifier exist in the material list, determining that the video to be detected is an unqualified video.
20. The apparatus of claim 18, wherein the video detection unit is further configured to:
acquiring the number of dynamic picture materials in the plurality of materials;
determining a dynamic material proportion of the video to be detected according to the number of dynamic picture materials and the total number of materials in the video to be detected;
and under the condition that the dynamic material proportion is less than a preset proportion threshold, determining that the video to be detected is an unqualified video.
21. The apparatus of any of claims 13-20, wherein the video detection unit is further to:
acquiring the arrangement information of the constituent elements;
and detecting the arrangement information according to a preset scene arrangement rule to obtain a detection result of the video to be detected.
22. The apparatus of claim 21, wherein the video detection unit is further configured to:
acquiring duration information of a plurality of subtitles in the video to be detected, wherein the duration information of each subtitle in the plurality of subtitles comprises a full duration of the subtitle and a display duration of the subtitle;
for each subtitle in the plurality of subtitles, determining a difference value between the full duration of the subtitle and the display duration of the subtitle;
under the condition that the difference value is greater than zero, determining the difference value as the blank subtitle duration of the subtitle, and determining the sum of the blank subtitle durations of the plurality of subtitles as the total blank subtitle duration of the video to be detected;
and under the condition that the total blank subtitle duration is greater than a preset blank subtitle duration threshold, determining that the video to be detected is an unqualified video.
23. The apparatus of claim 21, wherein the video detection unit is further configured to:
acquiring time information of a plurality of materials in the video to be detected, wherein the time information of each material in the plurality of materials comprises a starting time and an ending time of the material;
for each material in the plurality of materials, under the condition that the starting time of the material is not equal to the ending time of the previous material, determining that the material does not conform to a preset scene arrangement rule, and further determining a second total number of materials that do not conform to the preset scene arrangement rule among the plurality of materials, wherein the previous material is the material located immediately before the material in the plurality of materials;
determining an alignment rate of the plurality of materials according to the second total number and the total number of materials of the video to be detected;
and under the condition that the alignment rate is less than an alignment rate threshold, determining that the video to be detected is an unqualified video.
24. The apparatus of claim 23, further comprising:
and the video processing unit is used for performing offline processing on the video to be detected under the condition that the video to be detected is an unqualified video.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
26. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-12.
27. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-12.
CN202310542674.3A 2023-05-15 2023-05-15 Video detection method, device, equipment and storage medium Pending CN116823726A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310542674.3A CN116823726A (en) 2023-05-15 2023-05-15 Video detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116823726A true CN116823726A (en) 2023-09-29

Family

ID=88119346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310542674.3A Pending CN116823726A (en) 2023-05-15 2023-05-15 Video detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116823726A (en)

Similar Documents

Publication Publication Date Title
US9438850B2 (en) Determining importance of scenes based upon closed captioning data
JP7394809B2 (en) Methods, devices, electronic devices, media and computer programs for processing video
US10565401B2 (en) Sorting and displaying documents according to sentiment level in an online community
WO2020155750A1 (en) Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium
CN104298429A (en) Information presentation method based on input and input method system
CN115982376B (en) Method and device for training model based on text, multimode data and knowledge
CN111309200B (en) Method, device, equipment and storage medium for determining extended reading content
CN115099239B (en) Resource identification method, device, equipment and storage medium
CN106021319A (en) Voice interaction method, device and system
CN115098729A (en) Video processing method, sample generation method, model training method and device
CN113407678B (en) Knowledge graph construction method, device and equipment
CN117971698A (en) Test case generation method and device, electronic equipment and storage medium
CN114374885A (en) Video key segment determination method and device, electronic equipment and readable storage medium
CN114880498B (en) Event information display method and device, equipment and medium
CN106570003B (en) Data pushing method and device
CN113392625B (en) Method, device, electronic equipment and storage medium for determining annotation information
CN113641933B (en) Abnormal webpage identification method, abnormal site identification method and device
CN116823726A (en) Video detection method, device, equipment and storage medium
CN113778717A (en) Content sharing method, device, equipment and storage medium
CN113033333A (en) Entity word recognition method and device, electronic equipment and storage medium
CN111723177A (en) Modeling method and device of information extraction model and electronic equipment
CN110879868A (en) Consultant scheme generation method, device, system, electronic equipment and medium
CN117289869A (en) Data processing method, device, equipment and storage medium
CN113392624A (en) Sensitive vocabulary labeling method and device, electronic equipment and computer readable storage medium
CN117082189A (en) Video generation method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination