CN111767838A - Video auditing method and system, computer system and computer-readable storage medium - Google Patents


Info

Publication number
CN111767838A
Authority
CN
China
Prior art keywords: video, frame, tags, video material, review
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number
CN202010601923.8A
Other languages
Chinese (zh)
Inventor
孙斌
焦大原
刘亚萍
Current Assignee (the listed assignees may be inaccurate)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (the priority date is an assumption and is not a legal conclusion)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010601923.8A priority Critical patent/CN111767838A/en
Publication of CN111767838A publication Critical patent/CN111767838A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods

Abstract

The present disclosure relates to a video auditing method and system, a computer system, and a computer-readable storage medium, and relates to the field of deep learning and image processing. The video auditing method comprises the following steps: determining one or more video tags and confidence scores thereof for the video material; determining one or more frame tags and a confidence score thereof for at least one frame of a plurality of frames of video material, wherein each frame of the at least one frame has at least one frame tag; and performing machine auditing on the video material according to the one or more video tags and the confidence scores thereof and the one or more frame tags and the confidence scores thereof.

Description

Video auditing method and system, computer system and computer-readable storage medium
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a video auditing method and system, a computer system, and a computer-readable storage medium.
Background
Nowadays, video materials on the internet are increasing, appearing in various forms such as advertisements, video logs (Vlogs), and short videos. With the rapid development of fifth-generation (5G) and sixth-generation (6G) communication technologies and the internet of things, the number of video materials will increase exponentially, and problems such as illegal or sensitive information, trademark infringement, and degraded user experience will arise in video materials.
The current video auditing method mainly adopts the following three modes:
(1) Fully manual review. Reviewers review video materials by means of randomly extracted frame display, video playback, variable-speed playback, and the like; the timeliness of review depends on the amount of manpower available and the convenience of the review platform.
(2) Fully machine review. Some product lines review video materials entirely with a machine review strategy; the video materials have only rejected and passed states, with no manual review intervention.
(3) Combined machine and manual review. Key frames of the video material are reviewed in the frame dimension by combining deep learning methods such as optical character recognition (OCR), face recognition, and object detection with machine review strategies, and the video material is further reviewed manually in the case that it passes machine review.
Disclosure of Invention
According to a first aspect of the present disclosure, an embodiment of the present disclosure provides a video auditing method, including: determining one or more video tags and confidence scores thereof for the video material; determining one or more frame tags and a confidence score thereof for at least one frame of a plurality of frames of video material, wherein each frame of the at least one frame has at least one frame tag; and performing machine auditing on the video material according to the one or more video tags and the confidence scores thereof and the one or more frame tags and the confidence scores thereof.
According to a second aspect of the present disclosure, an embodiment of the present disclosure provides a video auditing system, including: a first determination unit configured to determine one or more video tags of video material and confidence scores thereof; a second determination unit configured to determine one or more frame tags of at least one frame of a plurality of frames of the video material and a confidence score thereof, wherein each frame of the at least one frame has at least one frame tag; and the auditing execution unit is configured to conduct machine auditing on the video material according to the one or more video tags and the confidence scores thereof and the one or more frame tags and the confidence scores thereof.
According to a third aspect of the present disclosure, an embodiment of the present disclosure provides a computer system, including: one or more processors; and a storage device having stored thereon a computer program that, when executed by one or more processors, causes the one or more processors to implement a video review method as described above.
According to a fourth aspect of the present disclosure, an embodiment of the present disclosure provides a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, causes the processor to implement the video auditing method as described above.
In one or more embodiments according to the present disclosure, using one or more video tags of the video material in combination with one or more frame tags of at least one of a plurality of frames of the video material to review the video material can improve at least one of the accuracy and the recall rate of video review.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments and, together with the description, serve to explain the exemplary implementations. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 is a flow diagram illustrating a video review method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating an example tag architecture used in a video review method according to an embodiment of the disclosure;
Fig. 3 is a flowchart showing a specific process of step S106 shown in fig. 1;
FIG. 4 is a schematic diagram illustrating an example video queue for manual review according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a data reflow mechanism based on manual review results according to an embodiment of the disclosure;
FIG. 6 is a block diagram illustrating a video review system according to an embodiment of the present disclosure;
FIG. 7 is a block diagram illustrating an exemplary computer system that can be used in example embodiments.
Detailed Description
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, the first element and the second element may be combined in the same unit or may be further divided into a plurality of units.
The terminology used in the description of the various embodiments in the present disclosure is for the purpose of describing particular examples only and is not intended to be limiting. The embodiments of the present disclosure and the features of the embodiments may be combined with each other without conflict. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items. For example, "A and/or B" may represent: A alone; both A and B; or B alone.
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. In addition, it should be noted that, for convenience of description, only portions related to the related invention are shown in the drawings.
The video auditing method and system are provided for solving the problems of high labor cost, low auditing efficiency, low accuracy and recall rate and the like of the existing video auditing method. The following describes a video auditing method and system according to an embodiment of the present disclosure in detail with reference to the accompanying drawings.
Fig. 1 is a flow diagram illustrating a video review method 100 according to an embodiment of the disclosure. As shown in fig. 1, the video review method 100 includes: step S102, determining one or more video tags and confidence scores thereof of the video material; step S104, determining one or more frame tags and confidence scores thereof of at least one frame in a plurality of frames of the video material, wherein each frame in the at least one frame has at least one frame tag; and step S106, performing machine auditing on the video material according to one or more video tags and confidence scores thereof of the video material and one or more frame tags and confidence scores thereof of at least one frame in the video material.
Here, the at least one frame of the plurality of frames of the video material used for the video review may be at least one key frame of the video material. When the video material passes machine review, it may either be further reviewed manually or have its subsequent application directly allowed. When the video material does not pass machine review, its subsequent application can be directly rejected without further manual review.
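As a minimal sketch, the machine review outcome described above can be expressed as a function of the tag confidence scores; the function name, data shapes, and the single 0.6 threshold are illustrative assumptions, not the patent's exact strategy (the more refined per-dimension rules are described below):

```python
def machine_review(video_tags, frame_tags, threshold=0.6):
    """Toy machine-review gate: reject when any tag's confidence
    reaches the threshold, otherwise let the material pass.

    video_tags: {tag_name: confidence} for the whole video material
    frame_tags: list of {tag_name: confidence} dicts, one per key frame
    """
    scores = list(video_tags.values())
    for frame in frame_tags:
        scores.extend(frame.values())
    return "reject" if any(s >= threshold for s in scores) else "pass"

outcome = machine_review({"chest tremble": 0.9}, [{"OCR": 0.1}])
# outcome == "reject"
```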
According to the video auditing method disclosed by the embodiment of the disclosure, the video tag of the video material and the frame tag of at least one frame in a plurality of frames of the video material are combined for auditing the video material, so that at least one of the accuracy rate and the recall rate of machine auditing can be improved compared with a video auditing method only using the frame tag. Because at least one of the accuracy and recall of machine review can be improved, the dependence on manual review is reduced, and thus labor cost can be potentially saved to improve review efficiency.
In some embodiments, video data may be collected from the network according to business needs, and classification models pre-trained on public data sets such as Kinetics, UCF101, and Something-Something V1 & V2 may be fine-tuned with the collected video data to obtain a video classification model suitable for video review. With the development of deep learning, one or more of a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) network, a gated recurrent unit (GRU) network, a three-dimensional convolutional neural network (3D-CNN), a two-stream method, and a temporal shift module (TSM) may be applied in the video classification model, so that video classification takes into account both the RGB frame information of the video material and its inter-frame correlation information (e.g., inter-frame optical flow information), thereby improving the accuracy and speed of video classification. In addition, a plurality of tags can be determined for each video material, the correlation among the tags can be modeled with a neural network, and adding this correlation model to the video classification model can improve its generalization ability.
In some embodiments, video material may be classified using a video classification model that may classify the video material taking into account interframe relevance information of the video material to determine one or more video tags of the video material and their confidence scores. In some cases, the video classification model may further classify the video material in conjunction with audio information contained in the video material.
In some embodiments, each of the one or more video tags of the video feed may belong to a video tag class, and determining the one or more video tags of the video feed and their confidence scores may include: for each of a plurality of video tag classes, a video tag of a video material belonging to the video tag class is selected from a plurality of candidate video tags included in the video tag class, and a confidence score for the video tag is determined.
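The per-class selection described above can be sketched as follows, assuming the classifier outputs a confidence score for every candidate tag in each video tag class (the data shapes and names are illustrative):

```python
def select_tags_per_class(class_scores):
    """For each video tag class, keep the candidate tag with the
    highest confidence score, yielding one video tag per class.

    class_scores: {class_name: {candidate_tag: confidence}}
    returns:      {class_name: (chosen_tag, confidence)}
    """
    return {cls: max(candidates.items(), key=lambda kv: kv[1])
            for cls, candidates in class_scores.items()}

tags = select_tags_per_class({
    "scene":    {"bedroom": 0.7, "beach": 0.2},
    "category": {"game": 0.1, "selfie": 0.8},
})
# tags == {"scene": ("bedroom", 0.7), "category": ("selfie", 0.8)}
```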
Fig. 2 is a schematic diagram illustrating an example tag architecture used in a video review method according to an embodiment of the disclosure. As shown in fig. 2, a plurality of video tag classes including industry, scene, category, quality, etc. may be set; the industry video tags can comprise industry tags such as commercial recruitment, moving logistics and the like; scene video tags may include scene tags for bedrooms, beaches, etc.; the category class video tags may include category tags for games, selfie, etc. (wherein the category tags for the "selfie" class may be further subdivided into category tags for chest trembling, hip twisting, etc.). In addition, as shown in fig. 2, a plurality of categories of frame tags including Optical Character Recognition (OCR), face recognition, logo (logo) detection, sharpness scoring, watermark recognition, and the like may be set. In practice, a label system with the coverage as wide as possible and the layering granularity as fine as possible can be designed according to actual business needs.
Fig. 3 is a flowchart showing a specific process of step S106 shown in fig. 1. As shown in fig. 3, in some embodiments, step S106 may include: step S1062, performing a video dimension audit on the video material according to the one or more video tags of the video material and their confidence scores; step S1064, in a case where the video material passes the video dimension audit, performing a frame dimension audit on the video material according to the one or more frame tags of at least one frame in the video material and their confidence scores; and step S1066, in a case where the video material does not pass the video dimension audit, rejecting subsequent application of the video material. A higher confidence score for a frame tag means a higher probability that the frame is at risk. For example, a higher confidence score for the frame tag "OCR" means a higher probability that the frame contains sensitive or non-compliant text; a higher confidence score for the frame tag "logo detection" means a higher probability that the frame contains a well-known brand logo and thus potentially infringing content.
Here, since the subsequent application of the video material is directly rejected without the video material passing the video dimension audit, and further frame dimension audit on the video material is not required, the audit efficiency can be further improved.
It is to be understood that the one or more video tags of the video feed may include video tags that represent risk as well as video tags that do not represent risk; the higher the confidence score of the video tag representing the risk, the higher the likelihood that the video feed contains the risk content; the higher the confidence score of a video tag that does not represent a risk, the higher the likelihood that the video feed does not contain risk content.
In some embodiments, performing a video dimension audit on the video material may include: and judging whether the confidence scores of the video tags representing risks in one or more video tags of the video material are all smaller than corresponding threshold values, if not, judging that the video material does not pass the video dimension audit and rejecting the subsequent application of the video material, and if so, judging that the video material passes the video dimension audit. For example, if the video material has a "chest tremble" video tag and the confidence score of the "chest tremble" video tag is 0.9, which is greater than the threshold of 0.6, then it is determined that the video material has not passed the video dimension audit and subsequent applications of the video material are rejected.
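A sketch of this per-tag threshold rule, using the "chest tremble" example from the text (the function name and dict shapes are assumptions):

```python
def video_dim_audit(risk_video_tags, thresholds):
    """Video-dimension audit: pass only if every risk video tag's
    confidence score is below its corresponding threshold.

    risk_video_tags: {tag: confidence} for tags that represent risk
    thresholds:      {tag: threshold}
    """
    return all(conf < thresholds[tag] for tag, conf in risk_video_tags.items())

# "chest tremble" confidence 0.9 >= threshold 0.6 -> audit fails, reject.
passed = video_dim_audit({"chest tremble": 0.9}, {"chest tremble": 0.6})
# passed == False
```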
In some embodiments, performing a video dimension audit on the video material may include: calculating at least one weighted sum of confidence scores for one or more video tags of the video material; and judging whether the calculated at least one weighted sum is smaller than a corresponding threshold value, if not, judging that the video material does not pass the video dimension audit and rejecting the subsequent application of the video material, and if so, judging that the video material passes the video dimension audit.
For example, one or more video tags of the video material may be divided into high, medium, and low risk video tag groups, a first weight value is assigned to the video tags in the high risk video tag group, a second weight value is assigned to the video tags in the medium risk video tag group, a third weight value is assigned to the video tags in the low risk video tag group, a weighted sum of the confidence scores of the video tags in each risk tag group is calculated, and in the case that the weighted sum of the confidence scores of the video tags in each risk tag group is less than the threshold set for each of the risk tag groups, the video material is determined to pass the video dimension audit, otherwise the video material is determined not to pass the video dimension audit and subsequent application of the video material is rejected. Here, the first weight value is greater than the second weight value, which is greater than the third weight value. The third weight value may be set very low or even negative. For example, if the video material has a video tag of "hip-up" (which belongs to the high-risk tag group and has a first weight value of 0.8) and a video tag of "concert" (which belongs to the low-risk tag group and has a weight value of-0.3), the confidence score of the video tag of "hip-up" is 0.8 and the confidence score of the video tag of "concert" is 0.9, the weighted sum is 0.37, which is less than the threshold value of 0.6, and the video material is determined to pass the video dimension audit. In other words, calculating such a weighted sum may take into account the associations between multiple video tags to determine whether they should be reviewed by the video dimension.
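The grouped weighted sum above can be sketched as follows; it reproduces the worked "hip-up"/"concert" numbers, with the per-tag weights taken from the example:

```python
def weighted_risk_score(confidences, weights):
    """Weighted sum of video tag confidence scores, where each tag's
    weight comes from its risk group (high/medium/low)."""
    return sum(weights[tag] * conf for tag, conf in confidences.items())

score = weighted_risk_score(
    {"hip-up": 0.8, "concert": 0.9},    # confidence scores
    {"hip-up": 0.8, "concert": -0.3},   # high-risk and low-risk group weights
)
# round(score, 2) == 0.37, below the 0.6 threshold -> passes the audit
```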
Similarly, the one or more frame tags for each frame in the video feed may include frame tags that represent risk or frame tags that do not represent risk; the higher the confidence score of a frame tag representing a risk, the higher the likelihood that the frame contains risky content; the higher the confidence score of a frame tag that does not represent a risk, the higher the likelihood that the frame does not contain risky content. In addition, the same frame tag may represent different risks when associated with different video tags. For example, the frame label "swimsuit" represents a high risk when associated with the video label "indoor" of the scene class and a low risk when associated with the video label "beach" of the scene class.
In some embodiments, performing a frame dimension audit on the video material may include: determining whether confidence scores of one or more frame tags of at least one frame in the video material are all less than a respective threshold, and if not, determining that the video material has not passed the frame dimension review and rejecting subsequent applications of the video material, wherein the respective threshold is determined based on at least one of the one or more video tags of the video material and the confidence scores of the one or more video tags. Here, if the confidence scores of one or more frame tags of at least one frame in the video material are all less than the corresponding threshold, then it is determined that the video material passes the frame dimension audit. For example, in the previous example, the confidence score threshold for the frame tag "swimsuit" may be set lower if the video material has the video tag "indoors" and the confidence score is higher, and the confidence score threshold for the frame tag "swimsuit" may be set higher if the video material has the video tag "beach" and the confidence score is higher. This is because on a beach, even if swimsuits are present in the video material, the overall probability of the video material being at risk is relatively small.
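One way to realize the video-tag-dependent threshold is to adjust a base threshold according to the video tags, as in the swimsuit example above; the concrete adjustment amounts (plus/minus 0.2) and the 0.5 cut-off are assumptions for illustration:

```python
def frame_tag_threshold(frame_tag, video_tags, base=0.6):
    """Confidence threshold for a frame tag, conditioned on the video
    tags of the material (stricter indoors, more lenient on a beach)."""
    if frame_tag == "swimsuit":
        if video_tags.get("indoor", 0.0) > 0.5:
            return base - 0.2   # stricter: lower threshold indoors
        if video_tags.get("beach", 0.0) > 0.5:
            return base + 0.2   # more lenient: higher threshold on a beach
    return base

indoor_th = frame_tag_threshold("swimsuit", {"indoor": 0.9})  # about 0.4
beach_th = frame_tag_threshold("swimsuit", {"beach": 0.9})    # about 0.8
```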
Alternatively, in some embodiments, performing a frame dimension audit on the video material may include: calculating at least one weighted sum of the confidence scores of the one or more video tags of the video material and the confidence scores of the one or more frame tags of at least one frame in the video material; and judging whether the at least one weighted sum is smaller than a corresponding threshold value; if not, judging that the video material does not pass the frame dimension audit and rejecting its subsequent application, and if so, judging that the video material passes the frame dimension audit. For example, if a frame of the video material has the frame tag "swimsuit" (which belongs to the high-risk tag group and has a first weight value of 0.8) with a confidence score of 0.8, and the video material has the video tag "beach" (which belongs to the low-risk tag group and has a weight value of -0.1) with a confidence score of 0.9, the weighted sum is 0.55, which is less than the threshold value of 0.6, and the video material is determined to pass the frame dimension audit. In other words, calculating such a weighted sum can also take into account the association between video tags and frame tags in determining whether the frame dimension audit should be passed.
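The cross-dimension weighted sum above can be sketched as follows, reproducing the swimsuit/beach numbers from the example (weights and threshold follow the text; the function name is an assumption):

```python
def cross_dim_score(video_tags, frame_tags, weights):
    """Weighted sum over video-tag and frame-tag confidence scores
    together, so associations between the two dimensions (e.g.
    swimsuit + beach) can lower the combined risk score."""
    score = sum(weights[tag] * conf for tag, conf in video_tags.items())
    score += sum(weights[tag] * conf for tag, conf in frame_tags.items())
    return score

score = cross_dim_score(
    {"beach": 0.9},                    # video tags
    {"swimsuit": 0.8},                 # frame tags
    {"beach": -0.1, "swimsuit": 0.8},  # group weights
)
# round(score, 2) == 0.55, below the 0.6 threshold -> passes
```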
In some embodiments, step S1064 may include: performing first frame dimension audit on the video material; and performing second frame dimension audit on the video material. Here, in the case where the video material passes the first frame dimension audit and the second frame dimension audit, subsequent application of the video material is allowed; rejecting subsequent application of the video material when the video material does not pass the first frame dimension audit; and under the condition that the video material passes the first frame dimension audit and does not pass the second frame dimension audit, the machine audit indicates that the manual audit needs to be continued.
Here, since the subsequent application of the video material is directly allowed without further manual review of the video material when the video material passes through the first frame dimension review and the second frame dimension review, the review efficiency can be further improved, the manual review burden can be reduced, and the labor cost can be reduced.
In some embodiments, the second frame dimension audit may employ a stricter audit standard than the first frame dimension audit. Therefore, the first frame dimension audit and the second frame dimension audit can be executed according to the sequence, and when the video material does not pass the first frame dimension audit, the subsequent application of the video material can be rejected without performing the second frame dimension audit on the video material. Therefore, the auditing efficiency can be further improved, the manual auditing burden is reduced, and the labor cost is reduced.
In some embodiments, performing a first frame dimension audit on the video material may include determining whether confidence scores of one or more frame tags of at least one frame in the video material are all less than respective ones of one or more first thresholds, and if so, determining that the video material passes the first frame dimension audit, otherwise, determining that the video material does not pass the first frame dimension audit. Performing second frame dimension audit on the video material comprises judging whether the confidence scores of one or more frame tags of at least one frame in the video material are all smaller than a corresponding second threshold value in the one or more second threshold values, if so, judging that the video material passes the second frame dimension audit, otherwise, judging that the video material does not pass the second frame dimension audit. For each frame tag, the first threshold is greater than the second threshold.
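The two-stage audit above can be sketched as a three-outcome gate; the tag names and threshold values are illustrative, but the ordering (each first threshold greater than the corresponding second threshold) follows the text:

```python
def two_stage_frame_audit(frame_tags, first_th, second_th):
    """Frame-dimension machine audit with two stages:
    fails the looser first audit        -> reject outright;
    passes first, fails stricter second -> send to manual review;
    passes both                         -> allow directly.
    """
    if any(conf >= first_th[tag] for tag, conf in frame_tags.items()):
        return "reject"
    if any(conf >= second_th[tag] for tag, conf in frame_tags.items()):
        return "manual"
    return "pass"

first = {"OCR": 0.8}    # looser first-stage threshold
second = {"OCR": 0.5}   # stricter second-stage threshold
# two_stage_frame_audit({"OCR": 0.3}, first, second) == "pass"
# two_stage_frame_audit({"OCR": 0.6}, first, second) == "manual"
# two_stage_frame_audit({"OCR": 0.9}, first, second) == "reject"
```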
In order to ensure that the risky video materials are filtered out through video auditing and the risky video materials are prevented from being pushed to a user in subsequent application (namely, risk information is prevented from leaking), after the video materials pass through machine auditing, the video materials can be further audited manually.
In some embodiments, the video review method 100 may include: and S108, under the condition that the machine audit indicates that the video material needs to be continuously manually audited, adding the video material into a video queue for manual audit according to the confidence scores of one or more video tags of the video material and the confidence scores of one or more frame tags of at least one frame in the video material.
In some embodiments, step S108 may include: determining a video priority weighting factor for the video material based on the confidence scores of the one or more video tags of the video material and the confidence scores of the one or more frame tags of the at least one frame of the video material; and adding the video material, according to its video priority weighting factor, into one of a plurality of video queues for manual review with different priorities, wherein the video queues of different priorities push the video materials stored in them for manual review according to preset proportions. In this way, high-risk video materials (i.e., video materials with high video priority weighting factors) can be preferentially pushed to manual review, while low-risk video materials are prevented from being stuck for a long time in their priority queues without ever reaching the manual review stage.
In some embodiments, determining a video priority weighting factor for video material may comprise: a weighted sum of the confidence scores of the one or more video tags of the video feed and the confidence scores of the one or more frame tags of the at least one frame of the video feed is calculated as a video priority weighting factor for the video feed.
For example, the video category priority factor of the video material may be determined by weighted averaging of confidence scores of a plurality of category-class video tags of the video material, the frame tag priority factor of the video material may be determined by normalizing and weighted averaging of confidence scores of a plurality of frame tags of a plurality of key frames of the video material, and then the video category priority factor and the frame tag priority factor of the video material may be determined by weighting the video category priority factor and the frame tag priority factor of the video material by different weights.
An example formula for calculating the priority weighting factor Q_weight of the video material is given below:

Q_weight = λ · (1/n_video) · Σ_{i=1..n_video} (μ_i · c_i) + (1 - λ) · (1/n_frame) · Σ_{i=1..n_frame} (ε_i - ε_{i,min}) / (ε_{i,max} - ε_{i,min})

where λ is the proportion of the video category priority factor within the video priority weighting factor; n_video is the number of video tags used to determine the video category priority factor; c_i is the confidence score of the i-th video tag; μ_i is the preset category priority factor corresponding to each video tag (for example, selfie 0.6, game 0.2, medical 0.8), which reflects the risk levels of different categories of video and belongs to the prior knowledge of video review; n_frame is the number of frame tags used to determine the frame tag priority factor; ε_i is the confidence score of the i-th frame tag, which may be taken as the maximum confidence score of that type of frame tag over all selected frames (for example, all key frames), since selecting the maximum over the whole key-frame cluster helps maximize risk information and reduce the probability of risk leakage; and ε_{i,max} and ε_{i,min} are the upper and lower bounds of the confidence score interval of that frame tag. Here, the number of video tags used to determine the video category priority factor and the number of frame tags used to determine the frame tag priority factor may be determined according to business needs. According to some embodiments, the normalized summation of frame tag confidence scores in the above formula may also be replaced with other weighted summations of the frame tag confidence scores.
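As a sketch under the variable definitions above, Q_weight can be computed as follows (λ, the tag sets, and the normalization bounds are inputs; the helper name and example values are assumptions):

```python
def priority_weight(video_conf, mu, frame_conf, eps_min, eps_max, lam=0.5):
    """Video priority weighting factor Q_weight:
    lam * average of mu-weighted video tag confidences, plus
    (1 - lam) * average of min-max normalized frame tag confidences."""
    video_part = sum(mu[t] * c for t, c in video_conf.items()) / len(video_conf)
    frame_part = sum((c - eps_min[t]) / (eps_max[t] - eps_min[t])
                     for t, c in frame_conf.items()) / len(frame_conf)
    return lam * video_part + (1 - lam) * frame_part

q = priority_weight(
    video_conf={"selfie": 0.8}, mu={"selfie": 0.6},   # category prior 0.6
    frame_conf={"OCR": 0.5}, eps_min={"OCR": 0.0}, eps_max={"OCR": 1.0},
)
# round(q, 2) == 0.49
```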
Fig. 4 is a schematic diagram illustrating an example video queue for manual review according to an embodiment of the present disclosure. As shown in fig. 4, there are three video queues for manual review with different priorities: high, medium, and low. θ_h, θ_m, and θ_n are the distribution thresholds of the video queues of the respective priorities, and ρ_h, ρ_m, and ρ_n are the proportions in which the video queues of the respective priorities are merged into the final video push queue. The video material can be added to one of the high-, medium-, and low-priority video queues by comparing its video priority weighting factor Q_weight with θ_h, θ_m, and θ_n. In general, the number of priority video queues and the proportion of each priority video queue in the final video push queue can be determined according to business needs.
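A sketch of queue assignment and proportional pushing; the threshold values and the 2:1:1 proportions are illustrative assumptions:

```python
def assign_queue(q_weight, th_high, th_mid):
    """Compare Q_weight with the distribution thresholds to pick a
    priority queue (the thresholds play the role of theta_h, theta_m)."""
    if q_weight >= th_high:
        return "high"
    if q_weight >= th_mid:
        return "medium"
    return "low"

def push_order(queues, proportions):
    """Interleave the priority queues into the final push queue,
    taking proportions[name] items from each queue per round
    (the proportions play the role of the rho values)."""
    while any(queues.values()):
        for name, count in proportions.items():
            for _ in range(count):
                if queues[name]:
                    yield queues[name].pop(0)

queues = {"high": ["v1", "v2"], "medium": ["v3"], "low": ["v4"]}
order = list(push_order(queues, {"high": 2, "medium": 1, "low": 1}))
# order == ["v1", "v2", "v3", "v4"]
```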
In some embodiments, to improve review efficiency, before performing step S102, the video review method 100 may further include: comparing the unique identifier and the video features of the video material with a plurality of unique identifiers and a plurality of video features stored in a video white library and a video black library, so as to pre-review the video material. In the case that the same unique identifier and video features as those of the video material exist in the video white library, subsequent application of the video material is allowed. In the case that the same unique identifier and video features as those of the video material exist in the video black library, subsequent application of the video material is rejected. In the case that neither the video white library nor the video black library contains the same unique identifier and video features as those of the video material, the machine review is performed on the video material according to the one or more video tags and their confidence scores and the one or more frame tags of at least one frame of the video material and their confidence scores.
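A minimal sketch of this pre-review lookup, assuming the white and black libraries are keyed by the unique identifier (e.g. an MD5 digest) and store the retrieval feature; the dictionary representation, function name and string return values are illustrative assumptions.

```python
def pre_review(unique_id, video_feature, white_lib, black_lib):
    """Pre-review by white/black library lookup.

    white_lib / black_lib: dicts mapping a unique identifier to a stored
    video retrieval feature. Returns one of "allow", "reject",
    or "machine_review".
    """
    if white_lib.get(unique_id) == video_feature:
        return "allow"            # known-good material: skip further review
    if black_lib.get(unique_id) == video_feature:
        return "reject"           # known-bad material: block immediately
    return "machine_review"       # unseen material: fall through to machine review
```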
In addition, a data reflow mechanism based on the results of manual review is also provided. FIG. 5 is a schematic diagram illustrating a data reflow mechanism based on manual review results according to an embodiment of the disclosure. As shown in fig. 5, in the case that the video material passes the manual review, the unique identifier and the video features of the video material are added to the video white library; in the case that the video material does not pass the manual review, the unique identifier and the video features of the video material are added to the video black library, and relevant information of the video material, such as the storage address of the video material, whether the video material should be rejected, the video tags of the video material representing a risk, and the time information and frame tags of the frames having frame tags representing a risk in the video material, may be stored in a sample library for model iteration and accuracy-recall dynamic testing of the video dimension and frame dimension machine review. Here, the unique identifier of the video material may be, for example, an MD5 unique identifier of the video material, and the video feature of the video material may be, for example, an integral video retrieval feature formed by fusing the visual features and audio features of the video material.
To implement the above-described data reflow mechanism based on manual review results, in some embodiments, the video review method 100 may further include: executing manual review under the condition that the machine review indicates that the video material needs to be continuously manually reviewed; and adding the unique identifier and the video characteristics of the video material into a video white library under the condition that the video material passes the manual review, and adding the unique identifier and the video characteristics of the video material into a video black library under the condition that the video material does not pass the manual review.
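The data reflow described above can be sketched as follows; the field names of the material record (`md5`, `feature`, `address`, and the risk-tag fields) and the dict/list library representations are assumptions for illustration.

```python
def reflow_manual_result(material, passed, white_lib, black_lib, sample_lib):
    """Reflow a manual-review result into the white/black and sample libraries.

    material: record with at least "md5" and "feature"; on failure, extra
    fields (storage address, risk tags) are kept for model iteration and
    accuracy-recall dynamic testing.
    """
    if passed:
        white_lib[material["md5"]] = material["feature"]
    else:
        black_lib[material["md5"]] = material["feature"]
        sample_lib.append({
            "address": material.get("address"),
            "reject": True,
            "risk_video_tags": material.get("risk_video_tags", []),
            # each entry: (time of the risky frame, its frame tag)
            "risk_frames": material.get("risk_frames", []),
        })
```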
In some embodiments, the video review method 100 may further include: executing manual review under the condition that the machine review indicates that the video material needs to be continuously manually reviewed; and in the event that the video material fails the manual review, storing relevant information of the video material in a sample repository for at least one of: model iteration of video dimension machine audit, model iteration of frame dimension machine audit, and accuracy-recall dynamic test. Here, the related information of the video material may include at least one of: the storage address of the video material, whether the video material should be rejected, a video tag of the video material representing a risk, time information of a frame having a frame tag representing a risk in the video material, and a frame tag.
By using the video auditing method according to the embodiments of the present disclosure, the review accuracy and recall rate can be improved, labor costs can be saved, and review efficiency is high.
FIG. 6 is a block diagram illustrating a video review system 200 according to an embodiment of the disclosure. The video review system 200 is described in detail below in conjunction with FIG. 6.
As shown in fig. 6, the video review system 200 may include a first determination unit 202, a second determination unit 204, and a review execution unit 206. The first determination unit 202 is configured to determine one or more video tags of the video material and a confidence score thereof. The second determining unit 204 is configured to determine one or more frame tags of at least one frame in the video material and a confidence score thereof, wherein each frame of the at least one frame has at least one frame tag. The audit execution unit 206 is configured to perform a machine audit of the video material based on the one or more video tags and their confidence scores and the one or more frame tags and their confidence scores.
In some embodiments, the video review system 200 may further include: a video pushing unit 208 configured to, in the case that the machine review indicates that manual review of the video material needs to be continued, add the video material to a video queue for manual review according to the confidence scores of the one or more video tags of the video material and the confidence scores of the one or more frame tags of the at least one frame of the video material.
In some embodiments, the audit execution unit 206 may be further configured to perform one or more of the following processes: adding the unique identifier and the video features of the video material to a video white library in the case that the video material passes the manual review, and adding the unique identifier and the video features of the video material to a video black library in the case that the video material does not pass the manual review; comparing the unique identifier and the video features of the video material with a plurality of unique identifiers and a plurality of video features stored in the video white library and the video black library to pre-review the video material, wherein subsequent application of the video material is allowed in the case that the same unique identifier and video features as those of the video material exist in the video white library, subsequent application of the video material is rejected in the case that the same unique identifier and video features as those of the video material exist in the video black library, and the machine review is performed on the video material according to the one or more video tags and their confidence scores and the one or more frame tags and their confidence scores in the case that neither the video white library nor the video black library contains the same unique identifier and video features as those of the video material; and in the case that the video material does not pass the manual review, storing relevant information of the video material in a sample repository for at least one of: model iteration of the video dimension machine review, model iteration of the frame dimension machine review, and accuracy-recall dynamic testing.
In this embodiment, for other details of the video auditing system 200 and technical effects brought by corresponding processing of the video auditing system 200, reference may be made to the relevant description of the video auditing method 100, which is not repeated herein.
FIG. 7 is a block diagram illustrating an exemplary computer system 300 to which example embodiments can be applied. A computer system 300 suitable for use in implementing embodiments of the present disclosure is described below in conjunction with FIG. 7. It should be understood that the computer system 300 shown in FIG. 7 is only one example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 7, computer system 300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the computer system 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, a touchpad, a camera, an accelerometer, a gyroscope, etc.; output devices 307 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 308 including, for example, a flash memory (Flash Card); and a communication device 309. The communication device 309 may allow the computer system 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates a computer system 300 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices, as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure provide a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method 100 shown in fig. 1. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 309, or installed from the storage means 308, or installed from the ROM 302. The computer program realizes the above-described functions defined in the system of the embodiment of the present disclosure when executed by the processing apparatus 301.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the computer system 300; or may exist separately without being incorporated into the computer system 300. The computer readable medium carries one or more programs which, when executed by the computer system, cause the computer system to: determine one or more video tags of the video material and their confidence scores; determine one or more frame tags of at least one frame in the video material and their confidence scores, wherein each frame of the at least one frame has at least one frame tag; and perform machine review on the video material according to the one or more video tags and their confidence scores and the one or more frame tags of at least one frame in the video material and their confidence scores.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor comprising a first determining unit, a second determining unit, an audit execution unit, and a video pushing unit, where the names of these units do not, in some cases, constitute a limitation on the units themselves.
The foregoing description is merely an illustration of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (20)

1. A video auditing method, comprising:
determining one or more video tags and confidence scores thereof for the video material;
determining one or more frame tags and their confidence scores for at least one frame of a plurality of frames of the video material, wherein each frame of the at least one frame has at least one frame tag; and
performing machine review on the video material according to the one or more video tags and their confidence scores and the one or more frame tags and their confidence scores.
2. A video review method according to claim 1, wherein performing a machine review of the video material comprises:
according to the one or more video tags and the confidence scores thereof, performing video dimension audit on the video material;
when the video material passes the video dimension audit, carrying out frame dimension audit on the video material according to the one or more frame tags and the confidence scores thereof; and
rejecting subsequent application of the video material if the video material does not pass the video dimension audit.
3. A video review method according to claim 2, wherein performing the video dimension review on the video material comprises:
judging whether the confidence scores of the video tags representing risks among the one or more video tags are all smaller than corresponding thresholds, and if not, judging that the video material does not pass the video dimension audit and rejecting subsequent application of the video material.
4. A video review method according to claim 2, wherein performing the video dimension review on the video material comprises:
calculating at least one weighted sum of confidence scores for the one or more video tags; and
judging whether the at least one weighted sum is smaller than a corresponding threshold, and if not, judging that the video material does not pass the video dimension audit and rejecting subsequent application of the video material.
5. A video review method according to claim 2, wherein performing the frame dimension review on the video material comprises:
judging whether the confidence scores of the one or more frame tags are all less than a corresponding threshold, and if not, judging that the video material does not pass the frame dimension audit and rejecting subsequent application of the video material,
wherein the corresponding threshold is determined based on at least one of the one or more video tags and the confidence scores of the one or more video tags.
6. A video review method according to claim 2, wherein performing the frame dimension review on the video material comprises:
calculating at least one weighted sum of the confidence scores of the one or more video tags and the confidence scores of the one or more frame tags; and
judging whether the at least one weighted sum is smaller than a corresponding threshold, and if not, judging that the video material does not pass the frame dimension audit and rejecting subsequent application of the video material.
7. A video review method according to claim 2, wherein performing the frame dimension review on the video material comprises:
performing first frame dimension audit on the video material;
performing a second frame dimension audit on the video material, wherein
Allowing subsequent application of the video material if the video material passes the first frame dimension audit and the second frame dimension audit,
in the event that the video material fails the first frame dimension audit, rejecting subsequent applications of the video material,
and under the condition that the video material passes the first frame dimension audit and does not pass the second frame dimension audit, the machine audit indicates that manual audit needs to be continued.
8. A video review method according to claim 7, wherein performing the first frame dimension review on the video material includes determining whether the confidence scores for the one or more frame tags are each less than a respective one of one or more first thresholds, if so, determining that the video material passes the first frame dimension review, otherwise determining that the video material fails the first frame dimension review,
performing the second frame dimension audit on the video material comprises judging whether the confidence scores of the one or more frame tags are all less than a corresponding second threshold of the one or more second thresholds, if so, judging that the video material passes the second frame dimension audit, otherwise, judging that the video material does not pass the second frame dimension audit, wherein
For each frame tag, the first threshold is greater than the second threshold.
9. A video review method according to claim 1 or claim 7, wherein, in the event that the machine review indicates that a manual review needs to be continued, the video material is added to a video queue for manual review in dependence upon the confidence scores of the one or more video tags and the confidence scores of the one or more frame tags.
10. A video review method as claimed in claim 9, wherein adding the video material to the video queue for manual review based on the confidence scores of the one or more video tags and the confidence scores of the one or more frame tags comprises:
determining a video priority weighting factor for the video material based on the confidence scores for the one or more video tags and the confidence scores for the one or more frame tags; and
adding the video material into one of a plurality of video queues for manual review with different priorities according to the video priority weighting factor of the video material, wherein the plurality of video queues for manual review with different priorities push the video material stored therein for manual review according to preset proportions.
11. A video review method according to claim 10, wherein determining the video priority weighting factor for the video material based on the confidence scores for the one or more video tags and the confidence scores for the one or more frame tags comprises:
calculating a weighted sum of the confidence scores of the one or more video tags and the confidence scores of the one or more frame tags as the video priority weighting factor of the video material.
12. A video review method as claimed in claim 1, further comprising:
performing manual review in the case that the machine review indicates that manual review needs to be continued; and
adding a unique identifier and video features of the video material to a video white library in the case that the video material passes the manual review,
in the event that the video material fails the manual review, adding a unique identifier and video features of the video material to a video black library.
13. A video review method according to claim 1 or 12, prior to performing a machine review of the video material in dependence upon the one or more video tags and their confidence scores and the one or more frame tags and their confidence scores, the method further comprising:
performing a pre-review on the video material by comparing the unique identifier and video features of the video material with a plurality of unique identifiers and a plurality of video features stored in the video white library and the video black library, wherein
in the case that the same unique identifier and video features as those of the video material exist in the video white library, subsequent application of the video material is allowed,
in the case that the same unique identifier and video features as those of the video material exist in the video black library, subsequent application of the video material is rejected,
and in the case that neither the video white library nor the video black library contains the same unique identifier and video features as those of the video material, the machine review is performed on the video material according to the one or more video tags and their confidence scores and the one or more frame tags and their confidence scores.
14. A video review method as claimed in claim 1, further comprising:
performing manual review in the case that the machine review indicates that manual review needs to be continued; and
in the event that the video material fails the manual review, storing relevant information of the video material in a sample repository for at least one of: model iteration of video dimension machine audit, model iteration of frame dimension machine audit, and accuracy-recall dynamic test.
15. A video review method according to claim 14, wherein the information relating to the video material includes at least one of: the storage address of the video material, whether the video material should be rejected, a video tag of the video material representing a risk, time information of a frame having a frame tag representing a risk in the video material, and a frame tag.
16. A video review method according to claim 1, wherein determining the one or more video tags and their confidence scores for the video material comprises:
classifying the video material using a video classification model to determine the one or more video tags and their confidence scores for the video material, wherein the video classification model classifies the video material taking into account interframe correlation information.
17. A video review method according to claim 1, wherein each of the one or more video tags of the video material belongs to a class of video tags,
determining the one or more video tags and their confidence scores for the video feed comprises:
for each of a plurality of video tag classes, a video tag of the video material belonging to the video tag class is selected from a plurality of candidate video tags included in the video tag class, and a confidence score for the video tag is determined.
18. A video review system comprising:
a first determination unit configured to determine one or more video tags of video material and confidence scores thereof;
a second determination unit configured to determine one or more frame tags of at least one frame of a plurality of frames of the video material and a confidence score thereof, wherein each frame of the at least one frame has at least one frame tag; and
and the auditing execution unit is configured to conduct machine auditing on the video material according to the one or more video tags and the confidence scores thereof and the one or more frame tags and the confidence scores thereof.
19. A computer system, comprising:
one or more processors; and
storage means having stored thereon a computer program that, when executed by the one or more processors, causes the one or more processors to carry out the method of any one of claims 1 to 17.
20. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, causes the processor to carry out the method of any one of claims 1 to 17.
CN202010601923.8A 2020-06-28 2020-06-28 Video auditing method and system, computer system and computer-readable storage medium Pending CN111767838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010601923.8A CN111767838A (en) 2020-06-28 2020-06-28 Video auditing method and system, computer system and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN111767838A true CN111767838A (en) 2020-10-13

Family

ID=72722532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010601923.8A Pending CN111767838A (en) 2020-06-28 2020-06-28 Video auditing method and system, computer system and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111767838A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409241A (en) * 2018-09-28 2019-03-01 百度在线网络技术(北京)有限公司 Video checking method, device, equipment and readable storage medium storing program for executing
CN109862394A (en) * 2019-03-27 2019-06-07 北京周同科技有限公司 Checking method, device, equipment and the storage medium of video content
CN110163115A (en) * 2019-04-26 2019-08-23 腾讯科技(深圳)有限公司 A kind of method for processing video frequency, device and computer readable storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈云霁 等: "《智能计算系统》", 30 April 2020, 机械工业出版社, pages: 85 - 86 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022086767A1 (en) * 2020-10-22 2022-04-28 Micron Technology, Inc. Accelerated video processing for feature recognition via an artificial neural network configured in a data storage device
US11741710B2 (en) 2020-10-22 2023-08-29 Micron Technology, Inc. Accelerated video processing for feature recognition via an artificial neural network configured in a data storage device
CN113221690A (en) * 2021-04-28 2021-08-06 上海哔哩哔哩科技有限公司 Video classification method and device
CN113537940A (en) * 2021-07-22 2021-10-22 北京华雨天成文化传播有限公司 Audiovisual program content auditing and management method and system based on block chain technology

Similar Documents

Publication Publication Date Title
CN111767838A (en) Video auditing method and system, computer system and computer-readable storage medium
US20200050945A1 (en) Detecting poisoning attacks on neural networks by activation clustering
WO2022037541A1 (en) Image processing model training method and apparatus, device, and storage medium
CN109857652A (en) A kind of automated testing method of user interface, terminal device and medium
CN110598620A (en) Model training method and device, and target detection method and device
CN108628993B (en) Electronic map self-adaptive classification method, device, equipment and storage medium
CN112069884A (en) Violent video classification method, system and storage medium
CN111522951A (en) Sensitive data identification and classification technical method based on image identification
CN111859237A (en) Network content auditing method and device, electronic equipment and storage medium
CN112329762A (en) Image processing method, model training method, device, computer device and medium
CN113841161A (en) Extensible architecture for automatically generating content distribution images
CA3204311A1 (en) Method and system for securely deploying an artificial intelligence model
CN114244611B (en) Abnormal attack detection method, device, equipment and storage medium
WO2023029397A1 (en) Training data acquisition method, abnormal behavior recognition network training method and apparatus, computer device, storage medium, computer program and computer program product
US20210406568A1 (en) Utilizing multiple stacked machine learning models to detect deepfake content
CN113033707B (en) Video classification method and device, readable medium and electronic equipment
CN114663871A (en) Image recognition method, training method, device, system and storage medium
CN111784053A (en) Transaction risk detection method, device and readable storage medium
CN111538852A (en) Multimedia resource processing method, device, storage medium and equipment
CN110852082A (en) Synonym determination method and device
CN115129902A (en) Media data processing method, device, equipment and storage medium
CN113971402A (en) Content identification method, device, medium and electronic equipment
CN114238968A (en) Application program detection method and device, storage medium and electronic equipment
CN113824950A (en) Service processing method, device, equipment and medium
CN113343069A (en) User information processing method, device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination