WO2014094492A1 - Method and system for screening depth fusion video - Google Patents

Method and system for screening depth fusion video

Info

Publication number
WO2014094492A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
review
category
fusion
feature
Prior art date
Application number
PCT/CN2013/085739
Other languages
French (fr)
Chinese (zh)
Inventor
朱定局
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院 filed Critical 深圳先进技术研究院
Publication of WO2014094492A1 publication Critical patent/WO2014094492A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; scene-specific elements
    • G06V20/40 — Scenes; scene-specific elements in video content

Definitions

  • the invention relates to the field of video review, and in particular to an in-depth fusion video review method and an in-depth fusion video review system.
  • the present invention aims to provide an in-depth fusion video review method and an in-depth fusion video review system, which can improve the accuracy of video fusion review.
  • the video frames to be examined are classified by using a preset fusion review classification manner, and the video categories to which the video frames to be examined belong are obtained;
  • An in-depth fusion video review system that includes:
  • the video large class fusion determining module is configured to classify the video frames to be examined by using a preset fusion review classification manner, and obtain a video category to which the video frame to be examined belongs;
  • a feature extraction module configured to extract various features in the video frame to be examined
  • a video subclass fusion determining module configured to determine, based on the various features in the video frame to be reviewed and the feature review fusion parameters of the video category, the possibility that the video frame to be reviewed belongs to each video subclass under the video category, and to comprehensively determine, based on those possibilities, the video subclass to which the video frame to be reviewed belongs.
  • when reviewing, the video frame to be reviewed is classified using a preset fusion review classification method to obtain the video category to which it belongs; the possibility that the frame belongs to each video subclass under the determined video category is then determined based on the feature review fusion parameters of that category and the various features in the frame, and the video subclass to which the frame belongs is comprehensively determined from those possibilities.
  • FIG. 1 is a schematic flow chart of an embodiment of an in-depth fusion video review method of the present invention
  • FIG. 2 is a schematic flow chart of determining feature review fusion parameters of each video category in a specific example
  • FIG. 3 is a schematic flow chart of an in-depth fusion video review in a specific example
  • FIG. 4 is a schematic structural diagram of an embodiment of the in-depth fusion video review system of the present invention.
  • FIG. 1 is a schematic flow diagram of an embodiment of the in-depth fusion video review method of the present invention. As shown in FIG. 1, the method in this embodiment includes the steps of:
  • Step S101: classify the video frame to be reviewed using a preset fusion review classification method, and obtain the video category to which the video frame to be reviewed belongs.
  • Step S102: extract the various features from the video frame to be reviewed;
  • Step S103: determine, based on the various features in the video frame to be reviewed and the feature review fusion parameters of the video category, the possibility that the video frame to be reviewed belongs to each video subclass under the video category;
  • Step S104: comprehensively determine, based on the possibility that the video frame to be reviewed belongs to each video subclass under the above video category, the video subclass to which the video frame to be reviewed belongs.
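The four steps above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the major-category classifier, the per-feature scorers, and the fusion parameters are all assumed to be supplied from elsewhere (the parameters would come from the training procedure of FIG. 2), and all names are placeholders.

```python
def review_frame(frame, classify_major, feature_scorers, fusion_params):
    """Sketch of steps S101-S104.
    classify_major: fn(frame) -> major video category (S101);
    feature_scorers: {feature: fn(frame) -> {subclass: possibility}} (S102/S103);
    fusion_params: {category: {feature: fusion weight}} (weights sum to 1)."""
    category = classify_major(frame)                                  # S101
    weights = fusion_params[category]
    scores = {f: scorer(frame) for f, scorer in feature_scorers.items()}  # S102
    subclasses = next(iter(scores.values())).keys()
    # S103: fuse the per-feature subclass possibilities with category weights.
    fused = {s: sum(weights[f] * scores[f][s] for f in scores) for s in subclasses}
    # S104: pick the subclass with the maximum fused possibility.
    return category, max(fused, key=fused.get)
```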
  • when reviewing, the video frame to be reviewed is classified using a preset fusion review classification method, and the video category to which it belongs is obtained; then, based on the feature review fusion parameters of that video category and the various features in the frame, the possibility that the frame belongs to each video subclass under the determined video category is determined, and the video subclass to which the frame belongs is comprehensively determined from those possibilities.
  • the feature review fusion parameter of the video category may be determined based on the established video sample database.
  • FIG. 2 is a flow chart showing the determination of the feature review fusion parameters of each video category in a specific example.
  • the manner of determining the feature review fusion parameters of each video category in the specific example includes:
  • Step S201: classify each video frame in the video sample database using the aforementioned preset fusion review classification method, and obtain the fusion-reviewed video frames of each video category;
  • Step S202: classify each video frame in the video sample database using each of the various feature review methods, and respectively obtain the video frames of each video category after each feature review;
  • Step S203: determine the accuracy of each type of feature review of each video category based on the fusion-reviewed video frames of each video category and the video frames of each video category after each feature review;
  • Step S204: determine the feature review fusion parameters of each video category based on the accuracy of each type of feature review of each video category.
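Steps S201–S204 can be sketched end to end, assuming the sample libraries produced by S201 and S202 are available as sets of frame identifiers. Averaging the two error rates into an accuracy is only one of the combination options the text later mentions, and all names are illustrative.

```python
def feature_fusion_params(fusion_libs, feature_libs):
    """fusion_libs: {category: set of frame IDs from fusion review (S201)};
    feature_libs: {feature: {category: set of frame IDs}} (S202).
    Returns {category: {feature: fusion parameter}} (S203-S204)."""
    params = {}
    for cat, standard in fusion_libs.items():
        acc = {}
        for feat, libs in feature_libs.items():
            fp = len(libs[cat] - standard) / len(standard)    # false positive rate
            miss = len(standard - libs[cat]) / len(standard)  # missed rate
            acc[feat] = 1.0 - (fp + miss) / 2.0               # one possible combination
        total = sum(acc.values())
        # S204: each parameter is the feature's share of the total accuracy.
        params[cat] = {f: a / total for f, a in acc.items()}
    return params
```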
  • the manner of determining the feature review fusion parameters of each video category shown in FIG. 2 is described in detail below with a specific example.
  • the video categories, the feature types, and the feature review methods may differ according to actual needs.
  • in this example, the video categories include pornographic video, violent video, and reactionary video; the feature types include text, sound, and image; and the feature reviews include text feature review, sound feature review, and image feature review.
  • this description is merely exemplary and is not intended to limit the present invention.
  • a schematic flow chart of this specific example is shown in FIG.
  • a certain number of video samples may be pre-stored in the video sample database; that is, a certain number of video frames are stored in the video sample database. The subsequent determination of the feature review fusion parameters of each video category is described in conjunction with the video frames in this video sample database.
  • first, the video frames in the video sample database are classified using the preset fusion review classification method, and the fusion-reviewed video frames of each video category are obtained, that is, the video frames belonging to pornographic video, the video frames belonging to violent video, and the video frames belonging to reactionary video.
  • after the fusion-reviewed video frames of each video category are obtained, the classified frames can be placed into a library for each video category of the fusion review: the classified pornographic-video frames into the fusion-review pornographic video library (denoted RH), the classified violent-video frames into the fusion-review violent video library (denoted RB), and the classified reactionary-video frames into the fusion-review reactionary video library (denoted RF).
  • the above preset fusion review classification method may be any existing or future method, as long as it can determine whether a video frame belongs to pornographic video, violent video, reactionary video, or another category; it is not described in detail here.
  • the video frames in the video sample database are classified by using various feature review methods, and the video frames of each video category after each feature review are respectively obtained.
  • in this example, the various types of features include text, sound, and image
  • the details may be as follows.
  • the text feature review method is used to classify each video frame in the video sample database, and the video frames of each video category after text feature review are obtained, that is, the video frames belonging to pornographic video, the video frames belonging to violent video, and the video frames belonging to reactionary video.
  • after the text-feature-reviewed video frames of each video category are obtained, the classified frames can be placed into a library for each video category of the text feature review: the classified pornographic-video frames into the text-review pornographic video library (denoted WH), the classified violent-video frames into the text-review violent video library (denoted WB), and the classified reactionary-video frames into the text-review reactionary video library (denoted WF).
  • the specific text feature review method may be any method that exists now or may arise in the future, and is not described in detail here.
  • the sound feature review method is used to classify each video frame in the video sample database, and the video frames of each video category after sound feature review are obtained, that is, the video frames belonging to pornographic video, the video frames belonging to violent video, and the video frames belonging to reactionary video.
  • after the sound-feature-reviewed video frames of each video category are obtained, the classified frames can be placed into a library for each video category of the sound feature review: the classified pornographic-video frames into the sound-review pornographic video library (denoted VH), the classified violent-video frames into the sound-review violent video library (denoted VB), and the classified reactionary-video frames into the sound-review reactionary video library (denoted VF).
  • the image feature review method is used to classify each video frame in the video sample database, and the video frames of each video category after image feature review are obtained, that is, the video frames belonging to pornographic video, the video frames belonging to violent video, and the video frames belonging to reactionary video.
  • after the image-feature-reviewed video frames of each video category are obtained, the classified frames can be placed into a library for each video category of the image feature review: the classified pornographic-video frames into the image-review pornographic video library (denoted GH), the classified violent-video frames into the image-review violent video library (denoted GB), and the classified reactionary-video frames into the image-review reactionary video library (denoted GF).
  • the specific image feature review method may be any method that exists now or may arise in the future, and is not described in detail here.
  • the specific manner of determining the accuracy of each type of feature review of each video category may be: taking the fusion-reviewed video frames of the current video category as the standard, determine the false positive rate and the missed rate of the current class of feature review of the current video category, and from these determine the accuracy of the current class of feature review of the current video category.
  • in this example, the video categories include pornographic video, violent video, and reactionary video
  • and the feature reviews performed include text feature review, sound feature review, and image feature review. In the end, therefore, the accuracy of the text feature review, the sound feature review, and the image feature review can be obtained for each of the pornographic, violent, and reactionary video categories, nine accuracy rates in total.
  • the specific process may be as follows.
  • the accuracy of the violent-video text review can be determined by taking the fusion-review violent video library RB as the standard: first determine the false positive rate and the missed rate of the text-review violent video library WB, and then comprehensively determine the accuracy of the violent-video text review based on the false positive rate and the missed rate.
  • for the false positive rate: determine the number of video frames that belong to the text-review violent video library WB but not to the fusion-review violent video library RB, and divide this number by the number of video frames in the fusion-review violent video library RB; the resulting value is the false positive rate of the violent-video text review, namely:
  • false positive rate of violent-video text review = (number of video frames in the text-review violent video library WB but not in the fusion-review violent video library RB) / (number of video frames in the fusion-review violent video library RB).
  • for the missed rate: the number of video frames that belong to the fusion-review violent video library RB but not to the text-review violent video library WB is determined and divided by the number of video frames in the fusion-review violent video library RB; the resulting value is the missed rate of the violent-video text review, namely:
  • missed rate of violent-video text review = (number of video frames in the fusion-review violent video library RB but not in the text-review violent video library WB) / (number of video frames in the fusion-review violent video library RB).
  • based on the false positive rate and the missed rate of the violent-video text review, the accuracy of the violent-video text review is comprehensively determined.
  • for example, the larger value, the smaller value, the average, the weighted average of the false positive rate and the missed rate of the violent-video text review, or a value calculated in some other way, may be used as the error rate from which the accuracy of the violent-video text review is determined; the specific comprehensive determination method can differ according to the needs of the actual application.
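The false positive rate, missed rate, and accuracy computations above can be sketched with sets of frame identifiers (e.g. RB as the fusion-review standard and WB as the text-review library). Averaging the two error rates is just one of the combination options the text lists; the function names are illustrative.

```python
def review_rates(fusion_lib, feature_lib):
    """fusion_lib: frame IDs in the fusion-review library used as the
    standard (e.g. RB); feature_lib: frame IDs that a single-feature review
    placed in the same category (e.g. WB).
    Returns (false_positive_rate, missed_rate)."""
    n = len(fusion_lib)
    false_positive_rate = len(feature_lib - fusion_lib) / n  # in WB, not in RB
    missed_rate = len(fusion_lib - feature_lib) / n          # in RB, not in WB
    return false_positive_rate, missed_rate

def review_accuracy(fusion_lib, feature_lib):
    # One possible comprehensive determination: average the two error rates.
    fp, miss = review_rates(fusion_lib, feature_lib)
    return 1.0 - (fp + miss) / 2.0
```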
  • the accuracy of the violent-video sound review and the accuracy of the violent-video image review are determined similarly to the accuracy of the violent-video text review described above.
  • specifically, the fusion-review violent video library RB may again be used as the standard to determine the false positive rate and the missed rate of the sound-review violent video library VB, and the accuracy of the violent-video sound review is then comprehensively determined based on the false positive rate and the missed rate.
  • for the false positive rate: the number of video frames that belong to the sound-review violent video library VB but not to the fusion-review violent video library RB is divided by the number of video frames in RB, and the resulting value is the false positive rate of the violent-video sound review, namely:
  • false positive rate of violent-video sound review = (number of video frames in the sound-review violent video library VB but not in the fusion-review violent video library RB) / (number of video frames in the fusion-review violent video library RB).
  • for the missed rate: the number of video frames that belong to the fusion-review violent video library RB but not to the sound-review violent video library VB is divided by the number of video frames in RB, and the resulting value is the missed rate of the violent-video sound review, namely:
  • missed rate of violent-video sound review = (number of video frames in the fusion-review violent video library RB but not in the sound-review violent video library VB) / (number of video frames in the fusion-review violent video library RB).
  • based on the false positive rate and the missed rate of the violent-video sound review, the accuracy of the violent-video sound review is comprehensively determined.
  • as with the text review, the larger value, the smaller value, the average, the weighted average, or a value calculated in some other way may be used, and the specific comprehensive determination method can differ according to the needs of the actual application.
  • likewise, the fusion-review violent video library RB may be used as the standard to determine the false positive rate and the missed rate of the image-review violent video library GB, and the accuracy of the violent-video image review is then comprehensively determined based on the false positive rate and the missed rate:
  • false positive rate of violent-video image review = (number of video frames in the image-review violent video library GB but not in the fusion-review violent video library RB) / (number of video frames in the fusion-review violent video library RB).
  • missed rate of violent-video image review = (number of video frames in the fusion-review violent video library RB but not in the image-review violent video library GB) / (number of video frames in the fusion-review violent video library RB).
  • based on these two rates, the accuracy of the violent-video image review is comprehensively determined in the same way as above, and the specific comprehensive determination method can differ according to the needs of the actual application.
  • after the accuracy of each type of feature review of each video category is obtained, the feature review fusion parameters of each video category can be comprehensively determined.
  • for example, based on the accuracies of the various feature reviews of the violent video category (the accuracy of the violent-video text review, the accuracy of the violent-video sound review, and the accuracy of the violent-video image review), the feature review fusion parameters of the violent video category can be denoted rw, rv, rg, where rw is the fusion parameter of the violent-video text review, rv the fusion parameter of the violent-video sound review, and rg the fusion parameter of the violent-video image review.
  • the parameters rw, rv, rg can be determined, respectively, in the following manner:
  • rw = accuracy of violent-video text review / (accuracy of violent-video text review + accuracy of violent-video sound review + accuracy of violent-video image review);
  • rv = accuracy of violent-video sound review / (accuracy of violent-video text review + accuracy of violent-video sound review + accuracy of violent-video image review);
  • rg = accuracy of violent-video image review / (accuracy of violent-video text review + accuracy of violent-video sound review + accuracy of violent-video image review).
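The three formulas above simply normalize the per-feature accuracies so the fusion parameters of one category sum to 1. A minimal sketch, with illustrative feature names:

```python
def fusion_params_from_accuracies(accuracies):
    """accuracies: {feature: review accuracy} for one video category.
    Each fusion parameter is that feature's share of the total accuracy,
    e.g. rw = acc_text / (acc_text + acc_sound + acc_image)."""
    total = sum(accuracies.values())
    return {feature: acc / total for feature, acc in accuracies.items()}
```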
  • the feature review fusion parameters of the other video categories may be determined similarly to those of the violent video category above, and the details are not repeated here.
  • the feature review fusion parameters of each video category obtained above can be stored for use in the subsequent fusion review of the video frames to be reviewed.
  • when a video frame is to be reviewed, it may first be classified using the aforementioned preset fusion review classification method, obtaining the video category to which the video frame to be reviewed belongs.
  • assume, in this example, that the video frame to be reviewed belongs to the violent video category.
  • the corresponding various types of features are then extracted from the video frame to be reviewed, which may specifically include a text feature, a sound feature, and an image feature.
  • denote the violent video category to which the video frame to be reviewed belongs as category i
  • and suppose the violent video category is further divided into N subclasses, denoted i1, i2, i3, ..., iN.
  • based on the text feature of the video frame to be reviewed, the possibility wi1 that the frame (by its text feature) belongs to subclass i1 is determined, as are the possibility wi2 that it belongs to subclass i2, the possibility wi3 that it belongs to subclass i3, and so on up to wiN, where
  • wi1 + wi2 + wi3 + ... + wiN = 1. In the same way, the possibilities vi1, ..., viN are determined from the sound feature and gi1, ..., giN from the image feature.
  • the possibilities that the video frame to be reviewed belongs to each subclass of the violent video category are then obtained by fusion:
  • pi1 = rw*wi1 + rv*vi1 + rg*gi1;
  • ...
  • piN = rw*wiN + rv*viN + rg*giN.
  • based on these possibilities, the video subclass to which the video frame to be reviewed belongs may be comprehensively determined.
  • in general, the video subclass corresponding to the maximum of pi1, pi2, pi3, ..., piN may be determined as the video subclass to which the video frame to be reviewed belongs.
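The fusion and final selection can be sketched as follows, with the per-feature subclass possibilities (the wi, vi, gi values) and the fusion parameters (rw, rv, rg) as inputs; all concrete values below are illustrative.

```python
def fuse_and_select(weights, likelihoods):
    """weights: {"text": rw, "sound": rv, "image": rg};
    likelihoods: {"text": [wi1..wiN], "sound": [vi1..viN], "image": [gi1..giN]}.
    Returns the list p, where p[k] = rw*wi(k) + rv*vi(k) + rg*gi(k), and the
    0-based index of the subclass with the maximum fused possibility."""
    n = len(next(iter(likelihoods.values())))
    p = [sum(weights[f] * likelihoods[f][k] for f in weights) for k in range(n)]
    return p, max(range(n), key=p.__getitem__)
```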
  • the present invention also provides an in-depth fusion video review system.
  • the following describes an embodiment of the in-depth fusion video review system of the present invention.
  • FIG. 4 is a block diagram of an embodiment of the in-depth fusion video review system of the present invention. As shown in FIG. 4, the in-depth fusion video review system in this embodiment includes:
  • the video major class fusion determining module 401 is configured to classify the video frames to be examined by using a preset fusion review classification manner, and obtain a video category to which the video frame to be examined belongs.
  • the feature extraction module 402 is configured to extract various features in the video frame to be examined
  • the video subclass fusion determining module 403 is configured to determine, based on the various features in the video frame to be reviewed and the feature review fusion parameters of the video category, the possibility that the video frame to be reviewed belongs to each video subclass under the video category, and to comprehensively determine, based on those possibilities, the video subclass to which the video frame to be reviewed belongs.
  • when reviewing, the video frame to be reviewed is classified using a preset fusion review classification method, and the video category to which it belongs is obtained; then, based on the feature review fusion parameters of that video category and the various features in the frame, the possibility that the frame belongs to each video subclass under the determined video category is determined, and the video subclass to which the frame belongs is comprehensively determined from those possibilities.
  • the in-depth fusion video review system in this embodiment may further include: a fusion parameter determination module 404 for determining feature review fusion parameters of the video categories.
  • the fusion parameter determination module 404 includes:
  • the sample fusion review module 4041 is configured to classify each video frame in the video sample database by using the preset fusion review classification manner, and obtain a video frame of each video category that is merged and reviewed;
  • the sample classification review module 4042 is configured to classify each video frame in the video sample database by using various feature review methods, and obtain video frames of each video category after each feature review;
  • the sample accuracy determining module 4043 is configured to determine the accuracy of each type of feature review of each video category according to the fusion-reviewed video frames of each video category and the video frames of each video category after each feature review;
  • the fusion parameter comprehensive determination module 4044 determines the feature review fusion parameters of each video category according to the accuracy of each type of feature review of each video category.
  • the fusion parameter comprehensive determination module 4044 may specifically take, as the fusion parameter of the current class of feature review of the current video category, the ratio of the accuracy of the current class of feature review of the current video category to the sum of the accuracies of the various feature reviews of the current video category; the feature review fusion parameters of the current video category comprise the fusion parameters of the various feature reviews of the current video category.
  • the sample accuracy determining module 4043 may specifically include:
  • the false positive rate determining module 40431 is configured to obtain the number of first video frames, namely those belonging to the current video category of the current class of feature review but not to the current video category of the fusion review, and to take this number divided by the number of video frames of the current video category of the fusion review as the false positive rate of the current class of feature review;
  • the missed rate determining module 40432 is configured to obtain the number of second video frames, namely those belonging to the current video category of the fusion review but not to the current video category of the current class of feature review, and to take this number divided by the number of video frames of the current video category of the fusion review as the missed rate of the current class of feature review;
  • the accuracy rate determining module 40433 is configured to determine an accuracy rate of the current class feature review of the current video category according to the false positive rate and the missed rate of the current class feature review.
  • specifically, the accuracy rate determining module 40433 may use the larger value, the smaller value, the average, or the weighted average of the false positive rate and the missed rate of the current class of feature review as the basis for the accuracy of the current class of feature review of the current video category.
  • the video subclass fusion determining module 403 may specifically include:
  • the feature subclass likelihood determining module 4031 is configured to determine, respectively, the possibility that each type of feature in the video frame to be examined belongs to each video subclass under the video category;
  • the video subclass likelihood determining module 4032 is configured to determine the video frame to be reviewed according to the possibility that each type of feature belongs to each video subclass under the video category, and the feature review fusion parameter of the video category The possibility of belonging to each video subclass under the video category;
  • a subclass determining module configured to determine a maximum value of a likelihood that the video frame to be examined belongs to each video subclass under the video category, and determine a video subclass corresponding to the maximum value of the likelihood as the The video subclass to which the video frame to be reviewed belongs.
  • each module in the in-depth fusion video review system of the present invention may be the same as that in the above-mentioned in-depth fusion video review method of the present invention, and details are not described herein.

Abstract

A method and system for screening a depth fusion video, the method comprising the steps of: classifying a to-be-screened video frame by using a preset fusion screening classification method to acquire a video class of the to-be-screened video frame; extracting various characteristics of the to-be-screened video frame; determining the possibility that the to-be-screened video frame belongs to each video subclass of the video class according to various characteristics of the to-be-screened video frame and the characteristic screening fusion parameters of the video class respectively; and comprehensively determining the video subclass of the to-be-screened video frame according to the possibility that the to-be-screened video frame belongs to each video subclass of the video class. Based on various characteristics of the to-be-screened video frame and the characteristic screening fusion parameters of the video class, the solution of the present invention comprehensively considers the effects of various characteristics of a video frame on the video frames in the video class, and differentiates the effects of different characteristics on different types of video frames, thus improving video screening accuracy.

Description

In-Depth Fusion Video Review Method and System
[Technical Field]
The invention relates to the field of video review, and in particular to an in-depth fusion video review method and an in-depth fusion video review system.
[Background Art]
Video content is used ever more widely, and the review of video content has become an important part of its processing. At present, the most common and simplest way to review video content is human viewing: the content of the video file is watched from beginning to end, and on that basis it is judged whether the content is restricted or not allowed to be published. As an improvement on this manual approach, fusion-based video review has emerged, in which auditory, visual, and other features are fused in a fixed manner, for example by weighted averaging. In the weighted-average approach, suppose the text of a video indicates a 60% possibility that it is indecent nude video, the sound a 90% possibility, and the images a 30% possibility; the weighted combination then judges the video to be indecent nude video with possibility (60% + 90% + 30%)/3.
In this current fusion review of video content, auditory, visual, and other features are fused in a fixed manner. In practice, however, for different types of video the salience and decidability of visual and auditory features, for example of text, image, and sound features, contribute differently to the credibility of the review. For example, the role of sound in a shooting video, and its share in the fusion, should be far greater than in a nude video; the role of images in a nude video should be far greater than in a reactionary-speech video; and the role of text in a reactionary-speech video should be far greater than in a fight video. The current review of video content does not distinguish the effects of different features on different types of video, which greatly reduces the accuracy of the review.
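For contrast, the fixed weighted-average fusion criticized here can be written in two lines, using the example possibilities from the text; it applies the same weights to every video type, which is precisely the limitation the invention addresses.

```python
# Fixed fusion: each feature gets the same weight regardless of video category.
text_p, sound_p, image_p = 0.60, 0.90, 0.30   # example possibilities from the text
fixed_possibility = (text_p + sound_p + image_p) / 3
```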
[Summary of the Invention]
In view of the above problems in the prior art, the object of the present invention is to provide an in-depth fusion video review method and an in-depth fusion video review system that can improve the accuracy of video fusion review.
To achieve the above object, the present invention adopts the following technical solutions:
An in-depth fusion video review method, comprising the steps of:
classifying a video frame to be reviewed using a preset fusion-review classification method, to obtain the video category to which the frame belongs;
extracting the various kinds of features of the video frame to be reviewed;
determining, from each kind of feature of the frame and the feature-review fusion parameters of the video category, the likelihood that the frame belongs to each video subcategory under the video category;
determining, from the likelihoods that the frame belongs to the video subcategories under the video category, the video subcategory to which the frame belongs.
An in-depth fusion video review system, comprising:
a video category fusion determining module, configured to classify a video frame to be reviewed using a preset fusion-review classification method and obtain the video category to which the frame belongs;
a feature extraction module, configured to extract the various kinds of features of the video frame to be reviewed;
a video subcategory fusion determining module, configured to determine, from each kind of feature of the frame and the feature-review fusion parameters of the video category, the likelihood that the frame belongs to each video subcategory under the video category, and to determine, from those likelihoods, the video subcategory to which the frame belongs.
According to the invention, review proceeds by first classifying the frame to be reviewed with a preset fusion-review classification method to obtain the video category to which it belongs, then determining, from the feature-review fusion parameters of that category together with the various kinds of features of the frame, the likelihood that the frame belongs to each video subcategory under the determined category, and finally determining the frame's subcategory from those likelihoods. Because the scheme bases the review on the features of the frame together with category-specific feature-review fusion parameters, it takes into account the role each kind of feature plays in frames of that category, distinguishes the effect of different kinds of features on different types of frames, and thereby improves the accuracy of video review.
[Description of the Drawings]
FIG. 1 is a schematic flow chart of an embodiment of the in-depth fusion video review method of the present invention;
FIG. 2 is a schematic flow chart of determining the feature-review fusion parameters of each video category in a specific example;
FIG. 3 is a schematic flow chart of an in-depth fusion video review in a specific example;
FIG. 4 is a schematic structural diagram of an embodiment of the in-depth fusion video review system of the present invention.
[Detailed Description]
The solution of the present invention is described in detail below with reference to its preferred embodiments. In the following description, embodiments of the in-depth fusion video review method of the present invention are described first, followed by embodiments of the in-depth fusion video review system of the present invention.
FIG. 1 shows a schematic flow chart of an embodiment of the in-depth fusion video review method of the present invention. As shown in FIG. 1, the method of this embodiment comprises the steps of:
Step S101: classifying a video frame to be reviewed using a preset fusion-review classification method, to obtain the video category to which the frame belongs;
Step S102: extracting the various kinds of features of the video frame to be reviewed;
Step S103: determining, from each kind of feature of the frame and the feature-review fusion parameters of the video category, the likelihood that the frame belongs to each video subcategory under the video category;
Step S104: determining, from the likelihoods that the frame belongs to the video subcategories under the video category, the video subcategory to which the frame belongs.
According to the solution of this embodiment, review proceeds by first classifying the frame to be reviewed with the preset fusion-review classification method to obtain the video category to which it belongs, then determining, from the feature-review fusion parameters of that category together with the various kinds of features of the frame, the likelihood that the frame belongs to each video subcategory under the determined category, and finally determining the frame's subcategory from those likelihoods. Because the review is based on the features of the frame together with category-specific feature-review fusion parameters, it takes into account the role each kind of feature plays in frames of that category, distinguishes the effect of different kinds of features on different types of frames, and thereby improves the accuracy of video review.
In one implementation, the feature-review fusion parameters of each video category may be determined from an established video sample database. FIG. 2 shows a schematic flow chart of determining the feature-review fusion parameters of each video category in a specific example.
As shown in FIG. 2, determining the feature-review fusion parameters of each video category in this example comprises:
Step S201: classifying each video frame in the video sample database using the preset fusion-review classification method, to obtain the video frames of each video category under fusion review;
Step S202: classifying each video frame in the video sample database with each kind of feature-review method, to obtain the video frames of each video category under each kind of feature review;
Step S203: determining, from the frames of each category under fusion review and the frames of each category under each kind of feature review, the accuracy of each kind of feature review for each video category;
Step S204: determining, from the accuracies of the various kinds of feature review for each video category, the feature-review fusion parameters of each video category.
Taking the determination of the feature-review fusion parameters of each video category in FIG. 2 as an example, one specific example is described in detail below.
In the solution of the present invention, the video categories, the kinds of features, and the feature-review methods may differ according to actual needs. In this specific example, the video categories include pornographic video, violent video and reactionary video; the kinds of features include text, sound and images; and the feature reviews include text feature review, sound feature review and image feature review. This description is merely illustrative and does not limit the solution of the present invention. FIG. 3 shows a schematic flow chart of this specific example.
Before the feature-review fusion parameters of each video category are determined, a certain number of video samples, that is, a certain number of video frames, may be stored in advance in the video sample database; the subsequent determination of the feature-review fusion parameters of each category is described with reference to the frames in this database.
First, each video frame in the sample database is classified using the preset fusion-review classification method, obtaining the frames of each video category under fusion review, that is, the frames belonging to pornographic video, to violent video and to reactionary video respectively. After these are obtained, the classified frames of each category may be placed in the fusion-review library of that category: the frames classified as pornographic video into the fusion-review pornographic video library (denoted RH), the frames classified as violent video into the fusion-review violent video library (denoted RB), and the frames classified as reactionary video into the fusion-review reactionary video library (denoted RF). The preset fusion-review classification itself may be carried out in any manner now existing or later arising, as long as it can determine to which video category, pornographic video, violent video, reactionary video and so on, a frame belongs; it is not elaborated here.
Each kind of feature-review method is then applied to classify the frames in the video sample database, obtaining the frames of each video category under each kind of feature review. With the kinds of features being text, sound and images, this may proceed as follows.
Text feature review is applied to classify each frame in the video sample database, obtaining the frames of each video category under text feature review, that is, the frames belonging to pornographic video, to violent video and to reactionary video respectively. After these are obtained, the classified frames of each category may be placed in the text-feature-review library of that category: the frames classified as pornographic video into the text-feature-review pornographic video library (denoted WH), the frames classified as violent video into the text-feature-review violent video library (denoted WB), and the frames classified as reactionary video into the text-feature-review reactionary video library (denoted WF). The specific text feature review may be carried out in any manner now existing or later arising and is not elaborated here.
Sound feature review is applied to classify each frame in the video sample database, obtaining the frames of each video category under sound feature review, that is, the frames belonging to pornographic video, to violent video and to reactionary video respectively. After these are obtained, the classified frames of each category may be placed in the sound-feature-review library of that category: the frames classified as pornographic video into the sound-feature-review pornographic video library (denoted VH), the frames classified as violent video into the sound-feature-review violent video library (denoted VB), and the frames classified as reactionary video into the sound-feature-review reactionary video library (denoted VF). The specific sound feature review may be carried out in any manner now existing or later arising and is not elaborated here.
Image feature review is applied to classify each frame in the video sample database, obtaining the frames of each video category under image feature review, that is, the frames belonging to pornographic video, to violent video and to reactionary video respectively. After these are obtained, the classified frames of each category may be placed in the image-feature-review library of that category: the frames classified as pornographic video into the image-feature-review pornographic video library (denoted GH), the frames classified as violent video into the image-feature-review violent video library (denoted GB), and the frames classified as reactionary video into the image-feature-review reactionary video library (denoted GF). The specific image feature review may be carried out in any manner now existing or later arising and is not elaborated here.
Then, from the frames of each video category under fusion review and the frames of each category under each kind of feature review, the accuracy of each kind of feature review for each video category is determined. In one implementation, the accuracy of each kind of feature review for each category may be determined as follows:
obtaining the first frame count: the number of frames that belong to the current video category under the current kind of feature review but do not belong to the current category under fusion review;
taking the first frame count divided by the number of sample frames of the current category under fusion review as the false-positive rate of the current kind of feature review;
obtaining the second frame count: the number of frames that belong to the current video category under fusion review but do not belong to the current category under the current kind of feature review;
taking the second frame count divided by the number of sample frames of the current category under fusion review as the miss rate of the current kind of feature review;
determining the accuracy of the current kind of feature review for the current category from its false-positive rate and miss rate.
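The first and second frame counts and the two rates above reduce to set arithmetic. A sketch using the library names of the example that follows (RB the fusion-review violent video library taken as the reference, WB the text-feature-review violent video library; the frame identifiers are hypothetical):

```python
# False-positive rate and miss rate of a feature review, taking the
# fusion-review library of the same category as the reference standard.
def review_error_rates(fusion_lib, feature_lib):
    n = len(fusion_lib)
    first_count = len(feature_lib - fusion_lib)   # in feature review, not in fusion review
    second_count = len(fusion_lib - feature_lib)  # in fusion review, not in feature review
    return first_count / n, second_count / n      # (false-positive rate, miss rate)

RB = {1, 2, 3, 4, 5}   # frames the fusion review classifies as violent
WB = {1, 2, 3, 6}      # frames the text feature review classifies as violent
fp_rate, miss_rate = review_error_rates(RB, WB)
print(fp_rate, miss_rate)  # 0.2 0.4
```

Here frame 6 is a false positive of the text review (in WB but not RB) and frames 4 and 5 are misses (in RB but not WB), giving rates of 1/5 and 2/5.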
Following the description of the specific example above, the video categories include pornographic video, violent video and reactionary video, and the feature reviews performed include text feature review, sound feature review and image feature review. This finally yields, for pornographic video (and likewise for violent video and reactionary video), the accuracy of text feature review, of sound feature review and of image feature review, nine accuracies in all.
Taking the determination of the accuracies of the various kinds of feature review for violent video as an example, the process may be as follows.
First, for the accuracy of text review of violent video, the fusion-review violent video library RB may be taken as the reference: the false-positive rate and miss rate of the text-review violent video library WB are determined, and the accuracy of text review of violent video is then determined from that false-positive rate and miss rate together.
For the false-positive rate, the number of frames that belong to the text-review violent video library WB but do not belong to the fusion-review violent video library RB is determined, and then divided by the number of frames in the fusion-review violent video library RB; the resulting value is the false-positive rate of violent-video text review, that is:
false-positive rate of violent-video text review = (number of frames in the text-review violent video library WB but not in the fusion-review violent video library RB) / (number of frames in the fusion-review violent video library RB).
For the miss rate, the number of frames that belong to the fusion-review violent video library RB but do not belong to the text-review violent video library WB is determined, and then divided by the number of frames in the fusion-review violent video library RB; the resulting value is the miss rate of violent-video text review, that is:
miss rate of violent-video text review = (number of frames in the fusion-review violent video library RB but not in the text-review violent video library WB) / (number of frames in the fusion-review violent video library RB).
Then, the accuracy of violent-video text review is determined from its false-positive rate and miss rate together. The combined value may be the larger of the false-positive rate and the miss rate, the smaller, their average, a weighted average, or a value computed in some other way, and the specific way of combining them may differ according to actual application needs.
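As the paragraph above leaves the exact combination open, the following sketch uses one plausible choice, taking the accuracy as 1 minus the average of the false-positive rate and the miss rate; this particular formula is an assumption for illustration, not a definition fixed by the text:

```python
# One plausible accuracy: 1 minus the average of the two error rates.
# (Assumption for illustration; the larger rate, the smaller rate, or a
# weighted average could equally be used, as the text notes.)
def review_accuracy(fp_rate, miss_rate):
    return 1.0 - (fp_rate + miss_rate) / 2.0

print(review_accuracy(0.2, 0.4))  # 0.7
```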
The accuracy of violent-video sound review and the accuracy of violent-video image review are determined in a manner similar to the determination of the accuracy of violent-video text review described above.
For the accuracy of sound review of violent video, the fusion-review violent video library RB may be taken as the reference: the false-positive rate and miss rate of the sound-review violent video library VB are determined, and the accuracy of sound review of violent video is then determined from them together.
For the false-positive rate, the number of frames that belong to the sound-review violent video library VB but do not belong to the fusion-review violent video library RB is determined, and then divided by the number of frames in the fusion-review violent video library RB; the resulting value is the false-positive rate of violent-video sound review, that is:
false-positive rate of violent-video sound review = (number of frames in the sound-review violent video library VB but not in the fusion-review violent video library RB) / (number of frames in the fusion-review violent video library RB).
For the miss rate, the number of frames that belong to the fusion-review violent video library RB but do not belong to the sound-review violent video library VB is determined, and then divided by the number of frames in the fusion-review violent video library RB; the resulting value is the miss rate of violent-video sound review, that is:
miss rate of violent-video sound review = (number of frames in the fusion-review violent video library RB but not in the sound-review violent video library VB) / (number of frames in the fusion-review violent video library RB).
Then, the accuracy of violent-video sound review is determined from its false-positive rate and miss rate together. The combined value may be the larger of the two rates, the smaller, their average, a weighted average, or a value computed in some other way, and the specific way of combining them may differ according to actual application needs.
For the accuracy of image review of violent video, the fusion-review violent video library RB may likewise be taken as the reference: the false-positive rate and miss rate of the image-review violent video library GB are determined, and the accuracy of image review of violent video is then determined from them together.
For the false-positive rate, the number of frames that belong to the image-review violent video library GB but do not belong to the fusion-review violent video library RB is determined, and then divided by the number of frames in the fusion-review violent video library RB; the resulting value is the false-positive rate of violent-video image review, that is:
false-positive rate of violent-video image review = (number of frames in the image-review violent video library GB but not in the fusion-review violent video library RB) / (number of frames in the fusion-review violent video library RB).
For the miss rate, the number of frames that belong to the fusion-review violent video library RB but do not belong to the image-review violent video library GB is determined, and then divided by the number of frames in the fusion-review violent video library RB; the resulting value is the miss rate of violent-video image review, that is:
miss rate of violent-video image review = (number of frames in the fusion-review violent video library RB but not in the image-review violent video library GB) / (number of frames in the fusion-review violent video library RB).
Then, the accuracy of violent-video image review is determined from its false-positive rate and miss rate together. The combined value may be the larger of the two rates, the smaller, their average, a weighted average, or a value computed in some other way, and the specific way of combining them may differ according to actual application needs.
The above describes the determination of the accuracy of violent-video text review, violent-video sound review and violent-video image review. For the other video categories, such as pornographic video and reactionary video, the accuracies of the various kinds of feature review are determined in a similar way and are not elaborated here.
Then, from the accuracies of the various kinds of feature review for each video category, the feature-review fusion parameters of each category can be determined comprehensively.
Taking the accuracies of the feature reviews for violent video obtained above (the accuracy of violent-video text review, of violent-video sound review and of violent-video image review) as an example, the feature-review fusion parameters of violent video may be written (rw, rv, rg), where rw is the fusion parameter of violent-video text review, rv the fusion parameter of violent-video sound review, and rg the fusion parameter of violent-video image review.
In one specific example, the parameters rw, rv and rg may each be determined as follows:
rw = accuracy of violent-video text review / (accuracy of violent-video text review + accuracy of violent-video sound review + accuracy of violent-video image review);
rv = accuracy of violent-video sound review / (accuracy of violent-video text review + accuracy of violent-video sound review + accuracy of violent-video image review);
rg = accuracy of violent-video image review / (accuracy of violent-video text review + accuracy of violent-video sound review + accuracy of violent-video image review).
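The three formulas above normalize the per-feature accuracies so that the parameters sum to 1. A sketch with hypothetical accuracy values:

```python
# Feature-review fusion parameters as normalized accuracies.
def fusion_parameters(acc_text, acc_sound, acc_image):
    total = acc_text + acc_sound + acc_image
    return acc_text / total, acc_sound / total, acc_image / total

# Hypothetical accuracies of violent-video text/sound/image review.
rw, rv, rg = fusion_parameters(0.6, 0.9, 0.5)
print(rw, rv, rg)  # 0.3 0.45 0.25
```

The normalization gives the more accurate feature review (here, sound) the larger share of the subsequent fusion, which is the category-specific weighting the invention relies on.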
It should be noted that this way of determining the parameters is merely illustrative; those skilled in the art will appreciate that the fusion parameters may also be determined comprehensively in other ways, which are not exhaustively listed here.
For the feature-review fusion parameters of the other video categories, such as pornographic video and reactionary video, the specific comprehensive determination may be similar to the determination of the feature-review fusion parameters of violent video described above and is not elaborated here.
The feature-review fusion parameters of each video category obtained above may be stored to facilitate the subsequent fusion review of video frames to be reviewed.
When a video frame to be reviewed undergoes fusion review, the frame may first be classified using the preset fusion-review classification method to obtain the video category to which it belongs. For the purpose of illustration, assume here that the category of the frame to be reviewed is violent video.
Then, the corresponding features, which may specifically include text features, sound features and image features, are extracted from the frame to be reviewed.
Then, from the text, sound and image features of the frame, combined with the feature-review fusion parameters of violent video, the likelihood that the frame belongs to each video subcategory under violent video is determined. This is described in detail below with reference to one specific example.
Assume that the violent-video category to which the video frame to be reviewed belongs is category i, and that this category is divided into N subclasses, denoted i1, i2, i3, ..., iN.
Then, from the text features of the video frame to be reviewed, determine the likelihood that the frame (or its text features) belongs to subclass i1, denoted wi1; to subclass i2, denoted wi2; to subclass i3, denoted wi3; ...; and to subclass iN, denoted wiN. Necessarily, wi1+wi2+wi3+...+wiN=1.
From the sound features of the video frame to be reviewed, determine the likelihood that the frame (or its sound features) belongs to subclass i1, denoted vi1; to subclass i2, denoted vi2; to subclass i3, denoted vi3; ...; and to subclass iN, denoted viN. Necessarily, vi1+vi2+vi3+...+viN=1.
From the image features of the video frame to be reviewed, determine the likelihood that the frame (or its image features) belongs to subclass i1, denoted gi1; to subclass i2, denoted gi2; to subclass i3, denoted gi3; ...; and to subclass iN, denoted giN. Necessarily, gi1+gi2+gi3+...+giN=1.
Based on the results above, the likelihood that the video frame to be reviewed belongs to each subclass under violent video is then:
the likelihood that the frame belongs to subclass i1: pi1 = rw*wi1 + rv*vi1 + rg*gi1;
the likelihood that the frame belongs to subclass i2: pi2 = rw*wi2 + rv*vi2 + rg*gi2;
the likelihood that the frame belongs to subclass i3: pi3 = rw*wi3 + rv*vi3 + rg*gi3;
......
the likelihood that the frame belongs to subclass iN: piN = rw*wiN + rv*viN + rg*giN.
Thus, from the likelihoods pi1, pi2, pi3, ..., piN that the video frame to be reviewed belongs to subclasses i1, i2, i3, ..., iN under violent video, the video subclass to which the frame belongs can be determined comprehensively. In general, the subclass corresponding to the maximum of pi1, pi2, pi3, ..., piN may be determined as the video subclass to which the frame belongs.
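The weighted fusion and the final maximum-likelihood selection described above can be sketched as follows (a minimal illustration; the likelihood values are invented for the example, and `fuse_subclass_scores` is a name chosen here, not from the patent):

```python
def fuse_subclass_scores(weights, per_modality_scores):
    """Combine per-modality subclass likelihoods using fusion weights.

    `weights`: {"text": rw, "sound": rv, "image": rg}
    `per_modality_scores`: {"text": [wi1..wiN], "sound": [vi1..viN],
                            "image": [gi1..giN]}
    Returns (fused scores [pi1..piN], index of the winning subclass).
    """
    n = len(next(iter(per_modality_scores.values())))
    fused = [
        sum(weights[m] * per_modality_scores[m][k] for m in weights)
        for k in range(n)
    ]
    # The subclass with the largest fused likelihood wins.
    return fused, max(range(n), key=lambda k: fused[k])

# Illustrative likelihoods over N = 3 subclasses; each row sums to 1.
scores = {
    "text":  [0.7, 0.2, 0.1],
    "sound": [0.5, 0.3, 0.2],
    "image": [0.6, 0.3, 0.1],
}
fused, best = fuse_subclass_scores(
    {"text": 0.4, "sound": 0.25, "image": 0.35}, scores
)
```

Since the weights sum to 1 and each modality's likelihoods sum to 1, the fused scores also sum to 1, so they remain interpretable as a likelihood distribution over subclasses.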
In accordance with the in-depth fusion video review method of the present invention described above, the present invention also provides an in-depth fusion video review system. An embodiment of the in-depth fusion video review system of the present invention is described in detail below.
FIG. 4 shows a schematic structural diagram of an embodiment of the in-depth fusion video review system of the present invention. As shown in FIG. 4, the in-depth fusion video review system in this embodiment includes:
a video category fusion determining module 401, configured to classify a video frame to be reviewed using a preset fusion review classification manner and obtain the video category to which the frame belongs;
a feature extraction module 402, configured to extract the features of each type from the video frame to be reviewed;
a video subclass fusion determining module 403, configured to determine, from the features of each type in the video frame to be reviewed and the feature review fusion parameters of the video category, the likelihood that the frame belongs to each video subclass under that category, and to determine comprehensively, from those likelihoods, the video subclass to which the frame belongs.
According to the scheme of this embodiment, review proceeds by first classifying the video frame to be reviewed using the preset fusion review classification manner to obtain the video category to which it belongs, and then, based on that category's feature review fusion parameters and the features of each type in the frame, determining the likelihood that the frame belongs to each video subclass under the determined category, and comprehensively determining the subclass from those likelihoods. Because it relies on the features of each type in the frame together with the category's feature review fusion parameters, the scheme takes into account the role each type of feature plays in video frames of that category, distinguishing the contribution of different feature types to different types of video frames and thereby improving the accuracy of video review.
As shown in FIG. 4, the in-depth fusion video review system in this embodiment may further include: a fusion parameter determining module 404 configured to determine the feature review fusion parameters of each video category.
As shown in FIG. 4, the fusion parameter determining module 404 includes:
a sample fusion review module 4041, configured to classify the video frames in a video sample database using the preset fusion review classification manner and obtain the fusion-reviewed video frames of each video category;
a sample classification review module 4042, configured to classify the video frames in the video sample database using each type of feature review method separately, obtaining, for each feature type, the reviewed video frames of each video category;
a sample accuracy determining module 4043, configured to determine, from the fusion-reviewed video frames of each video category and the feature-reviewed video frames of each video category, the accuracy of each type of feature review for each video category;
a fusion parameter comprehensive determining module 4044, configured to determine the feature review fusion parameters of each video category from the accuracy of each type of feature review for each video category.
In one specific example, the fusion parameter comprehensive determining module 4044 may take, as the fusion parameter of the current type of feature review for the current video category, the ratio of the accuracy of that feature review to the sum of the accuracies of all types of feature review for the current video category; the feature review fusion parameters of the current video category comprise the fusion parameters of each type of feature review for that category.
As shown in FIG. 4, the sample accuracy determining module 4043 may specifically include:
a false positive rate determining module 40431, configured to obtain a first video frame count of frames assigned to the current video category by the current type of feature review but not assigned to that category by the fusion review, and to take the first video frame count divided by the sample count of the fusion-reviewed video frames of the current video category as the false positive rate of the current type of feature review;
a miss rate determining module 40432, configured to obtain a second video frame count of frames assigned to the current video category by the fusion review but not assigned to that category by the current type of feature review, and to take the second video frame count divided by the sample count of the fusion-reviewed video frames of the current video category as the miss rate of the current type of feature review;
an accuracy determining module 40433, configured to determine the accuracy of the current type of feature review for the current video category from its false positive rate and miss rate.
In one specific example, the accuracy determining module 40433 may take the average or the weighted average of the false positive rate and the miss rate of the current type of feature review as the accuracy of that feature review for the current video category.
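A minimal sketch of this computation over sample sets (the set-based interface and the function name are choices made here for illustration; note that the patent combines the false positive rate and the miss rate directly, so the resulting "accuracy" value is lower for better-performing feature reviews):

```python
def feature_review_accuracy(fusion_frames, feature_frames, w_fp=0.5, w_miss=0.5):
    """Score one type of feature review against the fusion review.

    `fusion_frames`: ids of frames the fusion review assigned to the category.
    `feature_frames`: ids of frames this feature review assigned to the category.
    False positives are feature-only frames; misses are fusion-only frames.
    Both rates are normalized by the fusion sample count, then combined as a
    weighted average (w_fp = w_miss = 0.5 gives the plain average).
    """
    n = len(fusion_frames)
    false_positive_rate = len(feature_frames - fusion_frames) / n
    miss_rate = len(fusion_frames - feature_frames) / n
    return w_fp * false_positive_rate + w_miss * miss_rate

fusion = {1, 2, 3, 4}
text_review = {2, 3, 5}          # misses frames 1 and 4, wrongly adds frame 5
score = feature_review_accuracy(fusion, text_review)
```

Here the false positive rate is 1/4 and the miss rate is 2/4, so the plain average yields a score of 0.375 for the text review of this category.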
As shown in FIG. 4, in one example, the video subclass fusion determining module 403 may specifically include:
a feature subclass likelihood determining module 4031, configured to determine, for each type of feature in the video frame to be reviewed, the likelihood that it belongs to each video subclass under the video category;
a video subclass likelihood determining module 4032, configured to determine, from the likelihood that each type of feature belongs to each video subclass under the video category and from the feature review fusion parameters of the video category, the likelihood that the video frame to be reviewed belongs to each video subclass under that category;
a subclass determining module, configured to determine the maximum of the likelihoods that the video frame to be reviewed belongs to each video subclass under the video category, and to take the video subclass corresponding to that maximum as the subclass to which the frame belongs.
The specific implementation of each module in the in-depth fusion video review system of the present invention may be the same as in the in-depth fusion video review method of the present invention described above, and is not repeated here.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent should be determined by the appended claims.

Claims (14)

  1. An in-depth fusion video review method, characterized in that it comprises the steps of:
    classifying a video frame to be reviewed using a preset fusion review classification manner, and obtaining the video category to which the video frame to be reviewed belongs;
    extracting features of each type from the video frame to be reviewed;
    determining, from the features of each type in the video frame to be reviewed and the feature review fusion parameters of the video category, the likelihood that the video frame to be reviewed belongs to each video subclass under the video category; and
    determining comprehensively, from the likelihood that the video frame to be reviewed belongs to each video subclass under the video category, the video subclass to which the video frame to be reviewed belongs.
  2. The in-depth fusion video review method according to claim 1, characterized in that the manner of determining the feature review fusion parameters of a video category comprises:
    classifying the video frames in a video sample database using the preset fusion review classification manner, and obtaining the fusion-reviewed video frames of each video category;
    classifying the video frames in the video sample database using each type of feature review method separately, and obtaining, for each feature type, the reviewed video frames of each video category;
    determining, from the fusion-reviewed video frames of each video category and the feature-reviewed video frames of each video category, the accuracy of each type of feature review for each video category; and
    determining the feature review fusion parameters of each video category from the accuracy of each type of feature review for each video category.
  3. The in-depth fusion video review method according to claim 2, characterized in that the manner of determining the accuracy of each type of feature review for each video category comprises:
    obtaining a first video frame count of video frames assigned to the current video category by the current type of feature review but not assigned to the current video category by the fusion review;
    taking the first video frame count divided by the sample count of the fusion-reviewed video frames of the current video category as the false positive rate of the current type of feature review;
    obtaining a second video frame count of video frames assigned to the current video category by the fusion review but not assigned to the current video category by the current type of feature review;
    taking the second video frame count divided by the sample count of the fusion-reviewed video frames of the current video category as the miss rate of the current type of feature review; and
    determining the accuracy of the current type of feature review for the current video category from the false positive rate and the miss rate of the current type of feature review.
  4. The in-depth fusion video review method according to claim 3, characterized in that the average or the weighted average of the false positive rate and the miss rate of the current type of feature review is taken as the accuracy of the current type of feature review for the current video category.
  5. The in-depth fusion video review method according to claim 2, characterized in that the manner of determining the feature review fusion parameters of each video category from the accuracy of each type of feature review for each video category comprises:
    taking the ratio of the accuracy of the current type of feature review for the current video category to the sum of the accuracies of all types of feature review for the current video category as the fusion parameter of the current type of feature review for the current video category;
    wherein the feature review fusion parameters of the current video category comprise the fusion parameters of each type of feature review for the current video category.
  6. The in-depth fusion video review method according to any one of claims 1 to 5, characterized in that the manner of determining the likelihood that the video frame to be reviewed belongs to each video subclass under the video category comprises:
    determining, for each type of feature in the video frame to be reviewed, the likelihood that it belongs to each video subclass under the video category; and
    determining, from the likelihood that each type of feature belongs to each video subclass under the video category and from the feature review fusion parameters of the video category, the likelihood that the video frame to be reviewed belongs to each video subclass under the video category.
  7. The in-depth fusion video review method according to any one of claims 1 to 5, characterized in that the manner of comprehensively determining the video subclass to which the video frame to be reviewed belongs comprises:
    determining the maximum of the likelihoods that the video frame to be reviewed belongs to each video subclass under the video category; and
    determining the video subclass corresponding to that maximum likelihood as the video subclass to which the video frame to be reviewed belongs.
  8. An in-depth fusion video review system, characterized in that it comprises:
    a video category fusion determining module, configured to classify a video frame to be reviewed using a preset fusion review classification manner and obtain the video category to which the video frame to be reviewed belongs;
    a feature extraction module, configured to extract features of each type from the video frame to be reviewed; and
    a video subclass fusion determining module, configured to determine, from the features of each type in the video frame to be reviewed and the feature review fusion parameters of the video category, the likelihood that the video frame to be reviewed belongs to each video subclass under the video category, and to determine comprehensively, from those likelihoods, the video subclass to which the video frame to be reviewed belongs.
  9. The in-depth fusion video review system according to claim 8, characterized by further comprising:
    a fusion parameter determining module, configured to determine the feature review fusion parameters of each video category.
  10. The in-depth fusion video review system according to claim 9, characterized in that the fusion parameter determining module comprises:
    a sample fusion review module, configured to classify the video frames in a video sample database using the preset fusion review classification manner and obtain the fusion-reviewed video frames of each video category;
    a sample classification review module, configured to classify the video frames in the video sample database using each type of feature review method separately, obtaining, for each feature type, the reviewed video frames of each video category;
    a sample accuracy determining module, configured to determine, from the fusion-reviewed video frames of each video category and the feature-reviewed video frames of each video category, the accuracy of each type of feature review for each video category; and
    a fusion parameter comprehensive determining module, configured to determine the feature review fusion parameters of each video category from the accuracy of each type of feature review for each video category.
  11. The in-depth fusion video review system according to claim 10, characterized in that the sample accuracy determining module comprises:
    a false positive rate determining module, configured to obtain a first video frame count of video frames assigned to the current video category by the current type of feature review but not assigned to the current video category by the fusion review, and to take the first video frame count divided by the sample count of the fusion-reviewed video frames of the current video category as the false positive rate of the current type of feature review;
    a miss rate determining module, configured to obtain a second video frame count of video frames assigned to the current video category by the fusion review but not assigned to the current video category by the current type of feature review, and to take the second video frame count divided by the sample count of the fusion-reviewed video frames of the current video category as the miss rate of the current type of feature review; and
    an accuracy determining module, configured to determine the accuracy of the current type of feature review for the current video category from the false positive rate and the miss rate of the current type of feature review.
  12. The in-depth fusion video review system according to claim 11, characterized in that the accuracy determining module takes the average or the weighted average of the false positive rate and the miss rate of the current type of feature review as the accuracy of the current type of feature review for the current video category.
  13. The in-depth fusion video review system according to claim 10, characterized in that the fusion parameter comprehensive determining module takes the ratio of the accuracy of the current type of feature review for the current video category to the sum of the accuracies of all types of feature review for the current video category as the fusion parameter of the current type of feature review for the current video category, the feature review fusion parameters of the current video category comprising the fusion parameters of each type of feature review for the current video category.
  14. The in-depth fusion video review system according to any one of claims 8 to 13, characterized in that the video subclass fusion determining module comprises:
    a feature subclass likelihood determining module, configured to determine, for each type of feature in the video frame to be reviewed, the likelihood that it belongs to each video subclass under the video category;
    a video subclass likelihood determining module, configured to determine, from the likelihood that each type of feature belongs to each video subclass under the video category and from the feature review fusion parameters of the video category, the likelihood that the video frame to be reviewed belongs to each video subclass under the video category; and
    a subclass determining module, configured to determine the maximum of the likelihoods that the video frame to be reviewed belongs to each video subclass under the video category, and to determine the video subclass corresponding to that maximum likelihood as the video subclass to which the video frame to be reviewed belongs.
PCT/CN2013/085739 2012-12-22 2013-10-23 Method and system for screening depth fusion video WO2014094492A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210563525.7A CN103049530B (en) 2012-12-22 2012-12-22 Deeply merge video checking method and system
CN201210563525.7 2012-12-22

Publications (1)

Publication Number Publication Date
WO2014094492A1 true WO2014094492A1 (en) 2014-06-26

Family

ID=48062171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/085739 WO2014094492A1 (en) 2012-12-22 2013-10-23 Method and system for screening depth fusion video

Country Status (2)

Country Link
CN (1) CN103049530B (en)
WO (1) WO2014094492A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754544A (en) * 2019-03-29 2020-10-09 杭州海康威视数字技术股份有限公司 Video frame fusion method and device and electronic equipment

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049530B (en) * 2012-12-22 2015-12-23 深圳先进技术研究院 Deeply merge video checking method and system
CN109495766A (en) * 2018-11-27 2019-03-19 广州市百果园信息技术有限公司 A kind of method, apparatus, equipment and the storage medium of video audit
CN110418161A (en) * 2019-08-02 2019-11-05 广州虎牙科技有限公司 Video reviewing method and device, electronic equipment and readable storage medium storing program for executing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5128757A (en) * 1990-06-18 1992-07-07 Zenith Electronics Corporation Video transmission system using adaptive sub-band coding
CN101834982A (en) * 2010-05-28 2010-09-15 上海交通大学 Hierarchical screening method of violent videos based on multiplex mode
CN101853377A (en) * 2010-05-13 2010-10-06 复旦大学 Method for identifying content of digital video
CN103049530A (en) * 2012-12-22 2013-04-17 深圳先进技术研究院 System and method for deep fused video examination

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021857A (en) * 2006-10-20 2007-08-22 鲍东山 Video searching system based on content analysis
CN101470897B (en) * 2007-12-26 2011-04-20 中国科学院自动化研究所 Sensitive film detection method based on audio/video amalgamation policy



Also Published As

Publication number Publication date
CN103049530A (en) 2013-04-17
CN103049530B (en) 2015-12-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13865140

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13865140

Country of ref document: EP

Kind code of ref document: A1