CN114780795A - Video material screening method, device, equipment and medium


Info

Publication number
CN114780795A
Authority
CN
China
Prior art keywords
video
face
target
target video
scoring
Prior art date
Legal status
Pending
Application number
CN202210493323.3A
Other languages
Chinese (zh)
Inventor
黄攀
Current Assignee
Jinan Boguan Intelligent Technology Co Ltd
Original Assignee
Jinan Boguan Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jinan Boguan Intelligent Technology Co Ltd
Priority to CN202210493323.3A
Publication of CN114780795A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/73: Querying
    • G06F 16/732: Query formulation
    • G06F 16/7335: Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval using metadata automatically derived from the content
    • G06F 16/7837: Retrieval using objects detected or recognised in the video content
    • G06F 16/784: Retrieval where the detected or recognised objects are people
    • G06F 16/7847: Retrieval using low-level visual features of the video content
    • G06F 16/785: Retrieval using low-level visual features of the video content, using colour or luminescence

Abstract

The application discloses a video material screening method, a device, equipment and a medium, which relate to the technical field of video processing, and the method comprises the following steps: acquiring video clips to be screened and face photos of target users, and screening out target video clips corresponding to face feature information of the face photos from all the video clips to be screened; scoring the target video clip according to a preset video evaluation dimension, and constructing an evaluation label of the target video clip based on the preset video evaluation dimension and a corresponding scoring result; adding the corresponding evaluation tag to the target video clip to obtain a video material library; and acquiring a retrieval request which is sent by a user terminal and constructed based on a target evaluation label, and returning the target video clip which is screened from the video material library and corresponds to the target evaluation label to the user terminal. Through the scheme, the video materials can be rapidly screened according to the requirements of users before video clips are edited.

Description

Video material screening method, device, equipment and medium
Technical Field
The invention relates to the technical field of video processing, in particular to a method, a device, equipment and a medium for screening video materials.
Background
Currently, with the popularization of smart phones and the rapid development of live video and short video technologies, more and more people record life and spread information through short videos, so the development of short videos has become increasingly mature and popular. For example, when a visitor plays at a place such as an amusement park or a scenic spot, the visitor may select some video clips from videos captured by a camera carried by the visitor or a camera of the scenic spot for social sharing. In order to ensure that the produced film is composed of video segments that meet the producer's wishes or show aesthetic appeal, video segments meeting the user's requirements need to be selected and screened from a large number of portrait video segments before video editing, and labels and priorities need to be marked on the selected materials to facilitate later video editing. In summary, the problem of how to rapidly screen video materials according to user requirements before video editing still needs to be solved.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a device and a medium for screening video materials, which can rapidly screen video materials according to user requirements before video editing. The specific scheme is as follows:
in a first aspect, the present application discloses a method for screening video materials, comprising:
acquiring video clips to be screened and face photos of target users, and screening out target video clips corresponding to face feature information of the face photos from all the video clips to be screened;
scoring the target video clip according to a preset video evaluation dimension, and constructing an evaluation label of the target video clip based on the preset video evaluation dimension and a corresponding scoring result;
adding the corresponding evaluation tag to the target video clip to obtain a video material library;
and acquiring a retrieval request which is sent by a user terminal and constructed based on a target evaluation label, and returning the target video clip which is screened from the video material library and corresponds to the target evaluation label to the user terminal.
Optionally, the screening out a target video segment corresponding to the face feature information of the face photograph from all the video segments to be screened includes:
inputting the face picture into a trained face recognition model to obtain a target face feature vector corresponding to the face picture;
and determining a face feature vector corresponding to each video clip to be screened by using the face recognition model, and screening out a target video clip corresponding to the target face feature vector from all the video clips to be screened.
Optionally, the scoring the target video segment according to a preset video evaluation dimension includes:
and scoring the target video clip according to the video quality evaluation dimension and the video evaluation dimension constructed based on the portrait feature information to obtain a corresponding video quality scoring result and a portrait feature scoring result.
Optionally, scoring the target video segment according to a video evaluation dimension constructed based on the portrait feature information to obtain a corresponding portrait feature scoring result, where the scoring includes:
and scoring the target video segment according to any one or a combination of the face quality evaluation dimension, the facial expression evaluation dimension and the portrait gesture evaluation dimension, to obtain any one or a combination of a corresponding face quality scoring result, facial expression scoring result and portrait gesture scoring result.
Optionally, the scoring the target video segment according to the face quality evaluation dimension to obtain a corresponding face quality scoring result includes:
detecting the target video clip by using a face integrity classification model corresponding to the face integrity evaluation dimension so as to determine a face integrity evaluation result of the portrait in the target video clip;
detecting the target video segment by using a face definition classification model corresponding to a face definition evaluation dimension to determine a face definition grading result of a portrait in the target video segment;
detecting the target video segments by using a face detection model to obtain corresponding face areas, determining interpupillary distances corresponding to the face areas, and then scoring the target video segments according to the face size evaluation dimension based on the interpupillary distances to obtain corresponding face size scoring results;
and determining a corresponding face quality grading result according to the face integrity grading result, the face definition grading result and the face size grading result.
Optionally, scoring the target video segment according to the facial expression evaluation dimension to obtain a corresponding facial expression scoring result, including:
determining human face target feature points of the portrait in the target video clip and corresponding feature point position changes;
and scoring the facial expressions of the human images in the target video clips based on the facial target feature points and the feature point position changes to obtain corresponding facial expression scoring results.
Optionally, scoring the target video segment according to the portrait gesture evaluation dimension to obtain a corresponding portrait gesture scoring result, where the scoring includes:
acquiring human body joint characteristic points of a portrait in the target video clip;
determining corresponding gesture features based on the position information of the human body joint feature points;
if the gesture features are consistent with preset gesture features, acquiring face information of a portrait corresponding to the human body joint feature points, and judging whether the face information is matched with face information corresponding to the target user or not to obtain a corresponding matching result;
and scoring the target video segment based on the matching result to obtain a corresponding portrait gesture scoring result.
Optionally, scoring the target video segment according to a video evaluation dimension constructed based on video quality to obtain a corresponding video quality scoring result, including:
determining a definition grading result of the target video clip based on the size relation between the resolution of the target video clip and a preset resolution threshold and a video fuzzy grade obtained after the target video clip is detected through a video fuzzy grade detection model;
determining a brightness grading result of the target video clip based on a judgment result of whether the brightness of the target video clip is within a preset brightness threshold range;
detecting whether a backlight spot meeting a first preset condition exists in the target video clip through a backlight spot detection model to obtain a corresponding backlight spot detection result, and determining a backlight scoring result of the target video clip based on the backlight spot detection result;
detecting a relative position relation between a portrait area in the target video clip and a video picture of the target video clip, and determining a portrait position scoring result of the target video clip based on a judgment result of whether the relative position relation meets a second preset condition;
and determining a corresponding video quality scoring result according to the definition scoring result, the brightness scoring result, the backlight scoring result and the portrait position scoring result.
In a second aspect, the present application discloses a video material screening apparatus, comprising:
the target video screening module is used for acquiring video clips to be screened and face photos of target users, and screening out target video clips corresponding to face feature information of the face photos from all the video clips to be screened;
the video scoring module is used for scoring the target video clip according to a preset video evaluation dimension and constructing an evaluation label of the target video clip on the basis of the preset video evaluation dimension and a corresponding scoring result;
the video material library establishing module is used for adding the corresponding evaluation tag to the target video clip to obtain a video material library;
and the video material screening module is used for acquiring a retrieval request which is sent by the user terminal and constructed on the basis of the target evaluation tag, and returning the target video clip which is screened from the video material library and corresponds to the target evaluation tag to the user terminal.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the video material screening method disclosed in the foregoing disclosure.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the video material screening method disclosed in the foregoing disclosure.
When video material screening is carried out, the video segments to be screened and a face photo of the target user are first obtained, the target video segments corresponding to the face feature information of the face photo are screened out from all the video segments to be screened, the target video segments are scored according to preset video evaluation dimensions, and evaluation labels for the target video segments are constructed based on the preset video evaluation dimensions and the corresponding scoring results; the corresponding evaluation labels are then added to the target video segments to obtain a video material library; finally, a retrieval request sent by a user terminal and constructed based on a target evaluation label is acquired, and the target video segments screened from the video material library that correspond to the target evaluation label are returned to the user terminal. In this way, the video segments containing the target user are screened out by acquiring the face feature information of the target user's face photo, the target video segments are scored on the preset video evaluation dimensions, and the corresponding evaluation labels are constructed to build the video material library. Therefore, on the one hand, before video editing, the video segments to be screened are filtered through the face feature information in the target user's face photo, so that target video segments containing the target user can be screened out quickly; on the other hand, the target video segments are scored according to the preset video evaluation dimensions, and the evaluation labels and the video material library are constructed from the dimensions and scores together, so that the target video segments can be managed by labels, facilitating subsequent video retrieval. When a material request from a user is obtained, the corresponding target video segments are returned according to the corresponding evaluation labels, achieving the purpose of rapidly screening video materials according to user requirements. In conclusion, video materials can be rapidly screened according to user requirements before video editing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a video material screening method provided in the present application;
fig. 2 is a flowchart of a specific video material screening method provided in the present application;
fig. 3 is a flowchart of a specific video material screening method provided in the present application;
fig. 4 is a schematic diagram of a face target feature point provided by the present application;
FIG. 5 is a schematic diagram of the human body joint point feature points of the portrait provided by the present application;
FIG. 6 is a schematic view of an optimal imaging area provided by the present application;
fig. 7 is a flowchart of a specific video material screening method provided in the present application;
fig. 8 is a schematic structural diagram of a video material screening apparatus according to the present application;
fig. 9 is a block diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Currently, with the popularization of smart phones and the rapid development of live video and short video technologies, more and more people record life and spread information through short videos. For example, when a visitor plays at a place such as an amusement park or a scenic spot, the visitor may wish to select some video clips from videos captured by a camera carried by the visitor or a camera in the scenic spot for social sharing. In order to ensure that produced films are composed of video segments meeting the producer's wishes or showing aesthetic appeal, video segments meeting the user's requirements need to be selected and screened from a large number of portrait video segments before video editing, and labels and priorities need to be marked on the selected materials to facilitate later video editing. For this reason, the present application provides a video material screening method that can rapidly screen video materials according to user requirements before video editing.
Referring to fig. 1, an embodiment of the present invention discloses a video material screening method, which includes the following steps:
step S11: the method comprises the steps of obtaining video clips to be screened and face photos of target users, and screening out target video clips corresponding to face feature information of the face photos from all the video clips to be screened.
Specifically, the video segments to be screened and a face photo of the target user are obtained, the face photo is input into a trained face recognition model to obtain a target face feature vector corresponding to the face photo, then the face feature vector corresponding to each video segment to be screened is determined using the face recognition model, and the target video segments corresponding to the target face feature vector are screened out from all the video segments to be screened. It can be understood that after the video segments to be screened and the target user's face photo are obtained, the face feature vectors in both are extracted by the face recognition model and compared with each other to obtain the target video segments containing the target face feature vector. Through this scheme, the video segments containing the target user are screened out from the video segments to be screened, so that the target video segments can be further scored on the preset video evaluation dimensions.
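For illustration only (this sketch is not part of the original disclosure), the matching step can be pictured as comparing a face feature vector extracted from the photo against the vectors extracted from each candidate clip. The function names, the use of cosine similarity and the 0.6 threshold below are assumptions of the sketch, not details fixed by the method:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two face feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def screen_target_clips(photo_embedding: np.ndarray,
                        clip_embeddings: dict[str, np.ndarray],
                        threshold: float = 0.6) -> list[str]:
    # Keep clips whose face feature vector matches the target photo's;
    # both vectors must come from the same face recognition model.
    # The 0.6 threshold is an illustrative assumption.
    return [clip_id for clip_id, emb in clip_embeddings.items()
            if cosine_similarity(photo_embedding, emb) >= threshold]
```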
It should be noted that the application scenarios to which this embodiment applies may include, but are not limited to, visits to amusement parks, scenic spots and the like. It can be understood that, in a play scenario, the video segments to be screened in this embodiment specifically refer to video segments related to the play venue, which may be video segments shot by cameras pre-installed in the venue, or video segments shot by shooting devices carried by the visitors themselves.
In a specific embodiment, the process of acquiring the video segments to be screened may include: acquiring the video segments to be screened by capturing videos of visitors through cameras pre-installed in the play venue. With this acquisition mode, video clips that are difficult for visitors to shoot by themselves can be captured. For example, when visitors experience thrilling rides such as roller coasters, bungee jumping and carousels, it is difficult to shoot video by themselves; at such times, various high-quality play video clips of the visitor and companions can be conveniently obtained through the scenic spot cameras, so that video clips of interest with higher ratings can subsequently be screened out from all the play video clips shot by the scenic spot cameras, which helps improve the access volume and download rate of those video clips.
Step S12: and scoring the target video clip according to a preset video evaluation dimension, and constructing an evaluation label of the target video clip based on the preset video evaluation dimension and a corresponding scoring result.
In this embodiment, the target video segment is scored according to a preset video evaluation dimension, and an evaluation label of the target video segment is constructed based on the preset video evaluation dimension and the corresponding scoring result. Through this scheme, a corresponding evaluation label is constructed for the target video segment, where the evaluation label comprises the evaluation dimension and the corresponding scoring result, so that a user can conveniently screen target video segments as required during subsequent retrieval. It can be understood that the preset video evaluation dimension may be set according to the specific needs of the user, and includes, but is not limited to, video definition and portrait definition.
Step S13: and adding the corresponding evaluation tag for the target video clip to obtain a video material library.
In this embodiment, a corresponding evaluation tag is added to the target video clip, and the target video clip and the corresponding evaluation tag are stored in the video material library together, so that the target video clip corresponding to the request is returned to the user terminal when the user initiates a retrieval request in the following process.
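As a rough illustration of how such a library might be organized (the data layout and dimension names below are assumptions of this sketch, not part of the disclosure):

```python
from dataclasses import dataclass, field

@dataclass
class EvaluationTag:
    dimension: str  # e.g. "face_quality", "video_quality" (assumed names)
    score: float    # scoring result in [0, 100]

@dataclass
class MaterialEntry:
    clip_id: str
    tags: list[EvaluationTag] = field(default_factory=list)

# The material library pairs each target clip with its evaluation tags.
material_library: list[MaterialEntry] = []

def add_to_library(clip_id: str, scores: dict[str, float]) -> None:
    # Attach one evaluation tag per scored dimension to the clip.
    tags = [EvaluationTag(dim, score) for dim, score in scores.items()]
    material_library.append(MaterialEntry(clip_id, tags))
```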
Step S14: and acquiring a retrieval request which is sent by a user terminal and constructed based on a target evaluation label, and returning the target video clip which is screened from the video material library and corresponds to the target evaluation label to the user terminal.
In this embodiment, when the user terminal initiates a retrieval request constructed based on the target evaluation tag, the target video segment corresponding to the target evaluation tag is screened from the video material library according to the retrieval request of the user, and the target video segment is returned to the user terminal.
The following describes the technical solution of the present embodiment by taking a scene when a tourist goes to a scenic spot for playing as an example. Under the scene, a scenic spot server acquires video clips shot by cameras pre-installed in different areas of a scenic spot, receives face photos uploaded by a user through a client, then screens out target video clips corresponding to face feature information of the face photos from all the acquired video clips, scores the target video clips according to preset video evaluation dimensions, constructs corresponding evaluation labels, and adds the corresponding evaluation labels to the target video clips to obtain a video material library stored in the scenic spot server. When a user needs to download a video clip from the scenic spot server, a corresponding retrieval request can be initiated to the scenic spot server through a client on the user terminal and based on the interested target evaluation tag. After receiving the retrieval request, the scenic spot server analyzes the retrieval request to extract the target evaluation tag from the retrieval request, and then utilizes the target evaluation tag to screen out a video clip corresponding to the target evaluation tag from the video material library and issues the video clip to the user terminal, so that the user can quickly acquire the video clip of interest from the scenic spot server, and the problems of low screening speed and the like caused by manual screening of the video clip are avoided.
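Continuing the library sketch above, retrieval by a target evaluation tag then reduces to filtering the stored entries; the dimension names and score values here are illustrative:

```python
def retrieve_clips(library: list[MaterialEntry],
                   dimension: str, min_score: float) -> list[str]:
    # Return clips whose tag on the requested dimension meets the
    # score extracted from the retrieval request.
    return [entry.clip_id for entry in library
            if any(t.dimension == dimension and t.score >= min_score
                   for t in entry.tags)]

add_to_library("clip_001", {"face_quality": 92.0, "video_quality": 85.0})
add_to_library("clip_002", {"face_quality": 40.0, "video_quality": 95.0})
print(retrieve_clips(material_library, "face_quality", 80.0))  # ['clip_001']
```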
Therefore, in this embodiment, the video segments containing the target user are screened out by obtaining the face feature information of the target user's face photo, and the target video segments are scored and corresponding evaluation labels are constructed through the preset video evaluation dimensions to build the video material library. Thus, on the one hand, before video editing, the video segments to be screened are filtered through the face feature information in the target user's face photo, so that target video segments containing the target user can be screened out quickly; on the other hand, the target video segments are scored according to the preset video evaluation dimensions, and the evaluation labels and the video material library are constructed from the dimensions and scores together, so that the target video segments can be managed by labels, facilitating subsequent video retrieval. When a material request from a user is obtained, the corresponding target video segments are returned according to the corresponding evaluation labels, achieving the purpose of rapidly screening video materials according to user requirements. In conclusion, video materials can be rapidly screened according to user requirements before video editing.
Referring to fig. 2, the embodiment of the present invention discloses a specific video material screening method, and compared with the previous embodiment, the present embodiment further describes and optimizes the technical solution.
Step S21: the method comprises the steps of obtaining video clips to be screened and face photos of target users, and screening out target video clips corresponding to face feature information of the face photos from all the video clips to be screened.
Step S22: according to the video quality evaluation dimension and the video evaluation dimension constructed based on the portrait feature information, the target video clip is scored to obtain a corresponding video quality scoring result and a portrait feature scoring result, and an evaluation label of the target video clip is constructed based on the evaluation dimension and the corresponding scoring result.
It can be understood that both the video quality evaluation dimension and the video evaluation dimension constructed based on the portrait feature information can be set individually according to the user requirements, and the video quality evaluation dimension includes, but is not limited to, video definition and video brightness; the video evaluation dimension constructed based on the portrait feature information includes but is not limited to portrait size, portrait definition, portrait gesture and the like. By the scheme, the evaluation label of the target video clip is constructed based on the preset video evaluation dimension and the corresponding grading result, so that labeling management and subsequent establishment of a video material library are conveniently carried out on the target video clip.
Step S23: and adding the corresponding evaluation tag for the target video clip to obtain a video material library.
Step S24: and acquiring a retrieval request which is sent by a user terminal and constructed based on a target evaluation label, and returning the target video clip which is screened from the video material library and corresponds to the target evaluation label to the user terminal.
It can be seen that, in this embodiment, the target video segments are scored separately on the video quality evaluation dimension and on the video evaluation dimension constructed based on portrait feature information, so that the scoring dimensions of the target video are diversified and fine-grained, the evaluation labels constructed for the target video segments are more refined, and the video material screening effect is better.
Referring to fig. 3, the embodiment of the present invention discloses a specific video material screening method, and compared with the previous embodiment, the present embodiment further describes and optimizes the technical solution.
Step S31: the method comprises the steps of obtaining video clips to be screened and face photos of target users, and screening out target video clips corresponding to face feature information of the face photos from all the video clips to be screened.
Step S32: and scoring the target video segment according to any one or combination of several of a face quality evaluation dimension, a face expression evaluation dimension and a portrait gesture evaluation dimension to obtain any one or combination of several of a corresponding face quality scoring result, a face expression scoring result and a portrait gesture scoring result, and constructing an evaluation label of the target video segment based on the evaluation dimension and the corresponding scoring result.
In this embodiment, the target video segment is scored through any one or a combination of a face quality evaluation dimension, a face expression evaluation dimension and a portrait gesture evaluation dimension to obtain a scoring result, and an evaluation label of the target video segment is constructed based on the evaluation dimension and the corresponding scoring result. It is understood that, in practical use, the target video segment may be scored according to the user's needs by any one or combination of a face quality evaluation dimension, a face expression evaluation dimension and a portrait gesture evaluation dimension.
In this embodiment, scoring the target video segment according to the face quality evaluation dimension to obtain a face quality scoring result includes: detecting the target video segment with a face integrity classification model corresponding to the face integrity evaluation dimension to determine a face integrity scoring result for the portrait in the target video segment, where the face integrity classification model in this embodiment may specifically be a deep learning model specially trained in advance; detecting the target video segment with a face definition classification model corresponding to the face definition evaluation dimension to determine a face definition scoring result for the portrait in the target video segment, where this model may likewise be a specially pre-trained deep learning model; detecting the target video segment with a face detection model to obtain the corresponding face area, determining the interpupillary distance corresponding to that face area, and then scoring the target video segment on the face size evaluation dimension based on the interpupillary distance to obtain a corresponding face size scoring result; and determining the corresponding face quality scoring result from the face integrity scoring result, the face definition scoring result and the face size scoring result. Specifically, the face detection model determines the interpupillary distance of the face area, and it is judged whether the interpupillary distance falls within a preset interpupillary-distance interval, which can be set according to actual needs; for example, with a preset interval of 100 to 500 pixels, when the interpupillary distance falls outside the preset interval, the face size scoring result is c1 = 0, and when it falls inside the interval, c1 = 100. Further, let the face integrity scoring result be a1 with weight α1, the face definition scoring result be b1 with weight β1, and the face size scoring result be c1 with weight γ1; the face quality scoring result is then:

Y1 = α1a1 + β1b1 + γ1c1

where Y1 denotes the face quality scoring result; a1 ∈ [0,100], b1 ∈ [0,100], c1 ∈ [0,100]; α1 ∈ [0,1], β1 ∈ [0,1], γ1 ∈ [0,1]. It can be understood that α1, β1 and γ1 can be set according to actual requirements.
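A minimal sketch of this computation, assuming the example interpupillary interval of 100 to 500 pixels given above and illustrative weight values (the disclosure leaves the weights to be set as required):

```python
def face_size_score(interpupillary_px: float,
                    lo: float = 100.0, hi: float = 500.0) -> float:
    # c1 = 100 when the interpupillary distance falls within the preset
    # interval (100-500 pixels in the example above), else c1 = 0.
    return 100.0 if lo <= interpupillary_px <= hi else 0.0

def face_quality_score(a1: float, b1: float, c1: float,
                       alpha1: float = 0.4, beta1: float = 0.3,
                       gamma1: float = 0.3) -> float:
    # Y1 = alpha1*a1 + beta1*b1 + gamma1*c1, each sub-score in [0, 100];
    # the weight values here are illustrative assumptions in [0, 1].
    return alpha1 * a1 + beta1 * b1 + gamma1 * c1
```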
In this embodiment, scoring the target video segment according to the facial expression evaluation dimension to obtain a facial expression scoring result includes: determining the face target feature points of the portrait in the target video segment and the corresponding feature point position changes; and scoring the facial expression of the portrait in the target video segment based on the face target feature points and the feature point position changes to obtain a corresponding facial expression scoring result. In this embodiment, a pre-trained face feature point detection model may specifically be used to determine the face target feature points in the target video segment, and the position change information of the face target feature points may be input into a specially pre-trained expression evaluation model to obtain the facial expression scoring result output by that model. As shown in fig. 4, a target video segment is detected by the face feature point detection model to determine the face target feature points of the portrait, yielding 106 feature points; the facial expression scoring result is then determined from the position changes of the face target feature points of each part of the face, especially the changes in the regions of the mouth, eyes and jaw. That is, the position change information of the face target feature points is input into the expression evaluation model to obtain the corresponding facial expression scoring result. Let the facial expression index be a2, a2 ∈ [0,100]; the corresponding facial expression scoring result is:

Y2 = a2

where Y2 denotes the facial expression scoring result. It can be understood that facial expressions include, but are not limited to, smiling, anger and surprise, and the corresponding facial expression index can be set as required in actual scene applications.
In this embodiment, scoring the target video segment according to the portrait gesture evaluation dimension to obtain a portrait gesture scoring result includes: acquiring the human body joint feature points of the portrait in the target video segment; determining the corresponding gesture feature based on the position information of the human body joint feature points; if the gesture feature is consistent with a preset gesture feature, acquiring the face information of the portrait corresponding to the human body joint feature points and judging whether it matches the face information of the target user to obtain a corresponding matching result; and scoring the target video segment based on the matching result to obtain a corresponding portrait gesture scoring result. In this embodiment, the position information of the human body joint feature points may be input into a specially pre-trained gesture detection model to obtain the corresponding gesture feature output by that model. It should be noted that when the gesture feature determined from the position information of the human body joint feature points is consistent with the preset gesture feature, as shown in fig. 5, it is judged whether the face information of the portrait corresponding to those joint feature points matches the face information of the target user, where the judgment includes, but is not limited to, matching the face position and the face feature vector. When the gesture feature is detected to be consistent with the preset gesture feature, the face and the human body are both considered to carry corresponding target tracking ID values, so the face and the body of the same person can be associated through an ID association algorithm; since the gesture key points belong to a part of the human body, the corresponding face can be associated through this association information to confirm that the gesture and the face belong to the same person. Meanwhile, the face features are compared and verified against those of the corresponding face; if the verification passes, the user making the gesture can be determined to be the target user and the portrait gesture scoring result is Y3 = 100, otherwise Y3 = 0.
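A compact sketch of this decision, assuming the gesture feature and the associated face embedding have already been produced by the pose detection and ID-association steps described above; the similarity measure and the 0.6 threshold are assumptions of the sketch:

```python
from typing import Optional
import numpy as np

def portrait_gesture_score(gesture_feature: str,
                           preset_gesture: str,
                           associated_face_embedding: Optional[np.ndarray],
                           target_embedding: np.ndarray,
                           threshold: float = 0.6) -> float:
    # Y3 = 100 only when the preset gesture is made by the target user.
    # associated_face_embedding is the face vector linked to the gesture's
    # body keypoints by the tracking-ID association described above.
    if gesture_feature != preset_gesture or associated_face_embedding is None:
        return 0.0
    a, b = associated_face_embedding, target_embedding
    similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 100.0 if similarity >= threshold else 0.0
```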
Step S33: and scoring the target video clip according to a video evaluation dimension constructed based on video quality to obtain a corresponding video quality scoring result, and constructing an evaluation label of the target video clip based on the preset video evaluation dimension and the corresponding scoring result.
Specifically, scoring the target video segment according to the video evaluation dimension constructed based on video quality to obtain a corresponding video quality scoring result includes: determining a definition scoring result of the target video segment based on the size relationship between the resolution of the target video segment and a preset resolution threshold, and on the video fuzzy grade obtained after detecting the target video segment with a video fuzzy grade detection model; determining a brightness scoring result based on the judgment of whether the brightness of the target video segment is within a preset brightness threshold range; detecting, with a backlight spot detection model, whether a backlight spot meeting a first preset condition exists in the target video segment to obtain a corresponding backlight spot detection result, and determining a backlight scoring result based on that result; detecting the relative positional relationship between the portrait area in the target video segment and the video picture of the target video segment, and determining a portrait position scoring result based on the judgment of whether that relationship meets a second preset condition; and determining the corresponding video quality scoring result from the definition, brightness, backlight and portrait position scoring results. That is, the resolution of the target video segment is determined first; if it is smaller than the preset resolution, the definition scoring result is a4 = 0, and if it is greater than or equal to the preset resolution, the definition scoring result is determined from the video fuzzy grade obtained after detecting the target video segment with the video fuzzy grade detection model. The model is trained on samples of blurred and clear pictures whose fuzzy grades have been subjectively calibrated into 10 levels; the calibrated sample pictures are input into a base network for training, and the trained model outputs a video fuzzy grade value k ∈ [1,10], with k an integer, giving a definition scoring result a4 = 10k. Further, the brightness of the target video segment is scored: a preset brightness threshold range is set according to user requirements, the mean brightness is judged using the gray-level mean of sampled video frames as the standard, and when the mean brightness is within the preset brightness threshold range the brightness scoring result is b4 = 100; when it is not, b4 = 0. Further, backlight spots are detected: the trained backlight spot detection model detects whether a backlight spot meeting the first preset condition exists in the target video segment; if no such spot is detected, the backlight scoring result is c4 = 100, and if one is detected, c4 = 0.
Further, the detected portrait area position is scored. The second preset condition, which may be set according to actual needs, is an optimal imaging area; for example, as shown in fig. 6, the distances between the area boundary and the four boundaries of the picture are A, B, C and D respectively, and the size of the area can be adjusted as needed. When the portrait area meets the second preset condition, the portrait position scoring result is d4 = 100; when it does not, d4 = 0. Further, the corresponding video quality scoring result is determined from the definition scoring result a4 with weight α4, the brightness scoring result b4 with weight β4, the backlight scoring result c4 with weight γ4, and the portrait position scoring result d4 with weight θ4; the video quality scoring result is then:

Y4 = α4a4 + β4b4 + γ4c4 + θ4d4

where Y4 denotes the video quality scoring result; a4 ∈ [0,100], b4 ∈ [0,100], c4 ∈ [0,100], d4 ∈ [0,100]; α4 ∈ [0,1], β4 ∈ [0,1], γ4 ∈ [0,1], θ4 ∈ [0,1]. It can be understood that α4, β4, γ4 and θ4 can be set according to actual requirements.
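A minimal sketch of the video quality computation, with an assumed resolution threshold and illustrative weights (both are left configurable in the disclosure):

```python
def sharpness_score(width: int, height: int, blur_grade_k: int,
                    min_resolution: tuple[int, int] = (1280, 720)) -> float:
    # a4 = 0 below the preset resolution threshold; otherwise a4 = 10*k
    # for the fuzzy grade k in [1, 10] output by the fuzzy-grade model.
    # The 1280x720 threshold is an illustrative assumption.
    if width < min_resolution[0] or height < min_resolution[1]:
        return 0.0
    return 10.0 * blur_grade_k

def video_quality_score(a4: float, b4: float, c4: float, d4: float,
                        alpha4: float = 0.3, beta4: float = 0.2,
                        gamma4: float = 0.2, theta4: float = 0.3) -> float:
    # Y4 = alpha4*a4 + beta4*b4 + gamma4*c4 + theta4*d4; the weight
    # values here are illustrative assumptions in [0, 1].
    return alpha4 * a4 + beta4 * b4 + gamma4 * c4 + theta4 * d4
```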
In some specific embodiments, a comprehensive scoring result of the target video segment is determined according to a face quality scoring result, a facial expression scoring result, a portrait gesture scoring result, and a video quality scoring result. Wherein the comprehensive scoring result is as follows:
Y = αY1 + βY2 + γY3 + θY4

where Y denotes the comprehensive scoring result, and α, β, γ and θ respectively denote the weights of the face quality scoring result, the facial expression scoring result, the portrait gesture scoring result and the video quality scoring result, with α, β, γ, θ ∈ [0,1]. It can be understood that α, β, γ and θ can be set according to actual needs. Further, an evaluation label of the target video segment is constructed based on the comprehensive scoring dimension and the comprehensive scoring result.
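Transcribed directly from the formula above, with illustrative weight values:

```python
def composite_score(y1: float, y2: float, y3: float, y4: float,
                    alpha: float = 0.3, beta: float = 0.2,
                    gamma: float = 0.2, theta: float = 0.3) -> float:
    # Y = alpha*Y1 + beta*Y2 + gamma*Y3 + theta*Y4, with each weight in
    # [0, 1]; the values here are illustrative, not from the disclosure.
    return alpha * y1 + beta * y2 + gamma * y3 + theta * y4
```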
Step S34: and adding the corresponding evaluation tag to the target video clip to obtain a video material library.
Step S35: and acquiring a retrieval request which is sent by a user terminal and constructed based on a target evaluation label, and returning the target video clip which is screened from the video material library and corresponds to the target evaluation label to the user terminal.
Therefore, in this embodiment, the target video segments are scored on the multiple dimensions of face quality, facial expression, portrait gesture and video quality, and a comprehensive score is further calculated, so that the video label types are more detailed; meanwhile, through the comprehensive scoring result, the best segments can be preferentially selected from a large number of target video segments when retrieving video materials.
Referring to fig. 7, the embodiment of the present invention discloses a specific video material screening method, and compared with the previous embodiment, the present embodiment further describes and optimizes the technical solution.
Step S41: the method comprises the steps of obtaining video clips to be screened and face photos of target users, and screening out target video clips corresponding to face feature information of the face photos from all the video clips to be screened.
Step S42: according to any one or combination of several of a face quality evaluation dimension, a face expression evaluation dimension and a portrait gesture evaluation dimension, scoring the target video segments to obtain any one or combination of several of a corresponding face quality scoring result, a face expression scoring result and a portrait gesture scoring result, filtering the target video segments with scoring results lower than a corresponding preset threshold value to obtain a first filtering target video segment, and then constructing an evaluation label of the first filtering target video segment based on the evaluation dimension and the corresponding scoring result.
Step S43: according to the video evaluation dimension constructed based on the video quality, the first filtering target video clip is scored to obtain a corresponding video quality scoring result, the first filtering target video clip with the video quality scoring result lower than a preset video quality scoring threshold value is filtered to obtain a second filtering target video clip, and then an evaluation label of the second filtering target video clip is constructed based on the evaluation dimension and the corresponding scoring result.
Step S44: and adding the corresponding evaluation tag to the second filtering target video clip to obtain a video material library.
Step S45: and acquiring a retrieval request which is sent by a user terminal and constructed based on a target evaluation label, and returning the second filtered target video clip which is screened from the video material library and corresponds to the target evaluation label to the user terminal.
It can thus be seen that, in this embodiment, the target video segments are filtered by setting thresholds corresponding to the preset video evaluation dimensions, so that low-quality target video segments with excessively low scores are screened out. This greatly reduces the amount of computation and substantially improves the efficiency and quality of video material screening, so that high-quality videos can be selected.
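A sketch of the two-stage filtering in steps S42 and S43, with assumed dimension names and thresholds; clips failing the portrait-dimension stage are dropped before video quality is ever evaluated, which is where the computation saving comes from:

```python
def two_stage_filter(clips: dict[str, dict[str, float]],
                     portrait_threshold: float = 60.0,
                     quality_threshold: float = 60.0) -> list[str]:
    # clips maps clip_id -> per-dimension scores; the dimension names
    # and threshold values are illustrative assumptions.
    survivors = []
    for clip_id, scores in clips.items():
        portrait_dims = ("face_quality", "face_expression", "portrait_gesture")
        if any(scores.get(d, 0.0) < portrait_threshold for d in portrait_dims):
            continue  # first filtering: drop low portrait-dimension scores
        if scores.get("video_quality", 0.0) < quality_threshold:
            continue  # second filtering: drop low video quality scores
        survivors.append(clip_id)
    return survivors
```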
Referring to fig. 8, an embodiment of the present application discloses a video material screening apparatus, including:
the target video screening module 11 is configured to acquire video segments to be screened and face photos of target users, and screen out target video segments corresponding to face feature information of the face photos from all the video segments to be screened;
the video scoring module 12 is configured to score the target video clip according to a preset video evaluation dimension, and construct an evaluation label of the target video clip based on the preset video evaluation dimension and a corresponding scoring result;
a video material library establishing module 13, configured to add the corresponding evaluation tag to the target video segment to obtain a video material library;
and the video material screening module 14 is configured to acquire a retrieval request which is sent by a user terminal and constructed based on a target evaluation tag, and return the target video segment which is screened from the video material library and corresponds to the target evaluation tag to the user terminal.
Therefore, the video segments containing the target user are screened out by acquiring the face feature information of the target user's face photo, the target video segments are scored on the preset video evaluation dimensions, and the corresponding evaluation labels are constructed to build the video material library. Thus, before video editing, the video segments to be screened are filtered through the face feature information in the target user's face photo, so that target video segments containing the target user can be screened out quickly; on the other hand, the target video segments are scored according to the preset video evaluation dimensions, and the evaluation labels and the video material library are constructed from the dimensions and scores together, so that the target video segments can be managed by labels, facilitating subsequent video retrieval. When a material request from a user is obtained, the corresponding target video segments are returned according to the corresponding evaluation labels, achieving the purpose of rapidly screening video materials according to user requirements. In conclusion, video materials can be rapidly screened according to user requirements before video editing.
In some specific embodiments, the target video screening module 11 specifically includes:
the face recognition unit is used for inputting the face picture into a trained face recognition model so as to obtain a target face feature vector corresponding to the face picture;
and the face screening unit is used for determining a face feature vector corresponding to each video segment to be screened by using the face recognition model and screening out a target video segment corresponding to the target face feature vector from all the video segments to be screened.
In some embodiments, the video scoring module 12 specifically includes:
the definition scoring unit is used for determining a definition scoring result of the target video clip based on the size relationship between the resolution of the target video clip and a preset resolution threshold and a video fuzzy grade obtained after the target video clip is detected by a video fuzzy grade detection model;
the brightness scoring unit is used for determining a brightness scoring result of the target video clip based on a judgment result of whether the brightness of the target video clip is within a preset brightness threshold range;
the backlight scoring unit is used for detecting whether backlight light spots meeting a first preset condition exist in the target video clip through a backlight light spot detection model so as to obtain a corresponding backlight light spot detection result, and determining a backlight scoring result of the target video clip based on the backlight light spot detection result;
the portrait position scoring unit is used for detecting the relative position relationship between a portrait area in the target video clip and a video picture of the target video clip, and determining the portrait position scoring result of the target video clip based on the judgment result of whether the relative position relationship meets a second preset condition;
and the first grading integration unit is used for determining a corresponding video quality grading result according to the definition grading result, the brightness grading result, the backlight grading result and the portrait position grading result.
In some embodiments, the video scoring module 12 specifically includes:
the face integrity scoring unit is used for detecting the target video clip with a face integrity classification model corresponding to the face integrity evaluation dimension, so as to determine a face integrity scoring result for the portrait in the target video clip;
the face definition scoring unit is used for detecting the target video clip with a face definition classification model corresponding to the face definition evaluation dimension, so as to determine a face definition scoring result for the portrait in the target video clip;
the face size scoring unit is used for detecting the target video clip with a face detection model to obtain the corresponding face area, determining the interpupillary distance within that face area, and scoring the target video clip along the face size evaluation dimension based on the interpupillary distance to obtain a corresponding face size scoring result;
and the second score integration unit is used for determining a corresponding face quality scoring result from the face integrity scoring result, the face definition scoring result, and the face size scoring result.
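A minimal sketch of the interpupillary-distance check, assuming the pupil coordinates come from the face detection step; the ratio thresholds and the intermediate score are illustrative:

```python
import math


def face_size_score(left_pupil: tuple, right_pupil: tuple, frame_width: float,
                    min_ratio: float = 0.04, max_ratio: float = 0.25) -> float:
    # Interpupillary distance in pixels, taken relative to the frame width.
    ipd = math.dist(left_pupil, right_pupil)
    ratio = ipd / frame_width
    if ratio < min_ratio:
        return 0.0   # face too small to be usable material
    if ratio > max_ratio:
        return 0.5   # face so large it is likely cropped by the frame
    return 1.0
```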
In some embodiments, the video scoring module 12 specifically includes:
the facial feature point determining unit is used for determining the facial target feature points of the portrait in the target video clip and the corresponding feature point position changes;
and the facial expression scoring unit is used for scoring the facial expression of the portrait in the target video clip based on the facial target feature points and the feature point position changes, so as to obtain a corresponding facial expression scoring result.
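The application states only that the expression score is derived from the facial target feature points and their position changes; one plausible heuristic, shown below, tracks mouth-corner lift between frames as a smile proxy. The landmark names and the scaling constant are assumptions:

```python
def smile_score(landmarks_prev: dict, landmarks_curr: dict) -> float:
    # landmarks_*: named facial feature points -> (x, y), y growing downward.
    def corner_lift(lm: dict) -> float:
        centre_y = lm["mouth_center"][1]
        return (centre_y - lm["mouth_left"][1]) + (centre_y - lm["mouth_right"][1])

    # A positive delta means the mouth corners rose relative to the
    # mouth centre between frames, i.e. a smile may be forming.
    delta = corner_lift(landmarks_curr) - corner_lift(landmarks_prev)
    return max(0.0, min(1.0, 0.5 + delta / 20.0))  # clamp to [0, 1]
```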
In some embodiments, the video scoring module 12 specifically includes:
the human body feature point determining unit is used for acquiring the human body joint feature points of the portrait in the target video clip;
the gesture determining unit is used for determining the corresponding gesture features based on the position information of the human body joint feature points;
the face matching unit is used for acquiring, if the gesture features are consistent with preset gesture features, the face information of the portrait corresponding to the human body joint feature points, and judging whether that face information matches the face information of the target user, so as to obtain a corresponding matching result;
and the portrait gesture scoring unit is used for scoring the target video clip based on the matching result to obtain a corresponding portrait gesture scoring result.
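The gesture logic above might be sketched as follows, taking "wrist above shoulder" as the preset gesture feature purely for illustration; the joint names and the binary scoring are assumptions, not part of this application:

```python
def is_preset_gesture(joints: dict) -> bool:
    # joints: named human body joint feature points -> (x, y), y growing downward.
    # Illustrative preset gesture: right wrist raised above the right shoulder.
    return joints["right_wrist"][1] < joints["right_shoulder"][1]


def portrait_gesture_score(joints: dict, face_matches_target: bool) -> float:
    # Award the score only when the gesture is performed by the target
    # user, mirroring the face matching step described above.
    return 1.0 if is_preset_gesture(joints) and face_matches_target else 0.0
```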
Fig. 9 illustrates an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the video material screening method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be a computer.
In this embodiment, the power supply 23 is configured to supply voltage to each hardware device of the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows may be any protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is configured to acquire external input data or to output data externally, and its specific interface type may be selected according to the application requirements, which is likewise not specifically limited herein.
In addition, the memory 22, serving as the carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored on it may include an operating system 221, a computer program 222, and so on, and the storage may be transient or persistent.
The operating system 221, which may be Windows Server, Netware, Unix, Linux, or the like, is used for managing and controlling each hardware device of the electronic device 20 as well as the computer program 222. The computer program 222 may further include, in addition to the computer program capable of performing the video material screening method performed by the electronic device 20 disclosed in any of the foregoing embodiments, computer programs for performing other specific tasks.
Further, the present application also discloses a computer-readable storage medium for storing a computer program; when executed by a processor, the computer program implements the video material screening method disclosed above. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
Finally, it should also be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The video material screening method, apparatus, device, and medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, the specific implementations and the scope of application may be changed according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (11)

1. A method for screening video material, comprising:
acquiring video clips to be screened and a face photo of a target user, and screening out, from all the video clips to be screened, the target video clips corresponding to the face feature information of the face photo;
scoring the target video clip according to a preset video evaluation dimension, and constructing an evaluation label of the target video clip based on the preset video evaluation dimension and a corresponding scoring result;
adding the corresponding evaluation tag to the target video clip to obtain a video material library;
and acquiring a retrieval request which is sent by a user terminal and constructed based on a target evaluation label, and returning the target video clip which is screened from the video material library and corresponds to the target evaluation label to the user terminal.
2. The method for screening video material according to claim 1, wherein the screening out of the target video clips corresponding to the face feature information of the face photo from all the video clips to be screened comprises:
inputting the face photo into a trained face recognition model to obtain a target face feature vector corresponding to the face photo;
and determining the face feature vector corresponding to each video clip to be screened by using the face recognition model, and screening out the target video clips corresponding to the target face feature vector from all the video clips to be screened.
3. The method for screening video material according to claim 1 or 2, wherein scoring the target video clip according to a preset video evaluation dimension comprises:
scoring the target video clip according to the video quality evaluation dimension and a video evaluation dimension constructed based on portrait feature information, to obtain a corresponding video quality scoring result and a portrait feature scoring result.
4. The method for screening video material according to claim 3, wherein scoring the target video clip according to the video evaluation dimension constructed based on portrait feature information to obtain the corresponding portrait feature scoring result comprises:
scoring the target video clip according to any one or a combination of the face quality evaluation dimension, the facial expression evaluation dimension, and the portrait gesture evaluation dimension, to obtain the corresponding face quality scoring result, facial expression scoring result, and/or portrait gesture scoring result.
5. The method of claim 4, wherein scoring the target video clip according to the face quality evaluation dimension to obtain a corresponding face quality scoring result comprises:
detecting the target video clip by using a face integrity classification model corresponding to the face integrity evaluation dimension, so as to determine a face integrity scoring result for the portrait in the target video clip;
detecting the target video clip by using a face definition classification model corresponding to the face definition evaluation dimension, so as to determine a face definition scoring result for the portrait in the target video clip;
detecting the target video clip by using a face detection model to obtain a corresponding face area, determining the interpupillary distance corresponding to the face area, and then scoring the target video clip along the face size evaluation dimension based on the interpupillary distance to obtain a corresponding face size scoring result;
and determining a corresponding face quality scoring result from the face integrity scoring result, the face definition scoring result, and the face size scoring result.
6. The method of claim 4, wherein scoring the target video clip according to the facial expression evaluation dimension to obtain a corresponding facial expression scoring result comprises:
determining the facial target feature points of the portrait in the target video clip and the corresponding feature point position changes;
and scoring the facial expression of the portrait in the target video clip based on the facial target feature points and the feature point position changes, so as to obtain a corresponding facial expression scoring result.
7. The method for screening video material according to claim 4, wherein scoring the target video clip according to the portrait gesture evaluation dimension to obtain a corresponding portrait gesture scoring result comprises:
acquiring the human body joint feature points of the portrait in the target video clip;
determining the corresponding gesture features based on the position information of the human body joint feature points;
if the gesture features are consistent with preset gesture features, acquiring the face information of the portrait corresponding to the human body joint feature points, and judging whether that face information matches the face information of the target user, to obtain a corresponding matching result;
and scoring the target video clip based on the matching result to obtain a corresponding portrait gesture scoring result.
8. The method for screening video material according to claim 3, wherein scoring the target video clip according to the video quality evaluation dimension to obtain a corresponding video quality scoring result comprises:
determining a definition scoring result of the target video clip based on a comparison between the resolution of the target video clip and a preset resolution threshold, together with the video blur level output when the target video clip is run through a video blur level detection model;
determining a brightness scoring result of the target video clip based on whether the brightness of the target video clip falls within a preset brightness threshold range;
detecting, through a backlight spot detection model, whether a backlight spot satisfying a first preset condition exists in the target video clip to obtain a corresponding backlight spot detection result, and determining a backlight scoring result of the target video clip based on that detection result;
detecting the relative position relationship between the portrait area in the target video clip and the video picture of the target video clip, and determining a portrait position scoring result of the target video clip based on whether that relative position relationship satisfies a second preset condition;
and determining a corresponding video quality scoring result from the definition scoring result, the brightness scoring result, the backlight scoring result, and the portrait position scoring result.
9. A video material screening apparatus, comprising:
the target video screening module is used for acquiring video clips to be screened and a face photo of a target user, and screening out, from all the video clips to be screened, the target video clips corresponding to the face feature information of the face photo;
the video scoring module is used for scoring the target video clip according to a preset video evaluation dimension and constructing an evaluation label of the target video clip based on the preset video evaluation dimension and a corresponding scoring result;
the video material library establishing module is used for adding the corresponding evaluation tag to the target video clip to obtain a video material library;
and the video material screening module is used for acquiring a retrieval request which is sent by the user terminal and constructed on the basis of the target evaluation tag, and returning the target video clip which is screened from the video material library and corresponds to the target evaluation tag to the user terminal.
10. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to carry out the steps of the video material screening method according to any one of claims 1 to 8.
11. A computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the video material screening method of any one of claims 1 to 8.
CN202210493323.3A 2022-05-07 2022-05-07 Video material screening method, device, equipment and medium Pending CN114780795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210493323.3A CN114780795A (en) 2022-05-07 2022-05-07 Video material screening method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210493323.3A CN114780795A (en) 2022-05-07 2022-05-07 Video material screening method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114780795A 2022-07-22

Family

ID=82434647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210493323.3A Pending CN114780795A (en) 2022-05-07 2022-05-07 Video material screening method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114780795A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115734007A (en) * 2022-09-22 2023-03-03 北京国际云转播科技有限公司 Video editing method, device, medium and video processing system
CN115734007B (en) * 2022-09-22 2023-09-01 北京国际云转播科技有限公司 Video editing method, device, medium and video processing system
CN116033259A (en) * 2022-12-20 2023-04-28 浙江力石科技股份有限公司 Method, device, computer equipment and storage medium for generating short video

Similar Documents

Publication Publication Date Title
US8972847B2 (en) Apparatus and method for providing pictures according to sharing levels
US9560411B2 (en) Method and apparatus for generating meta data of content
JP4612772B2 (en) Image processing method, image processing apparatus, and computer-readable storage medium
CN114780795A (en) Video material screening method, device, equipment and medium
CN110796098B (en) Method, device, equipment and storage medium for training and auditing content auditing model
MX2008015554A (en) Media identification.
CN103988202A (en) Image attractiveness based indexing and searching
Yeh et al. Video aesthetic quality assessment by temporal integration of photo- and motion-based features
CN108989662A (en) A kind of method and terminal device of control shooting
CN110866563B (en) Similar video detection and recommendation method, electronic device and storage medium
CN112702521A (en) Image shooting method and device, electronic equipment and computer readable storage medium
CN110418148B (en) Video generation method, video generation device and readable storage medium
CN109472230B (en) Automatic athlete shooting recommendation system and method based on pedestrian detection and Internet
CN116665083A (en) Video classification method and device, electronic equipment and storage medium
US20210210119A1 (en) Video generation apparatus and video generation method performed by the video generation apparatus
US11080531B2 (en) Editing multimedia contents based on voice recognition
CN116261009B (en) Video detection method, device, equipment and medium for intelligently converting video audience
WO2017116015A1 (en) Content recognition technology-based automatic content generation method and system
JP2010251841A (en) Image extraction program and image extraction device
US20230142898A1 (en) Device and network-based enforcement of biometric data usage
CN110990607B (en) Method, apparatus, server and computer readable storage medium for screening game photos
KR102262702B1 (en) System For Creating Video Using Change Pattern Of Shot
Zhang Intelligent keyframe extraction for video printing
CN110493609B (en) Live broadcast method, terminal and computer readable storage medium
JP2021077131A (en) Composition advice system, composition advice method, user terminal, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination