CN116385944A - Image frame selection method, device, electronic equipment and storage medium
- Publication number: CN116385944A
- Application number: CN202310494494.2A
- Authority: CN (China)
- Prior art keywords: image frame, image, score
- Legal status: Pending (an assumption from the source record, not a legal conclusion)
Classifications
- G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V10/761 - Proximity, similarity or dissimilarity measures
- G06V10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 - Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06V40/169 - Holistic facial features and representations, i.e. based on the facial image taken as a whole
- G06V40/171 - Local facial features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
- G06V40/174 - Facial expression recognition
- Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides an image frame selection method and apparatus, an electronic device, and a storage medium, relating to the technical field of image processing. The method includes: acquiring a plurality of image frames; performing global feature analysis according to the plurality of image frames to obtain a global score of each image frame in the plurality of image frames; performing local feature analysis according to the face region and the human body region in each first image frame to obtain a local score of each image frame; performing fusion processing according to the global score and the local score of each image frame to obtain a composite score of each image frame; and determining a target image frame from the plurality of image frames according to the composite score of each image frame. Because the composite score is obtained from both the global score and the local score, the composite score of each image frame is more accurate; the composite score is further corrected and the target image frame is determined from the corrected composite scores, which de-duplicates the image frames, makes it convenient to determine the target image frame with the best effect, and improves user experience.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image frame selection method, an image frame selection device, an electronic device, and a storage medium.
Background
Video applications have a large number of users, and capturing some highlight moments from a video is a good way to record a user's memorable moments. Automatically finding the pictures a user will like among a large number of frames reduces the workload of manually selecting image frames, reduces storage-space usage, and meets a hot demand in current photography applications.
In the related art, a quality evaluation network is used to predict the sharpness of each frame of a captured video, and one or more of the sharpest frames are then selected as the automatically selected image frames.
However, in the related art it is difficult to select the image frame with the best effect based on image sharpness alone, which degrades user experience.
Disclosure of Invention
The present invention is directed to providing an image frame selecting method, apparatus, electronic device and storage medium, so as to solve the above-mentioned problems in the related art.
In order to achieve the above purpose, the technical solutions adopted by the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides an image frame selection method, including:
acquiring a plurality of image frames, wherein at least one first image frame of the plurality of image frames includes: a face region and a human body region;
performing global feature analysis according to the plurality of image frames to obtain a global score of each image frame in the plurality of image frames;
performing local feature analysis according to the face region and the human body region in each first image frame to obtain a local score of each image frame;
performing fusion processing according to the global score and the local score of each image frame to obtain a composite score of each image frame; and
determining a target image frame from the plurality of image frames according to the composite score of each image frame.
Optionally, the performing local feature analysis according to the face region and the human body region in each first image frame to obtain a local score of each image frame includes:
calculating a facial-features score of each image frame according to the face region in each first image frame; and
calculating a pose score of each image frame according to the human body region in each first image frame; wherein the local score of each image frame includes: the facial-features score and the pose score of that image frame.
Optionally, the calculating the facial-features score of each image frame according to the face region in each first image frame includes:
calculating, according to the face region in each first image frame, a state of a preset feature part, an expression state, and a facial-attractiveness evaluation parameter of each first image frame;
calculating the facial-features score of each first image frame according to the state of the preset feature part, the expression state, and the facial-attractiveness evaluation parameter of each first image frame; and
calculating facial-features scores of second image frames in the plurality of image frames according to the facial-features score of each first image frame;
wherein a second image frame is an image frame other than the first image frames among the plurality of image frames, and the facial-features scores of the image frames include the facial-features scores of the first image frames and of the second image frames.
Optionally, the calculating, according to the face region in each first image frame, the state of the preset feature part, the expression state and the facial-attractiveness evaluation parameter of each first image frame includes:
calculating the size of the preset feature part according to face key points in the face region, to obtain the state of the preset feature part in each first image frame;
recognizing the face region with a preset expression recognition model, to obtain the expression state of each first image frame; and
recognizing the face region with a preset facial-attractiveness evaluation model, to obtain the facial-attractiveness evaluation parameter of each first image frame.
Optionally, the calculating the pose score of each image frame according to the human body region in each first image frame includes:
performing key-point detection on the human body region in each first image frame, and generating a key-point heat map of each first image frame;
detecting the key-point heat map of each first image frame with a preset pose scoring model, to obtain the pose score of each first image frame; and
calculating pose scores of second image frames in the plurality of image frames according to the pose score of each first image frame;
wherein a second image frame is an image frame other than the first image frames among the plurality of image frames, and the pose scores of the image frames include the pose scores of the first image frames and of the second image frames.
Optionally, the performing global feature analysis according to the plurality of image frames to obtain a global score of each image frame in the plurality of image frames includes:
performing global feature analysis on the plurality of image frames with a preset aesthetic scoring model, to obtain aesthetic evaluation parameters of the plurality of image frames; and
performing global feature analysis on the plurality of image frames with a preset quality scoring model, to obtain quality evaluation parameters of the plurality of image frames;
wherein the global score of each image frame in the plurality of image frames includes: the aesthetic evaluation parameter and the quality evaluation parameter of that image frame.
Optionally, the determining the target image frame from the plurality of image frames according to the composite score of each image frame includes:
correcting the composite scores to obtain corrected composite scores;
ranking the plurality of image frames according to the corrected composite scores to obtain a first ranking result; and
determining the target image frame from the plurality of image frames according to the first ranking result.
Optionally, the correcting the composite scores to obtain corrected composite scores includes:
ranking the plurality of image frames according to the composite score of each image frame to obtain a second ranking result;
calculating the similarity between every two image frames in the plurality of image frames; and
correcting the composite scores according to the second ranking result and the similarities, to obtain the corrected composite scores.
In a second aspect, an embodiment of the present invention provides an image frame selecting apparatus, including:
an acquisition module, configured to acquire a plurality of image frames, where at least one first image frame of the plurality of image frames includes: a face region and a human body region;
an analysis module, configured to perform global feature analysis according to the plurality of image frames to obtain a global score of each image frame in the plurality of image frames, and to perform local feature analysis according to the face region and the human body region in each first image frame to obtain a local score of each image frame;
a processing module, configured to perform fusion processing according to the global score and the local score of each image frame to obtain a composite score of each image frame; and
a determining module, configured to determine a target image frame from the plurality of image frames according to the composite score of each image frame.
Optionally, the analysis module is specifically configured to calculate a facial-features score of each image frame according to the face region in each first image frame, and calculate a pose score of each image frame according to the human body region in each first image frame; wherein the local score of each image frame includes: the facial-features score and the pose score of that image frame.
Optionally, the analysis module is specifically configured to calculate, according to the face region in each first image frame, the state of the preset feature part, the expression state and the facial-attractiveness evaluation parameter of each first image frame; calculate the facial-features score of each first image frame according to the state of the preset feature part, the expression state and the facial-attractiveness evaluation parameter; and calculate the facial-features scores of the second image frames in the plurality of image frames according to the facial-features score of each first image frame; wherein a second image frame is an image frame other than the first image frames, and the facial-features scores of the image frames include those of the first image frames and of the second image frames.
Optionally, the analysis module is specifically configured to calculate the size of the preset feature part according to face key points in the face region, to obtain the state of the preset feature part in each first image frame; recognize the face region with the preset expression recognition model, to obtain the expression state of each first image frame; and recognize the face region with the preset facial-attractiveness evaluation model, to obtain the facial-attractiveness evaluation parameter of each first image frame.
Optionally, the analysis module is specifically configured to perform key-point detection on the human body region in each first image frame and generate a key-point heat map of each first image frame; detect the key-point heat map of each first image frame with the preset pose scoring model, to obtain the pose score of each first image frame; and calculate the pose scores of the second image frames in the plurality of image frames according to the pose score of each first image frame; wherein a second image frame is an image frame other than the first image frames, and the pose scores of the image frames include those of the first image frames and of the second image frames.
Optionally, the analysis module is specifically configured to perform global feature analysis on the plurality of image frames with a preset aesthetic scoring model to obtain aesthetic evaluation parameters of the plurality of image frames, and to perform global feature analysis on the plurality of image frames with a preset quality scoring model to obtain quality evaluation parameters of the plurality of image frames; the global score of each image frame includes: the aesthetic evaluation parameter and the quality evaluation parameter of that image frame.
Optionally, the determining module is specifically configured to correct the composite scores to obtain corrected composite scores; rank the plurality of image frames according to the corrected composite scores to obtain a first ranking result; and determine the target image frame from the plurality of image frames according to the first ranking result.
Optionally, the determining module is specifically configured to rank the plurality of image frames according to the composite score of each image frame to obtain a second ranking result; calculate the similarity between every two image frames in the plurality of image frames; and correct the composite scores according to the second ranking result and the similarities, to obtain the corrected composite scores.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory storing a computer program executable by the processor; when the processor executes the computer program, the image frame selection method according to any one of the above first aspects is implemented.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is read and executed, the image frame selection method according to any one of the first aspects is implemented.
The beneficial effects of the invention are as follows. An embodiment of the invention provides an image frame selection method, including: acquiring a plurality of image frames, wherein at least one first image frame of the plurality of image frames includes: a face region and a human body region; performing global feature analysis according to the plurality of image frames to obtain a global score of each image frame in the plurality of image frames; performing local feature analysis according to the face region and the human body region in each first image frame to obtain a local score of each image frame; performing fusion processing according to the global score and the local score of each image frame to obtain a composite score of each image frame; and determining a target image frame from the plurality of image frames according to the composite score of each image frame. Because the composite score of each image frame is obtained from both its global score and its local score, it is more accurate and better characterizes how well the image frame presents; a target image frame with the best effect can therefore be determined from the plurality of image frames according to the composite scores, improving user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a first flowchart of an image frame selection method according to an embodiment of the present invention;
fig. 2 is a second flowchart of an image frame selection method according to an embodiment of the present invention;
fig. 3 is a third flowchart of an image frame selection method according to an embodiment of the present invention;
fig. 4 is a fourth flowchart of an image frame selection method according to an embodiment of the present invention;
fig. 5 is a fifth flowchart of an image frame selection method according to an embodiment of the present invention;
fig. 6 is a sixth flowchart of an image frame selection method according to an embodiment of the present invention;
fig. 7 is a seventh flowchart of an image frame selection method according to an embodiment of the present invention;
fig. 8 is an eighth flowchart of an image frame selection method according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an image frame selecting device according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the description of the present application, it should be noted that, if the terms "upper", "lower", and the like indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, or an azimuth or the positional relationship that is commonly put when the product of the application is used, it is merely for convenience of description and simplification of the description, and does not indicate or imply that the apparatus or element to be referred to must have a specific azimuth, be configured and operated in a specific azimuth, and therefore should not be construed as limiting the present application.
Furthermore, the terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, without conflict, features in embodiments of the present application may be combined with each other.
In the related art, a quality evaluation network is used to predict the sharpness of each frame of a captured video, and one or more of the sharpest frames are then selected as the automatically selected image frames. However, it is difficult to select the image frame with the best effect based on image sharpness alone, which degrades user experience.
Aiming at the above technical problems in the related art, an embodiment of the application provides an image frame selection method that performs global feature analysis and local feature analysis separately to obtain a global score and a local score for each image frame. The composite score of each image frame is obtained from both scores, so it is more accurate and better characterizes how well the image frame presents; a target image frame with the best effect can therefore be determined from the plurality of image frames according to the composite scores, improving user experience.
The image frame selection method provided by the embodiment of the application is applied to electronic equipment, the electronic equipment can be terminal equipment, and the terminal equipment can be any one of the following: desktop computers, notebook computers, tablet computers, smart phones, etc., to which embodiments of the present application are not particularly limited.
An explanation is provided below for an image frame selection method according to an embodiment of the present application.
Fig. 1 is a schematic flow chart of an image frame selection method according to an embodiment of the present invention, as shown in fig. 1, the method may include:
s101, acquiring a plurality of image frames.
Wherein at least one first image frame of the plurality of image frames includes: a face region and a human body region.
It should be noted that the at least one first image frame is a part of the plurality of image frames, and the plurality of image frames may further include second image frames that contain no face region or human body region; of course, the at least one first image frame may also be all of the plurality of image frames, that is, each image frame has a face region and a human body region, which is not specifically limited in the embodiment of the present application.
In the embodiment of the application, a target video may be acquired, where the target video includes a plurality of image frames with a sequence. The target video may be a video shot by a user by using a terminal device, may be a video stored in the terminal device in advance, or may be a video acquired by using other modes, which is not particularly limited in the embodiment of the present application.
S102, global feature analysis is carried out according to the plurality of image frames, and global scores of each image frame in the plurality of image frames are obtained.
In some embodiments, a global feature analysis of at least one first dimension is performed on a plurality of image frames resulting in a global score of at least one first dimension for each of the plurality of image frames.
Illustratively, the at least one first dimension includes at least one of: aesthetic dimensions, quality dimensions, etc., although this is only an example, the first dimension may also be other dimensions where global feature analysis may be performed.
It should be noted that, the global feature analysis may be performed according to a plurality of image frames by using a preset model, the global feature analysis may be performed according to a plurality of image frames by using a preset algorithm or a preset rule, and the global feature analysis may be performed according to a plurality of image frames by using other modes, which is not limited in this embodiment of the present application.
S103, performing local feature analysis according to the face region and the human body region in each first image frame to obtain a local score of each image frame.
Wherein the local score of each image frame includes: a local score for the face region and a local score for the human body region.
In addition, the local feature analysis may be performed first on the face region and then on the human body region, or first on the human body region and then on the face region; the local score of each image frame is obtained from the local feature analysis of the face region and of the human body region together.
In the embodiment of the present application, the face region refers to the area where the person's face is located in the first image frame, and the human body region refers to the area where the person's body is located in the first image frame.
It should be noted that the process of S103 may be performed by using a preset model, the process of S103 may be performed by using a preset algorithm or a preset rule, and the process of S103 may be performed by other methods, which is not specifically limited in the embodiment of the present application.
It should be noted that the process of S102 may be performed first and then the process of S103 may be performed first, the process of S103 may be performed first and then the process of S102 may be performed, and the process of S102 and the process of S103 may be performed simultaneously, which is not particularly limited in the embodiment of the present application.
S104, performing fusion processing according to the global score and the local score of each image frame to obtain a composite score of each image frame.
In some embodiments, linear fusion is performed on the local score for the face region, the local score for the human body region, and the global score of the at least one first dimension of each image frame, to obtain the composite score of that image frame.
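For illustration only (this sketch is not part of the original disclosure), the linear fusion described above can be written as a weighted sum; the weight values and names below are assumptions chosen for the example, since the embodiment does not fix them:

```python
from dataclasses import dataclass

@dataclass
class FrameScores:
    aesthetic: float  # global score, aesthetic dimension
    quality: float    # global score, quality dimension
    facial: float     # local score for the face region
    pose: float       # local score for the human body region

# Assumed example weights; the embodiment leaves the concrete values open.
WEIGHTS = {"aesthetic": 0.3, "quality": 0.3, "facial": 0.2, "pose": 0.2}

def composite_score(s: FrameScores) -> float:
    """Linearly fuse the global scores and local scores into one composite score."""
    return (WEIGHTS["aesthetic"] * s.aesthetic + WEIGHTS["quality"] * s.quality
            + WEIGHTS["facial"] * s.facial + WEIGHTS["pose"] * s.pose)
```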
S105, determining a target image frame from the plurality of image frames according to the composite score of each image frame.
Wherein the number of target image frames may be at least one.
For example, according to the composite scores, a preset number of image frames with the highest composite scores are selected from the plurality of image frames as target image frames, or the image frames whose composite score is greater than or equal to a preset score are selected as target image frames.
In practical application, the target image frame, i.e. the image frame with the best effect among the plurality of image frames, can be displayed to the user; selecting the best-looking frame from the plurality of image frames in this way makes automatic frame selection more accurate and improves user experience.
In summary, an embodiment of the present invention provides an image frame selection method, including: acquiring a plurality of image frames, wherein at least one first image frame of the plurality of image frames includes: a face region and a human body region; performing global feature analysis according to the plurality of image frames to obtain a global score of each image frame; performing local feature analysis according to the face region and the human body region in each first image frame to obtain a local score of each image frame; performing fusion processing according to the global score and the local score of each image frame to obtain a composite score of each image frame; and determining a target image frame from the plurality of image frames according to the composite score of each image frame. Because the composite score of each image frame is obtained from both its global score and its local score, it is more accurate and better characterizes how well the image frame presents; a target image frame with the best effect can therefore be determined from the plurality of image frames, improving user experience.
Optionally, fig. 2 is a second flowchart of an image frame selection method according to an embodiment of the present invention. As shown in fig. 2, the step in S103 of performing local feature analysis according to the face region and the human body region in each first image frame to obtain a local score of each image frame may include:
S201, calculating a facial-features score of each image frame according to the face region in each first image frame.
In some embodiments, at least one type of facial feature of each first image frame is computed from its face region, and the facial-features score of each image frame is calculated from that at least one type of facial feature.
S202, calculating a pose score of each image frame according to the human body region in each first image frame.
Wherein the local score of each image frame includes: the facial-features score and the pose score of that image frame.
In this embodiment of the application, a preset pose scoring model or a preset pose scoring algorithm may be adopted to analyze the human pose and motion in the human body region of each first image frame, so as to calculate the pose score of each image frame.
The pose score characterizes how aesthetically pleasing the human pose and action in the human body region are, which is an important factor in the aesthetic evaluation of portrait images.
Optionally, fig. 3 is a third flowchart of an image frame selection method according to an embodiment of the present invention. As shown in fig. 3, the process in S201 of calculating the facial-features score of each image frame according to the face region in each first image frame may include:
S301, calculating the state of the preset feature part, the expression state, and the facial-attractiveness evaluation parameter of each first image frame according to the face region in each first image frame.
Face detection is performed on each of the plurality of image frames: an image frame is a first image frame if a face is present, and a second image frame if no face is present.
In some embodiments, the state of the preset feature part is an eye state. The facial features in the face region of each first image frame are analyzed: the eye state is determined as an open-eye, half-open-eye, closed-eye, blinking, or winking (one eye closed) state; the expression state is determined as an unnatural, normal, or happy state; and the facial-attractiveness evaluation parameter is determined as a value within a preset parameter range.
S302, calculating the facial-features score of each first image frame according to the state of the preset feature part, the expression state, and the facial-attractiveness evaluation parameter of each first image frame.
In the embodiment of the application, a weight of the preset feature part is determined according to its state, an expression weight is determined according to the expression state, an attractiveness score is determined according to the facial-attractiveness evaluation parameter, and the facial-features score of each first image frame is calculated from the preset feature part weight, the expression weight, and the attractiveness score with a preset formula. The preset feature part may be an eye or a mouth, which is not specifically limited in the embodiment of the application.
The preset formula may be: facial-features score = attractiveness score × eye-state weight × expression weight.
The attractiveness score may be a value between 1 and 100, a value between 1 and 10, or a value within a preset attractiveness score range, which is not specifically limited in the embodiment of the application.
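As an illustrative sketch of this weighting scheme only (the weight tables and all names are assumptions; the embodiment does not fix concrete values):

```python
# Assumed illustrative weight tables for eye state and expression state.
EYE_WEIGHTS = {"open": 1.0, "half_open": 0.6, "closed": 0.2, "blinking": 0.4, "winking": 0.9}
EXPR_WEIGHTS = {"unnatural": 0.5, "normal": 0.8, "happy": 1.0}

def facial_features_score(attractiveness: float, eye_state: str, expression: str) -> float:
    """facial-features score = attractiveness score x eye-state weight x expression weight."""
    return attractiveness * EYE_WEIGHTS[eye_state] * EXPR_WEIGHTS[expression]

# Example: facial_features_score(85.0, "open", "happy") == 85.0
```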
S303, calculating the facial-features scores of the second image frames in the plurality of image frames according to the facial-features score of each first image frame.
A second image frame is an image frame other than the first image frames among the plurality of image frames, and the facial-features scores of the image frames include the facial-features scores of the first image frames and of the second image frames.
It should be noted that the mean of the facial-features scores of the plurality of first image frames may be used as the facial-features score of each second image frame, a second image frame being an image frame that does not include a face region.
Fig. 4 is a fourth flowchart of an image frame selection method according to an embodiment of the present invention. As shown in fig. 4, the process of calculating the state of the preset feature part, the expression state, and the facial-attractiveness evaluation parameter of each first image frame according to its face region may include:
S401, calculating the size of the preset feature part according to face key points in the face region, to obtain the state of the preset feature part in each first image frame.
The preset feature part may be an eye.
In some embodiments, the size of the eyes is calculated from the face key points in the face region to obtain the eye state in each first image frame, where the eye state may be any one of the following: an open-eye state, a half-open-eye state, a closed-eye state, a blinking state, or a winking (one eye closed) state.
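One common way to derive an eye state from face key points is the eye aspect ratio; the embodiment does not specify the exact computation, so the sketch below, including its thresholds, is an assumption for illustration (detecting a blinking state would additionally require comparing consecutive frames):

```python
import numpy as np

def eye_aspect_ratio(eye_pts: np.ndarray) -> float:
    """eye_pts: six (x, y) landmarks of one eye in the usual p1..p6 order
    (p1/p4 corners, p2/p3 upper lid, p5/p6 lower lid)."""
    v1 = np.linalg.norm(eye_pts[1] - eye_pts[5])
    v2 = np.linalg.norm(eye_pts[2] - eye_pts[4])
    h = np.linalg.norm(eye_pts[0] - eye_pts[3])
    return (v1 + v2) / (2.0 * h)

def eye_state(left_ear: float, right_ear: float,
              t_open: float = 0.25, t_closed: float = 0.12) -> str:
    """Map the two per-eye opening ratios to a state label (assumed thresholds)."""
    def state(ear: float) -> str:
        return "open" if ear >= t_open else "closed" if ear <= t_closed else "half_open"
    left, right = state(left_ear), state(right_ear)
    if {left, right} == {"open", "closed"}:
        return "winking"  # one eye open, one eye closed
    return left if left == right else "half_open"
```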
S402, recognizing the face region with a preset expression recognition model, to obtain the expression state of each first image frame.
Wherein the expression state is any one of the following: unnatural, normal, happy. The preset expression recognition model is a pre-trained model.
S403, recognizing the face region with a preset facial-attractiveness evaluation model, to obtain the facial-attractiveness evaluation parameter of each first image frame.
It should be noted that the facial-attractiveness evaluation parameter may be an attractiveness score, i.e. a value within a preset attractiveness score range, and the preset range may be set according to actual requirements.
Optionally, fig. 5 is a fifth flowchart of an image frame selection method according to an embodiment of the present invention. As shown in fig. 5, the process in S202 of calculating the pose score of each image frame according to the human body region in each first image frame may include:
S501, performing key-point detection on the human body region in each first image frame, and generating a key-point heat map of each first image frame.
Human body detection is performed on each of the plurality of image frames: an image frame is a first image frame if a human body is present, and a second image frame if no human body is present.
In addition, HRNet (High-Resolution Net) may be used to detect the key points of the human body region in each first image frame. The key-point heat map of each first image frame may be a human-skeleton key-point heat map; compared with the raw first image frames, human-skeleton key points reduce interference from the person's clothing, makeup, image background, and the like.
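A key-point heat map of the kind described here can be rendered by placing a small Gaussian at each detected skeleton key point; the resolution, channel layout, and sigma below are assumptions for illustration:

```python
import numpy as np

def keypoint_heatmap(keypoints, h: int = 64, w: int = 48, sigma: float = 2.0) -> np.ndarray:
    """keypoints: list of (x, y) skeleton key points in heat-map coordinates.
    Returns one Gaussian heat-map channel per key point, shape (K, h, w)."""
    ys, xs = np.mgrid[0:h, 0:w]
    maps = np.zeros((len(keypoints), h, w), dtype=np.float32)
    for k, (x, y) in enumerate(keypoints):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return maps
```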
S502, detecting the key-point heat map of each first image frame with a preset pose scoring model, to obtain the pose score of each first image frame.
The key-point heat map of each first image frame is input into the preset pose scoring model, which outputs the pose score of that first image frame. The preset pose scoring model is a pre-trained model.
The training process of the preset pose scoring model is as follows:
In order to reduce the interference of factors such as image background, clothing, and makeup on network training, the embodiment of the application takes sample human-skeleton key-point heat maps as input. A picture is randomly selected from an image set of attractive shooting poses and recorded as picture A, with corresponding label La; a picture is randomly selected from an image set of unattractive shooting poses and recorded as picture B, with corresponding label Lb.
Key-point detection is performed on pictures A and B to obtain sample human-skeleton key-point heat maps A and B, which are input into PoseNet (a real-time pose evaluation network) to obtain evaluation scores Sa and Sb. Finally, a preset loss function L is iteratively optimized to train PoseNet, where the preset loss function may take the pairwise margin form: L = max(0, M - (Sa - Sb)).
Wherein M may be set according to actual requirements; for example, M may be set to 2.
In practical application, HRNet can be used to perform key-point detection on pictures A and B, and MobileNetV2 (a lightweight deep neural network) with 0.5 times the channel number can be used as the backbone network of PoseNet; correspondingly, the preset pose scoring model may be the trained MobileNetV2.
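Purely as an illustrative sketch of this pairwise training setup: the 0.5x-width MobileNetV2 backbone follows the description above, while the first-layer adaptation to heat-map input, the exact loss form, and all names are assumptions rather than the patented implementation:

```python
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class PoseScoreNet(torch.nn.Module):
    """Scores a K-channel skeleton key-point heat map with a 0.5x-width MobileNetV2."""
    def __init__(self, num_keypoints: int):
        super().__init__()
        self.backbone = mobilenet_v2(width_mult=0.5)
        # Assumed adaptation: accept K heat-map channels instead of 3 RGB channels,
        # and output a single pose score instead of 1000 class logits.
        self.backbone.features[0][0] = torch.nn.Conv2d(
            num_keypoints, 16, kernel_size=3, stride=2, padding=1, bias=False)
        self.backbone.classifier[1] = torch.nn.Linear(self.backbone.last_channel, 1)

    def forward(self, heatmaps: torch.Tensor) -> torch.Tensor:
        return self.backbone(heatmaps).squeeze(-1)

def margin_rank_loss(s_good: torch.Tensor, s_bad: torch.Tensor, m: float = 2.0) -> torch.Tensor:
    """Pairwise margin loss: push the attractive-pose score above the
    unattractive-pose score by at least the margin M."""
    return F.relu(m - (s_good - s_bad)).mean()
```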
S503, calculating the pose scores of the second image frames in the plurality of image frames according to the pose score of each first image frame.
A second image frame is an image frame other than the first image frames among the plurality of image frames, and the pose scores of the image frames include the pose scores of the first image frames and of the second image frames.
In some embodiments, the mean of the pose scores of the plurality of first image frames is calculated and used as the pose score of each second image frame, a second image frame being an image frame that does not include a human body region.
In the embodiment of the present application, the local score of each of the plurality of image frames includes: the facial-features score and the pose score of that image frame.
Optionally, fig. 6 is a sixth flowchart of an image frame selection method according to an embodiment of the present invention. As shown in fig. 6, the performing global feature analysis according to the plurality of image frames to obtain a global score of each image frame may include:
S601, performing global feature analysis on the plurality of image frames with a preset aesthetic scoring model, to obtain aesthetic evaluation parameters of the plurality of image frames.
Wherein the aesthetic evaluation parameter may also be referred to as an aesthetic score, which evaluates the image frame mainly from multiple dimensions such as image brightness, contrast, hue, and scene composition.
In some embodiments, the preset aesthetic scoring model may also be referred to as ANet. The preset aesthetic scoring model is a pre-trained model, and its training process is as follows:
Designers score an image set into two categories, good and bad, to obtain a two-category data set comprising a picture set with good aesthetic evaluation and a picture set with poor aesthetic evaluation. One picture is randomly selected from the good set and recorded as picture A, with corresponding label score La; one picture is randomly selected from the poor set and recorded as picture B, with corresponding label score Lb. Then pictures A and B are input into ANet respectively to obtain evaluation scores Sa and Sb. Finally, the preset loss function is iteratively optimized to train ANet; the preset loss function may take the same pairwise margin form as above: L = max(0, M - (Sa - Sb)).
Where M may be 2, and MobileNetV2 with 0.5 times the channel number may be used as the backbone network of ANet.
S602, performing global feature analysis on the plurality of image frames with a preset quality scoring model, to obtain quality evaluation parameters of the plurality of image frames.
The quality evaluation parameter may also be called a quality score, which mainly evaluates whether the picture suffers from noise, blurring, artifacts, and the like.
In some embodiments, the preset quality scoring model may also be referred to as QNet. The preset quality scoring model is a pre-trained model, and its training process is as follows:
First, a batch of clear pictures is collected as positive (good-quality) samples, and the positive samples are subjected to degradations such as Gaussian blur, added noise, compression, and contrast changes to obtain negative (poor-quality) samples. The training process of QNet is similar to that of ANet.
One picture is randomly selected from the good-quality picture set and recorded as picture A, with corresponding label score La; one picture is randomly selected from the poor-quality picture set and recorded as picture B, with corresponding label score Lb. Then pictures A and B are input into QNet respectively to obtain evaluation scores Sa and Sb. Finally, the preset loss function is iteratively optimized to train QNet.
Where M of the loss function is set to 2, and MobileNetV2 with 0.5 times the channel number is used as the backbone network of QNet.
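As an illustration of this negative-sample synthesis (the use of OpenCV and all parameter ranges are assumptions; the embodiment only names the degradation types):

```python
import cv2
import numpy as np

def degrade(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Turn a clear positive sample (uint8 BGR image) into a poor-quality negative sample."""
    out = cv2.GaussianBlur(img, (5, 5), sigmaX=rng.uniform(1.0, 3.0))        # Gaussian blur
    noise = rng.normal(0.0, 10.0, out.shape)                                 # added noise
    out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    quality = int(rng.uniform(10, 40))                                       # JPEG compression
    _, buf = cv2.imencode(".jpg", out, [cv2.IMWRITE_JPEG_QUALITY, quality])
    out = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.convertScaleAbs(out, alpha=rng.uniform(0.5, 0.8), beta=0)     # contrast change
```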
Illustratively, the global score for each of the plurality of image frames includes: aesthetic evaluation parameters and quality evaluation parameters for a plurality of image frames.
Optionally, fig. 7 is a seventh flowchart of an image frame selection method according to an embodiment of the present invention. As shown in fig. 7, the process in S105 of determining the target image frame from the plurality of image frames according to the composite score of each image frame may include:
S701, correcting the composite scores to obtain corrected composite scores.
S702, ranking the plurality of image frames according to the corrected composite scores to obtain a first ranking result.
In some embodiments, the plurality of image frames are ranked from high to low according to the corrected composite scores to obtain the first ranking result.
S703, determining the target image frame from the plurality of image frames according to the first ranking result.
In the embodiment of the application, a preset number of top-ranked image frames in the first ranking result are used as target image frames; or, according to the corrected composite score of each image frame, the image frames whose corrected composite score is greater than or equal to a preset score are used as target image frames. This is not specifically limited in the embodiment of the present application.
Optionally, fig. 8 is an eighth flowchart of an image frame selection method according to an embodiment of the present invention. As shown in fig. 8, the process in S701 of correcting the composite scores to obtain corrected composite scores may include:
S801, ranking the plurality of image frames according to the composite score of each image frame to obtain a second ranking result.
The second ranking result includes the plurality of image frames F = {f1, f2, ..., fn} arranged from high to low by composite score, with corresponding composite scores S = {s1, s2, ..., sn}.
S802, calculating the similarity between every two image frames in the plurality of image frames.
In one possible implementation, MobileNetV2 with 0.5 times the channel number, pre-trained on the ImageNet data set (a computer-vision recognition benchmark), is used as the feature extractor. The last layer of convolution features is extracted, and global max pooling is performed on each channel of those features to obtain a feature code for each image frame. The cosine distance between the code of the image frame to be checked and the codes of the other image frames (the image frames ranked before it in the second ranking result) is then calculated as their similarity; in the same way, the similarity between every two image frames can be obtained.
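A sketch of this similarity computation (the feature extractor can be any pre-trained backbone; the shapes and names below are assumptions):

```python
import numpy as np

def encode(conv_features: np.ndarray) -> np.ndarray:
    """Global max pooling over the last conv features: shape (C, H, W) -> (C,)."""
    return conv_features.reshape(conv_features.shape[0], -1).max(axis=1)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between the feature codes of two image frames."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```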
S803, correcting the composite scores according to the second ranking result and the similarities, to obtain the corrected composite scores.
In some embodiments, for an image frame fi to be checked (fi ∈ F, i.e. any one of the plurality of image frames), the similarities d(i,1), d(i,2), ..., d(i,i-1) between fi and the image frames f1, f2, ..., f(i-1) ranked before it are calculated. The maximum of these values is taken as the similarity coefficient di = max(d(i,1), d(i,2), ..., d(i,i-1)), and the composite score si of fi is corrected according to di to obtain si', where the composite score correction formula may be: si' = si × (k × di + b).
Wherein k is-2, b is 2.84, and the values of k and b can be set according to actual requirements, which are not particularly limited in the embodiment of the present application.
In this embodiment of the application, the image frame most similar to an image frame to be checked may be referred to as its similar image frame. If the similar image frame ranks higher in the second ranking result than the image frame to be checked, correcting the composite score of the image frame to be checked with the above formula lowers its composite score and moves it backward in the ranking; the plurality of image frames are then re-ranked based on the corrected composite scores to obtain the first ranking result.
It should be noted that calculating the similarity between every two image frames and correcting the composite scores according to the second ranking result and the similarities de-duplicates the candidates, avoiding duplicate image frames among the target image frames determined from the plurality of image frames.
In summary, the composite score of each image frame is obtained from its global score and local score, so the composite score is more accurate and better characterizes how well the image frame presents; a target image frame with the best effect can therefore be determined from the plurality of image frames according to the composite scores, improving user experience. The global score includes an aesthetic score and a quality score, and the local score includes a facial-features score and a pose score, so the image frames are evaluated comprehensively from multiple scores and the resulting composite score is more accurate. A pre-trained network is used to extract feature codes of the pictures, the cosine distances between the feature codes of different pictures are calculated to obtain the similarity between every two pictures, and finally the composite scores are corrected according to the similarities, achieving de-duplication.
In addition, the image frame selection method provided by the embodiment of the invention can select frames in real time for a client-side user, picking the most beautiful and expressive images from videos shot in real time or imported afterwards. This reduces the user's manual screening workload and the occupied storage space, and pleasantly surprises the user.
The following describes an image frame selecting device, an electronic device, a storage medium, etc. for executing the image frame selecting method provided in the present application, and specific implementation processes and technical effects thereof refer to relevant contents of the image frame selecting method, which are not described in detail below.
Fig. 9 is a schematic structural diagram of an image frame selection device according to an embodiment of the present invention. As shown in Fig. 9, the device includes:
an acquisition module 901, configured to acquire a plurality of image frames, where at least one first image frame of the plurality of image frames includes: a human face area and a human body area;
an analysis module 902, configured to perform global feature analysis according to the plurality of image frames, to obtain a global score of each image frame in the plurality of image frames; carrying out local feature analysis according to the face area and the human body area in each first image frame to obtain local scores of each image frame;
a processing module 903, configured to perform fusion processing according to the global score and the local score of each image frame, to obtain a comprehensive score of each image frame;
a determining module 904, configured to determine a target image frame from the plurality of image frames according to the composite score of each image frame.
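As a non-authoritative illustration of how the four modules cooperate, the sketch below wires acquisition, analysis, processing, and determining together; analyze_global, analyze_local, and fuse are placeholder callables standing in for the models and fusion processing of this application, and all names are hypothetical:

```python
from typing import Callable, List, Sequence, Tuple

def select_target_frames(
    frames: Sequence,                       # acquisition module 901: the input frames
    analyze_global: Callable,               # analysis module 902: global score
    analyze_local: Callable,                # analysis module 902: local score
    fuse: Callable[[float, float], float],  # processing module 903: fusion
    top_k: int = 1,
) -> List:
    """Determining module 904: pick the top_k frames by comprehensive score."""
    scored: List[Tuple[float, int]] = []
    for i, frame in enumerate(frames):
        comprehensive = fuse(analyze_global(frame), analyze_local(frame))
        scored.append((comprehensive, i))
    scored.sort(reverse=True)               # best comprehensive score first
    return [frames[i] for _, i in scored[:top_k]]
```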
Optionally, the analysis module 902 is specifically configured to calculate the five sense organs score of each first image frame according to the facial area in each first image frame, and calculate the pose score of each image frame according to the human body region in each first image frame; wherein the local score of each image frame comprises: the five sense organs score and the pose score of that image frame.
Optionally, the analysis module 902 is specifically configured to calculate, according to the facial area in each first image frame, the state, expression state, and color value evaluation parameters of a preset feature part in each first image frame; calculate the five sense organs score of each first image frame according to the state, expression state, and color value evaluation parameters of the preset feature part in that first image frame; and calculate the five sense organs scores of the second image frames in the plurality of image frames according to the five sense organs scores of the first image frames; wherein a second image frame is an image frame other than the first image frames in the plurality of image frames, and the five sense organs scores of the image frames include the five sense organs scores of the first image frames and the second image frames.
Optionally, the analysis module 902 is specifically configured to calculate the size of the preset feature part according to the face key points in the facial area, so as to obtain the state of the preset feature part in each first image frame; identify the facial area by adopting a preset expression identification model to obtain the expression state of each first image frame; and identify the facial area by adopting a preset color value evaluation model to obtain the color value evaluation parameters of each first image frame.
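As one hedged example of computing the "state" of a preset feature part from face key points, eye openness is often estimated with an eye aspect ratio; the six-point eye layout below is an assumption for illustration, not a layout specified by this application:

```python
import math

def eye_aspect_ratio(eye_pts):
    """eye_pts: six (x, y) key points of one eye, ordered
    [outer corner, upper-1, upper-2, inner corner, lower-2, lower-1]."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    vertical = dist(eye_pts[1], eye_pts[5]) + dist(eye_pts[2], eye_pts[4])
    horizontal = dist(eye_pts[0], eye_pts[3])
    return vertical / (2.0 * horizontal)  # small ratio suggests a closed eye
```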
Optionally, the analysis module 902 is specifically configured to perform key point detection on the human body region in each first image frame and generate a key point heat map of each first image frame; detect the key point heat map of each first image frame by adopting a preset pose scoring model to obtain the pose score of each first image frame; and calculate the pose scores of the second image frames in the plurality of image frames according to the pose scores of the first image frames; wherein a second image frame is an image frame other than the first image frames in the plurality of image frames, and the pose scores of the image frames include the pose scores of the first image frames and the second image frames.
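The exact heat-map encoding is not fixed by the text; a common choice, shown here purely as an assumption, renders one Gaussian peak per detected body key point:

```python
import numpy as np

def keypoint_heatmaps(keypoints, h, w, sigma=2.0):
    """keypoints: iterable of (x, y) pixel coordinates.
    Returns an array of shape (K, h, w), one heat map per key point."""
    ys, xs = np.mgrid[0:h, 0:w]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            for x, y in keypoints]
    return np.stack(maps)
```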
Optionally, the analysis module 902 is specifically configured to perform global feature analysis on the plurality of image frames by adopting a preset aesthetic scoring model to obtain aesthetic evaluation parameters of the plurality of image frames, and to perform global feature analysis on the plurality of image frames by adopting a preset quality scoring model to obtain quality evaluation parameters of the plurality of image frames; wherein the global score of each image frame in the plurality of image frames includes the aesthetic evaluation parameters and quality evaluation parameters of the plurality of image frames.
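With the two global sub-scores (aesthetic, quality) and two local sub-scores (five sense organs, pose) in hand, the fusion processing is not pinned to a formula in the text; one plausible reading, offered only as a sketch, is a weighted sum (the weights are illustrative, not from this application):

```python
def fuse_scores(aesthetic, quality, five_sense_organs, pose,
                weights=(0.25, 0.25, 0.25, 0.25)):
    """Comprehensive score as a weighted sum of the global scores
    (aesthetic, quality) and local scores (five sense organs, pose)."""
    parts = (aesthetic, quality, five_sense_organs, pose)
    return sum(w * p for w, p in zip(weights, parts))
```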
Optionally, the determining module 904 is specifically configured to correct the comprehensive score to obtain a corrected comprehensive score; sort the plurality of image frames according to the corrected comprehensive scores to obtain a first sorting result; and determine the target image frame from the plurality of image frames according to the first sorting result.
Optionally, the determining module 904 is specifically configured to sort the plurality of image frames according to the comprehensive score of each image frame to obtain a second sorting result; calculate the similarity between every two image frames in the plurality of image frames; and correct the comprehensive score according to the second sorting result and the similarity to obtain the corrected comprehensive score.
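A hedged sketch of this de-duplication step: it assumes feature codes from some pre-trained network (as the summary above mentions), computes pairwise cosine similarity, applies the earlier correction formula to each frame ranked below its most similar neighbour, and re-sorts; all names are hypothetical:

```python
import numpy as np

def dedup_rerank(embeddings, scores, k=-2.0, b=2.84):
    """embeddings: (N, D) feature codes, rows in second-sorting order
    (highest comprehensive score first); scores: (N,) comprehensive scores.
    Returns the corrected scores and the first sorting result."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T                              # pairwise cosine similarity
    corrected = np.asarray(scores, dtype=float).copy()
    for i in range(1, len(corrected)):
        d_i = sim[i, :i].max()                 # max similarity to higher-ranked frames
        corrected[i] = corrected[i] * (k * d_i + b)
    order = np.argsort(-corrected)             # re-sort: first sorting result
    return corrected, order
```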
The foregoing apparatus is used for executing the method provided in the foregoing embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), one or more digital signal processors (Digital Signal Processor, DSP), or one or more field-programmable gate arrays (Field Programmable Gate Array, FPGA), or the like. As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor that can invoke the program code. As yet another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 10, the electronic device includes: a processor 1001 and a memory 1002.
The memory 1002 is used to store a program, and the processor 1001 calls the program stored in the memory 1002 to execute the above-described method embodiment. The specific implementation manner and the technical effect are similar, and are not repeated here.
Optionally, the present invention also provides a program product, such as a computer-readable storage medium, comprising a program which, when executed by a processor, performs the above-described method embodiments.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and the like.
The above are only preferred embodiments of the present invention and are not intended to limit it; various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.
Claims (11)
1. An image frame selection method, the method comprising:
acquiring a plurality of image frames, wherein at least one first image frame of the plurality of image frames comprises: a human face area and a human body area;
performing global feature analysis according to the plurality of image frames to obtain global scores of each image frame in the plurality of image frames;
carrying out local feature analysis according to the face area and the human body area in each first image frame to obtain local scores of each image frame;
performing fusion processing according to the global scores and the local scores of each image frame to obtain a comprehensive score of each image frame;
and determining a target image frame from the plurality of image frames according to the comprehensive score of each image frame.
2. The method of claim 1, wherein the carrying out local feature analysis according to the face area and the human body area in each first image frame to obtain local scores of each image frame comprises:
calculating the five sense organs score of each image frame according to the facial area in each first image frame;
calculating the pose score of each image frame according to the human body area in each first image frame; wherein the local score of each image frame comprises: the five sense organs score and the pose score of each image frame.
3. The method of claim 2, wherein the calculating the five sense organs score of each image frame according to the facial area in each first image frame comprises:
according to the facial area in each first image frame, calculating the state, expression state and color value evaluation parameters of the preset feature part in each first image frame;
calculating the five sense organs score of each first image frame according to the state, expression state and color value evaluation parameters of the preset feature part in each first image frame;
calculating the five sense organ scores of the second image frames in the plurality of image frames according to the five sense organ scores of each first image frame;
the second image frame is an image frame except the first image frame in the plurality of image frames, and the five sense organs score of each image frame comprises the five sense organs scores of each first image frame and the second image frame.
4. The method according to claim 3, wherein the calculating the state, expression state and color value evaluation parameters of the preset feature part in each first image frame according to the facial area in each first image frame comprises:
calculating the size of the preset feature part according to the face key points in the facial area to obtain the state of the preset feature part in each first image frame;
identifying the facial area by adopting a preset expression identification model to obtain the expression state of each first image frame;
and identifying the facial area by adopting a preset color value evaluation model to obtain the color value evaluation parameters of each first image frame.
5. The method of claim 2, wherein the calculating the pose score of each image frame according to the human body area in each first image frame comprises:
performing key point detection on the human body area in each first image frame, and generating a key point heat map of each first image frame;
detecting the key point heat map of each first image frame by adopting a preset pose scoring model to obtain the pose score of each first image frame;
calculating the pose scores of the second image frames in the plurality of image frames according to the pose scores of the first image frames;
the second image frame is an image frame except the first image frame in the plurality of image frames, and the pose score of each image frame comprises the pose scores of each first image frame and the second image frame.
6. The method of claim 1, wherein the performing global feature analysis according to the plurality of image frames to obtain global scores of each image frame in the plurality of image frames comprises:
performing global feature analysis on the plurality of image frames by adopting a preset aesthetic scoring model to obtain aesthetic evaluation parameters of the plurality of image frames;
performing global feature analysis on the plurality of image frames by adopting a preset quality scoring model to obtain quality evaluation parameters of the plurality of image frames;
the global scoring of each image frame of the plurality of image frames includes: aesthetic evaluation parameters and quality evaluation parameters of the plurality of image frames.
7. The method of claim 1, wherein the determining a target image frame from the plurality of image frames according to the comprehensive score of each image frame comprises:
correcting the comprehensive score to obtain a corrected comprehensive score;
sorting the plurality of image frames according to the corrected comprehensive scores to obtain a first sorting result;
and determining the target image frame from the plurality of image frames according to the first sorting result.
8. The method of claim 7, wherein the correcting the comprehensive score to obtain a corrected comprehensive score comprises:
sorting the plurality of image frames according to the comprehensive score of each image frame to obtain a second sorting result;
calculating the similarity between every two image frames in the plurality of image frames;
and correcting the comprehensive score according to the second sorting result and the similarity to obtain the corrected comprehensive score.
9. An image frame selection apparatus, the apparatus comprising:
an acquisition module, configured to acquire a plurality of image frames, where at least one first image frame of the plurality of image frames includes: a human face area and a human body area;
the analysis module is used for carrying out global feature analysis according to the plurality of image frames to obtain global scores of each image frame in the plurality of image frames; carrying out local feature analysis according to the face area and the human body area in each first image frame to obtain local scores of each image frame;
the processing module is used for carrying out fusion processing according to the global scores and the local scores of each image frame to obtain the comprehensive scores of each image frame;
and the determining module is used for determining a target image frame from the plurality of image frames according to the comprehensive score of each image frame.
10. An electronic device, comprising: a memory and a processor, the memory storing a computer program executable by the processor, the processor implementing the image frame selection method of any of the preceding claims 1-8 when the computer program is executed.
11. A computer readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when read and executed, implements the image frame selection method according to any of the preceding claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310494494.2A CN116385944A (en) | 2023-05-05 | 2023-05-05 | Image frame selection method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116385944A true CN116385944A (en) | 2023-07-04 |
Family
ID=86973410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310494494.2A Pending CN116385944A (en) | 2023-05-05 | 2023-05-05 | Image frame selection method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116385944A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||