CN110996169B - Method, device, electronic equipment and computer-readable storage medium for clipping video


Info

Publication number: CN110996169B
Authority: CN (China)
Prior art keywords: video, image, score, evaluation, candidate
Legal status: Active
Application number: CN201911115501.3A
Other languages: Chinese (zh)
Other versions: CN110996169A
Inventors: 李马丁, 郑云飞, 章佳杰, 宁小东, 刘建辉, 于冰
Current Assignee: Beijing Dajia Internet Information Technology Co Ltd
Original Assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2019-07-12
Filing date: 2019-11-14
Publication date: 2022-03-01
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Publication of CN110996169A
Application granted
Publication of CN110996169B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N 17/004 Diagnosis, testing or measuring for television systems or their details for digital television systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments

Abstract

The present disclosure relates to a method, an apparatus, an electronic device and a computer-readable storage medium for clipping a video. The method comprises: extracting a plurality of candidate images from a target video; performing evaluation analysis on each candidate image to obtain an evaluation score for that candidate image; and performing a clipping operation on the target video based on the evaluation scores of the candidate images to obtain a video clip. This optimizes the video clipping effect and improves the user experience.

Description

Method, device, electronic equipment and computer-readable storage medium for clipping video
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular to a method and an apparatus for clipping a video, an electronic device, and a computer-readable storage medium.
Background
In the related art, video clipping still depends to a great extent on manual editing. Manual editing requires stepping through the video file frame by frame and cutting with frame-level precision; because the amount of video data is very large, analyzing every frame of a video takes a very long time, which makes manual editing a tedious task.
Moreover, when the goal of clipping is to select a highlight segment, a manual workflow cannot automatically judge, from the content of the video, which video frames contain highlight content. Manual clipping is therefore not intelligent enough: when a highlight segment is clipped by hand, the accuracy of manually identifying the highlight frames is low, which degrades the final clipping result and the user experience.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device and a computer-readable storage medium for clipping a video, so as to solve at least the technical problems described above. The technical solution of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a method of clipping a video, the method comprising:
extracting a plurality of candidate images from a target video;
performing evaluation analysis on each candidate image to obtain an evaluation score for that candidate image;
and performing a clipping operation on the target video based on the evaluation scores of the plurality of candidate images to obtain a video clip.
Optionally, performing the clipping operation on the target video based on the evaluation scores of the candidate images to obtain a video clip includes:
determining, from the plurality of candidate images, the target image with the highest evaluation score;
and, taking the target image as the center, cutting out segments of equal duration before and after the center to obtain the final video clip.
Optionally, performing the clipping operation on the target video based on the evaluation scores of the candidate images to obtain a video clip includes:
interpolating the evaluation scores of the candidate images to obtain an evaluation score for every video frame in the target video;
setting a sliding window whose length equals a preset video-clip duration;
moving the sliding window over the target video with a preset step size, and computing, for each window position, the sum of the evaluation scores of all frames inside the window;
and cutting out the segment inside the window with the highest score sum to obtain the final video clip.
Optionally, performing evaluation analysis on each candidate image to obtain its evaluation score includes:
for each candidate image, analyzing the candidate image along multiple dimensions and determining its score in each of those dimensions, wherein the multiple dimensions comprise at least one of: a clarity dimension, a colorfulness dimension, and a meaningfulness dimension;
for each candidate image, determining the evaluation score of the candidate image from its scores in the multiple dimensions and the weights of those dimensions.
Optionally, performing evaluation analysis on each candidate image to obtain its evaluation score includes:
for each candidate image, analyzing the candidate image along multiple dimensions and determining its score in each of those dimensions, wherein the multiple dimensions comprise at least two of: a clarity dimension, a colorfulness dimension, a meaningfulness dimension, and at least one face-related dimension;
for each candidate image, determining the evaluation score of the candidate image from its scores in the multiple dimensions and the weights of those dimensions, wherein the dimension scores comprise at least two of: an image clarity score, an image colorfulness score, an image meaningfulness score, and at least one face-related score.
Optionally, the at least one face-related score comprises one or more of: a face clarity score, an eye-openness score, a mouth-openness score, a face composition score, and a face direction score.
According to a second aspect of embodiments of the present disclosure, there is provided an apparatus for clipping a video, the apparatus comprising:
an extraction module configured to extract a plurality of candidate images from a target video;
an evaluation module configured to perform evaluation analysis on each candidate image to obtain an evaluation score for that candidate image;
and a clipping module configured to perform a clipping operation on the target video based on the evaluation scores of the plurality of candidate images to obtain a video clip.
Optionally, the clipping module includes:
a first determining module configured to determine, from the plurality of candidate images, the target image with the highest evaluation score;
and a first clipping submodule configured to take the target image as the center and cut out segments of equal duration before and after the center to obtain the final video clip.
Optionally, the clipping module includes:
an interpolation module configured to interpolate the evaluation scores of the candidate images to obtain an evaluation score for every video frame in the target video;
a setting module configured to set a sliding window whose length equals a preset video-clip duration;
a second determining module configured to move the sliding window over the target video with a preset step size and to compute, for each window position, the sum of the evaluation scores of all frames inside the window;
and a second clipping submodule configured to cut out the segment inside the window with the highest score sum to obtain the final video clip.
Optionally, the evaluation module comprises:
an analysis module configured to analyze each candidate image along multiple dimensions and determine its score in each of those dimensions, wherein the multiple dimensions comprise at least one of: a clarity dimension, a colorfulness dimension, and a meaningfulness dimension;
and an evaluation-score determining module configured to determine the evaluation score of each candidate image from its scores in the multiple dimensions and the weight of each of those dimensions.
Optionally, the evaluation module is configured to: analyze each candidate image along multiple dimensions and determine its score in each of those dimensions, wherein the multiple dimensions comprise at least two of: a clarity dimension, a colorfulness dimension, a meaningfulness dimension, and at least one face-related dimension; and determine the evaluation score of each candidate image from its scores in the multiple dimensions and the weights of those dimensions, wherein the dimension scores comprise at least two of: an image clarity score, an image colorfulness score, an image meaningfulness score, and at least one face-related score.
Optionally, the at least one face-related score comprises one or more of: a face clarity score, an eye-openness score, a mouth-openness score, a face composition score, and a face direction score.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect when executing the program.
The technical solution provided by the embodiments of the present disclosure brings at least the following beneficial effects:
in the disclosed embodiments, the video frames of a video are evaluated automatically, and the video is clipped based on the evaluation scores to obtain the final highlight clip. Because the target video is clipped based on the evaluation scores of the candidate images, the resulting clip is generated automatically from the video content; and because the candidate images are only a subset of the video frames, there is no need to evaluate and analyze every frame of the video, which saves time, speeds up clip generation, and yields high-quality clips.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flowchart illustrating a method of clipping a video according to an exemplary embodiment;
FIG. 2 is a flowchart of a method of clipping a video according to another embodiment of the present application;
FIG. 3 is a flowchart of a method of clipping a video according to another embodiment of the present application;
FIG. 4 is a flowchart of a method of clipping a video according to another embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for clipping a video according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
As noted above, video clipping in the related art still depends to a great extent on manual editing. Manual editing requires stepping through the video file frame by frame and cutting with frame-level precision; because the amount of video data is very large, analyzing every frame of a video takes a very long time, which makes manual editing a tedious task.
Moreover, when the goal of clipping is to select a highlight segment, a manual workflow cannot automatically judge, from the content of the video, which video frames contain highlight content. Manual clipping is therefore not intelligent enough: when a highlight segment is clipped by hand, the accuracy of manually identifying the highlight frames is low, which degrades the final clipping result and the user experience. To improve the user experience, the present application provides a video clipping method that automatically evaluates the video frames of a video and clips the video based on the evaluation scores to obtain the final highlight clip.
Fig. 1 shows a flowchart of a method of clipping a video according to an embodiment of the present application. As shown in Fig. 1, the method comprises the following steps:
Step S11: extracting a plurality of candidate images from a target video;
Step S12: performing evaluation analysis on each candidate image to obtain an evaluation score for that candidate image;
Step S13: performing a clipping operation on the target video based on the evaluation scores of the plurality of candidate images to obtain a video clip.
In this embodiment, the candidate images are multiple video frames extracted from the target video. They may be extracted at equal inter-frame intervals, for example one frame after every 10 frames. Taking a target video of 90 frames as an example, the extracted frames would then be the 11th, 22nd, 33rd, 44th, 55th, 66th, 77th, and 88th frames.
Alternatively, when extracting multiple frames from the target video, the target video may first be divided into multiple sub-segments, and one frame then extracted from each sub-segment. For example, the target video is divided equally into N sub-segments, and one frame is extracted at random from each sub-segment.
In the above embodiments, whether frames are extracted at equal inter-frame intervals or one frame is extracted from each of several sub-segments, the extracted frames are distributed uniformly over the target video, so they represent the content of the target video more accurately, which further improves the quality and accuracy of the resulting video clip.
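As a concrete illustration, the following minimal Python sketch implements both sampling strategies described above; the function names and the use of 0-based frame indices are illustrative assumptions, not part of the patent.

```python
# A minimal sketch of the two candidate-extraction strategies described above.
import random

def sample_equal_interval(num_frames: int, interval: int) -> list[int]:
    """Pick one frame after every `interval` frames (0-based indices)."""
    return list(range(interval, num_frames, interval + 1))

def sample_per_subsegment(num_frames: int, n_segments: int) -> list[int]:
    """Split the video into n equal sub-segments and pick one random frame from each."""
    bounds = [num_frames * i // n_segments for i in range(n_segments + 1)]
    return [random.randrange(lo, hi) for lo, hi in zip(bounds, bounds[1:])]

# For a 90-frame video sampled after every 10 frames this yields indices
# 10, 21, 32, ... — i.e., the 11th, 22nd, 33rd, ... frames in 1-based counting.
print(sample_equal_interval(90, 10))
```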
In this embodiment, each candidate image is evaluated and analyzed to obtain its evaluation score, and the target video is then clipped based on the evaluation scores of the plurality of candidate images to obtain a video clip. Here, evaluation analysis means evaluating the image quality of the candidate image; image quality includes, but is not limited to, the clarity of the image, the colorfulness of the image, the meaningfulness of the image, the clarity of any face in the image, and so on. Based on this quality evaluation, several images whose evaluation scores exceed a preset threshold may be selected to form the final video clip, the final clip may be cut around the single image with the highest evaluation score, or the consecutive group of images with the highest total score may be cut out as the final clip.
In the above embodiment, the target video is clipped based on the evaluation scores of the candidate images, so the resulting clip is generated automatically from the video content; and because the candidate images are only a subset of the video frames, every frame of the video need not be evaluated and analyzed, which saves time, speeds up clip generation, and yields high-quality clips.
Referring to Fig. 2, Fig. 2 is a flowchart of a method of clipping a video according to another embodiment of the present application. As shown in Fig. 2, in addition to steps S11-S12, step S13 specifically includes the following steps:
Step S131: determining, from the plurality of candidate images, the target image with the highest evaluation score;
Step S132: taking the target image as the center, cutting out segments of equal duration before and after the center to obtain the final video clip.
In this embodiment, the target image with the highest evaluation score is determined from the plurality of candidate images, the total duration of the final clip (which may be a preset video-clip duration) is determined, and segments of equal duration are then cut out before and after the target image, which serves as the center, to obtain the final clip. Continuing the equal-interval example from step S11 above: the target video has 90 frames, and the extracted frames are the 11th, 22nd, 33rd, 44th, 55th, 66th, 77th, and 88th frames. If the target image with the highest evaluation score is the 55th frame, and the preset clip duration corresponds to about 30 frames, then moving 15 frames forward and 15 frames backward from the 55th frame selects the 40th through 70th frames of the target video, and these frames form the final video clip.
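The arithmetic of this center-based cut can be sketched as follows; the helper name and 0-based indexing are illustrative assumptions:

```python
def clip_around_center(center_idx: int, clip_len: int, num_frames: int) -> tuple[int, int]:
    """Cut clip_len/2 frames before and after the highest-scoring frame,
    clamped to the bounds of the video."""
    half = clip_len // 2
    start = max(0, center_idx - half)
    end = min(num_frames - 1, center_idx + half)
    return start, end

# Patent example: 90 frames, best frame is the 55th (0-based index 54), ~30-frame clip.
print(clip_around_center(54, 30, 90))  # (39, 69) -> 40th..70th frames, 1-based
```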
Referring to Fig. 3, Fig. 3 is a flowchart of a method of clipping a video according to another embodiment of the present application. As shown in Fig. 3, in addition to steps S11-S12, step S13 specifically includes the following steps:
Step S133: interpolating the evaluation scores of the candidate images to obtain an evaluation score for every video frame in the target video;
Step S134: setting a sliding window whose length equals the preset video-clip duration;
Step S135: moving the sliding window over the target video with a preset step size, and computing, for each window position, the sum of the evaluation scores of all frames inside the window;
Step S136: cutting out the segment inside the window with the highest score sum to obtain the final video clip.
In this embodiment, the evaluation scores of the candidate images are first interpolated; the interpolation method itself is conventional and is not repeated here. Interpolation yields an evaluation score for every video frame in the target video.
Then a sliding window whose length equals the preset video-clip duration is set, so that the window size equals the preset clip duration; the window is moved over the target video with a preset step size, and the sum of the evaluation scores of all frames inside each window position is computed. The preset step size may equal the preset clip duration or differ from it. For example, suppose the target video has 90 frames and the preset clip duration corresponds to 10 frames, so the window covers 10 frames; the step size may then be 10 frames, 8 frames, or 12 frames. With a 10-frame window and a 10-frame step, the window positions generated as the window moves over the target video are: frames 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, and 81-90.
Finally, the segment inside the window with the highest score sum is cut out as the final video clip. In the example above, if frames 1-10 have the highest score sum, then frames 1-10 are cut out as the final clip.
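The interpolation-plus-sliding-window selection of steps S133-S136 might look like the following sketch, assuming NumPy's linear interpolation; the sample scores are purely illustrative:

```python
import numpy as np

def best_window(candidate_idx, candidate_scores, num_frames, window, step):
    """Interpolate per-frame scores, then slide a fixed-size window and keep
    the position whose score sum is highest."""
    frames = np.arange(num_frames)
    per_frame = np.interp(frames, candidate_idx, candidate_scores)  # linear interpolation
    best_start, best_sum = 0, -np.inf
    for start in range(0, num_frames - window + 1, step):
        s = per_frame[start:start + window].sum()
        if s > best_sum:
            best_start, best_sum = start, s
    return best_start, best_start + window - 1

# 90-frame video, scores known only at the sampled frames, 10-frame window, step 10.
idx = [10, 21, 32, 43, 54, 65, 76, 87]
scores = [0.4, 0.5, 0.9, 0.7, 0.6, 0.3, 0.2, 0.1]
print(best_window(idx, scores, 90, window=10, step=10))
```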
Referring to Fig. 4, Fig. 4 is a flowchart of a method of clipping a video according to another embodiment of the present application. As shown in Fig. 4, in addition to steps S11 and S13, step S12 includes the following steps:
Step S121: for each candidate image, analyzing the candidate image along multiple dimensions and determining its score in each of those dimensions, wherein the multiple dimensions comprise at least one of: a clarity dimension, a colorfulness dimension, and a meaningfulness dimension;
Step S122: for each candidate image, determining the evaluation score of the candidate image from its scores in the multiple dimensions and the weights of those dimensions.
Each candidate image may be analyzed directly along multiple dimensions to determine its scores in those dimensions. Alternatively, for each candidate video, at least one video frame may be extracted and analyzed along multiple dimensions to determine the scores of that candidate video in those dimensions.
For example, at least two of the following sub-scores may be determined for a candidate image, and the score of the candidate image may then be determined from the computed sub-scores and their weights.
Sub-score 1: the image clarity score. The image clarity score may be determined by detecting the edges of the image, for example with the Laplacian operator, and then computing the variance of the edge-detection result. The larger the variance, the more detail the image contains, the clearer it is, and the higher the corresponding image clarity score.
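A minimal sketch of this measure, assuming OpenCV and a BGR input image; the function name is illustrative:

```python
import cv2

def sharpness_score(image_bgr) -> float:
    """Variance of the Laplacian edge response: higher = more detail = sharper."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())
```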
Sub-score 2: the image colorfulness score. The image colorfulness score may be determined by computing the variance and mean of each of the U and V components in the YUV color space, then computing the square root A of the sum of the U and V variances and the square root B of the sum of the squared U and V means, and finally taking a weighted sum of A and B.
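A sketch of this computation follows; centering the 8-bit chroma around 128 and the weight values are illustrative assumptions, since the patent does not fix them:

```python
import cv2
import numpy as np

def colorfulness_score(image_bgr, w_a: float = 1.0, w_b: float = 0.3) -> float:
    """Colorfulness from the chroma (U, V) statistics as described above."""
    yuv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YUV).astype(np.float64)
    u, v = yuv[..., 1] - 128.0, yuv[..., 2] - 128.0  # center chroma around zero
    a = np.sqrt(u.var() + v.var())                   # spread of the chroma
    b = np.sqrt(u.mean() ** 2 + v.mean() ** 2)       # distance of mean chroma from gray
    return w_a * a + w_b * b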
Sub-score 3: the image meaningfulness score. Generally, if an image is too simple (e.g., very easy to predict when encoding) or too complex (e.g., very hard to predict), there is a high probability that the image is meaningless. Photographs of white walls or floors, for instance, are usually meaningless because they are too simple; photographs of grass and the like are usually meaningless because they are too complex. Therefore, a three-dimensional feature vector can be formed from the variance and mean of the prediction distortion within the image frame together with the colorfulness, and the image meaningfulness score can be determined with an existing machine-learning classification model.
Sub-score 4: the face clarity score. Edges can be detected within the face region and the variance of the edge-detection result computed, thereby determining the face clarity score. Alternatively, the face region can be blurred and compared against the original face region; the larger the difference, the clearer the original face, from which the face clarity score is determined. The face region here may be the rectangular region returned by face detection, a face contour region delimited by facial features, or the inner face region spanned by the two eyes and the chin (or mouth).
Sub-score 5: the eye-openness score. The eye-openness score may be determined by computing, for each eye, the ratio of the distance between the upper and lower eyelids to the distance between the inner and outer corners of that eye, giving R1 and R2 for the two eyes. If R1 is close to R2, the two eyes are considered equally open, and the sum of R1 and R2 is taken as the eye-openness score; otherwise, an expression with one eye open and one eye closed is considered possible, and 2 × max(R1, R2) is taken as the eye-openness score. Furthermore, if R1 and R2 are both small, i.e., both eyes are closed, the eye-openness score may be penalized, for example set to zero or a negative value.
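A sketch of this rule follows; the closeness tolerance and the closed-eye threshold are illustrative assumptions, as the patent leaves both open:

```python
def eye_openness_score(r1: float, r2: float, tol: float = 0.1, closed: float = 0.15) -> float:
    """r1, r2: eyelid distance / canthus distance for the left and right eye."""
    if r1 < closed and r2 < closed:   # both eyes closed: penalize
        return 0.0
    if abs(r1 - r2) <= tol:           # both eyes open to a similar degree
        return r1 + r2
    return 2.0 * max(r1, r2)          # winking: score the more open eye
```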
Sub-score 6: the mouth-openness score. The mouth-openness score can be determined by computing the angles BAC and ABC formed by the line connecting the two mouth corners A and B and the midpoint C of the lower lip. The larger these angles, the higher the mouth-openness score.
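A sketch of the angle computation, assuming (x, y) landmark coordinates; the helper names are illustrative:

```python
import math

def mouth_openness_score(a, b, c) -> float:
    """a, b: mouth-corner points; c: midpoint of the lower lip, as (x, y) tuples.
    Returns the sum of angles BAC and ABC in degrees (larger = mouth more open)."""
    def angle(p, q, r):  # angle at vertex p between rays p->q and p->r
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        n = math.hypot(*v1) * math.hypot(*v2)
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / n))))
    return angle(a, b, c) + angle(b, a, c)
```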
Sub-score 7: the face composition score. The face composition score can be determined by computing the centroid of the polygon formed by connecting the center points of the faces, and comparing its distance to an ideal composition centroid (for example, slightly above center for a portrait image, or toward the upper left or upper right for a landscape image). The closer the distance, the higher the face composition score.
Sub-score 8: the face direction score. The face direction score is determined by computing the orientation of the face (e.g., head raised/lowered, turned left/right, tilted left/right). For example, the face direction score is higher when the head-lowering angle falls within a suitable range, when the left/right turn falls within a certain range, and when the head tilt falls within a certain range.
It should be understood that sub-scores 4 to 8 above may be determined for a candidate image only when the candidate image contains a face.
In addition, for a candidate video, its score may also be adjusted according to the stability of the video, for example whether the video shakes or whether scenes are switched too frequently.
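Combining whichever sub-scores were computed into the evaluation score of step S122 reduces to a weighted sum. A minimal sketch follows; the weight values are illustrative assumptions, since the patent does not fix them:

```python
def evaluation_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum over whichever dimension sub-scores were computed for the image."""
    return sum(weights[name] * score for name, score in sub_scores.items())

# Illustrative weights and sub-scores, not values from the patent.
weights = {"sharpness": 0.3, "colorfulness": 0.2, "meaningfulness": 0.2,
           "face_sharpness": 0.1, "eye_openness": 0.1, "mouth_openness": 0.1}
score = evaluation_score({"sharpness": 0.8, "colorfulness": 0.6,
                          "meaningfulness": 0.7}, weights)
```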
Based on the same inventive concept, an embodiment of the present application provides an apparatus for clipping a video. Referring to Fig. 5, Fig. 5 is a schematic diagram of an apparatus for clipping a video according to an embodiment of the present application. As shown in Fig. 5, the apparatus includes:
an extraction module 501 configured to extract a plurality of candidate images from a target video;
an evaluation module 502 configured to perform evaluation analysis on each candidate image to obtain an evaluation score for that candidate image;
and a clipping module 503 configured to perform a clipping operation on the target video based on the evaluation scores of the plurality of candidate images to obtain a video clip.
Optionally, the clipping module includes:
a first determining module configured to determine, from the plurality of candidate images, the target image with the highest evaluation score;
and a first clipping submodule configured to take the target image as the center and cut out segments of equal duration before and after the center to obtain the final video clip.
Optionally, the clipping module includes:
an interpolation module configured to interpolate the evaluation scores of the candidate images to obtain an evaluation score for every video frame in the target video;
a setting module configured to set a sliding window whose length equals a preset video-clip duration;
a second determining module configured to move the sliding window over the target video with a preset step size and to compute, for each window position, the sum of the evaluation scores of all frames inside the window;
and a second clipping submodule configured to cut out the segment inside the window with the highest score sum to obtain the final video clip.
Optionally, the evaluation module comprises:
an analysis module configured to analyze each candidate image along multiple dimensions and determine its score in each of those dimensions, wherein the multiple dimensions comprise at least one of: a clarity dimension, a colorfulness dimension, and a meaningfulness dimension;
and an evaluation-score determining module configured to determine the evaluation score of each candidate image from its scores in the multiple dimensions and the weight of each of those dimensions.
Optionally, the evaluation module is configured to: analyze each candidate image along multiple dimensions and determine its score in each of those dimensions, wherein the multiple dimensions comprise at least two of: a clarity dimension, a colorfulness dimension, a meaningfulness dimension, and at least one face-related dimension; and determine the evaluation score of each candidate image from its scores in the multiple dimensions and the weights of those dimensions, wherein the dimension scores comprise at least two of: an image clarity score, an image colorfulness score, an image meaningfulness score, and at least one face-related score.
Optionally, the at least one face-related score comprises one or more of: a face clarity score, an eye-openness score, a mouth-openness score, a face composition score, and a face direction score.
Based on the same inventive concept, another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method according to any of the above embodiments of the present application.
Based on the same inventive concept, another embodiment of the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the steps of the method according to any of the above embodiments of the present application.
Since the apparatus embodiment is substantially similar to the method embodiment, its description is brief; for relevant details, refer to the corresponding parts of the method embodiment.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the parts the embodiments share can be found by cross-reference.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method, apparatus, electronic device, and computer-readable storage medium for clipping video provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present application, and the description of the above embodiments is intended only to help in understanding the method and its core idea. Meanwhile, for those of ordinary skill in the art, there may be variations in the specific embodiments and the application scope based on the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (9)

1. A method of clipping a video, the method comprising:
extracting a plurality of candidate images from a target video;
performing evaluation analysis on each candidate image to obtain an evaluation score for that candidate image;
and performing a clipping operation on the target video based on the evaluation scores of the plurality of candidate images to obtain a video clip;
wherein performing the clipping operation on the target video based on the respective evaluation scores of the plurality of candidate images to obtain a video clip comprises:
interpolating the evaluation scores of the candidate images to obtain an evaluation score for every video frame in the target video;
setting a sliding window whose length equals a preset video-clip duration;
moving the sliding window over the target video with a preset step size, and computing, for each window position, the sum of the evaluation scores of all frames inside the window;
and cutting out the segment inside the window with the highest score sum to obtain the final video clip.
2. The method of claim 1, wherein performing the clipping operation on the target video based on the evaluation scores of the candidate images to obtain a video clip comprises:
determining, from the plurality of candidate images, the target image with the highest evaluation score;
and, taking the target image as the center, cutting out segments of equal duration before and after the center to obtain the final video clip.
3. The method of claim 1, wherein performing evaluation analysis on each candidate image to obtain its evaluation score comprises:
for each candidate image, analyzing the candidate image along multiple dimensions and determining its score in each of those dimensions, wherein the multiple dimensions comprise at least one of: a clarity dimension, a colorfulness dimension, and a meaningfulness dimension;
for each candidate image, determining the evaluation score of the candidate image from its scores in the multiple dimensions and the weights of those dimensions.
4. The method of claim 1, wherein performing evaluation analysis on each candidate image to obtain its evaluation score comprises:
for each candidate image, analyzing the candidate image along multiple dimensions and determining its score in each of those dimensions, wherein the multiple dimensions comprise at least two of: a clarity dimension, a colorfulness dimension, a meaningfulness dimension, and at least one face-related dimension;
for each candidate image, determining the evaluation score of the candidate image from its scores in the multiple dimensions and the weights of those dimensions, wherein the dimension scores comprise at least two of: an image clarity score, an image colorfulness score, an image meaningfulness score, and at least one face-related score.
5. The method of claim 4, wherein the at least one face-related score comprises one or more of: a face clarity score, an eye-openness score, a mouth-openness score, a face composition score, and a face direction score.
6. An apparatus for clipping a video, the apparatus comprising:
an extraction module configured to extract a plurality of candidate images from a target video;
an evaluation module configured to perform evaluation analysis on each candidate image to obtain an evaluation score for that candidate image;
and a clipping module configured to perform a clipping operation on the target video based on the evaluation scores of the candidate images to obtain a video clip;
wherein the clipping module comprises:
an interpolation module configured to interpolate the evaluation scores of the candidate images to obtain an evaluation score for every video frame in the target video;
a setting module configured to set a sliding window whose length equals a preset video-clip duration;
a second determining module configured to move the sliding window over the target video with a preset step size and to compute, for each window position, the sum of the evaluation scores of all frames inside the window;
and a second clipping submodule configured to cut out the segment inside the window with the highest score sum to obtain the final video clip.
7. The apparatus of claim 6, wherein the clipping module comprises:
a first determining module configured to determine, from the plurality of candidate images, the target image with the highest evaluation score;
and a first clipping submodule configured to take the target image as the center and cut out segments of equal duration before and after the center to obtain the final video clip.
8. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
CN201911115501.3A 2019-07-12 2019-11-14 Method, device, electronic equipment and computer-readable storage medium for clipping video Active CN110996169B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019106320311 2019-07-12
CN201910632031 2019-07-12

Publications (2)

Publication Number Publication Date
CN110996169A CN110996169A (en) 2020-04-10
CN110996169B true CN110996169B (en) 2022-03-01

Family

ID=70084601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911115501.3A Active CN110996169B (en) 2019-07-12 2019-11-14 Method, device, electronic equipment and computer-readable storage medium for clipping video

Country Status (1)

Country Link
CN (1) CN110996169B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814840A (en) * 2020-06-17 2020-10-23 恒睿(重庆)人工智能技术研究院有限公司 Method, system, equipment and medium for evaluating quality of face image
CN111541943B (en) * 2020-06-19 2020-10-16 腾讯科技(深圳)有限公司 Video processing method, video operation method, device, storage medium and equipment
CN111818363A (en) * 2020-07-10 2020-10-23 携程计算机技术(上海)有限公司 Short video extraction method, system, device and storage medium
CN111918122A (en) * 2020-07-28 2020-11-10 北京大米科技有限公司 Video processing method and device, electronic equipment and readable storage medium
CN112069952A (en) * 2020-08-25 2020-12-11 北京小米松果电子有限公司 Video clip extraction method, video clip extraction device, and storage medium
CN112532897B (en) * 2020-11-25 2022-07-01 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and computer readable storage medium
CN112770061A (en) * 2020-12-16 2021-05-07 影石创新科技股份有限公司 Video editing method, system, electronic device and storage medium
CN113709560B (en) * 2021-03-31 2024-01-02 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium
CN113411666A (en) * 2021-06-18 2021-09-17 影石创新科技股份有限公司 Automatic clipping method, apparatus, camera, and computer-readable storage medium
CN115734007B (en) * 2022-09-22 2023-09-01 北京国际云转播科技有限公司 Video editing method, device, medium and video processing system

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807198A (en) * 2010-01-08 2010-08-18 中国科学院软件研究所 Video abstraction generating method based on sketch
CN104598921A (en) * 2014-12-31 2015-05-06 乐视网信息技术(北京)股份有限公司 Video preview selecting method and device
US10074015B1 (en) * 2015-04-13 2018-09-11 Google Llc Methods, systems, and media for generating a summarized video with video thumbnails
US10225572B2 (en) * 2015-09-30 2019-03-05 Apple Inc. Configurable motion estimation search systems and methods
CN105357594B (en) * 2015-11-19 2018-08-31 南京云创大数据科技股份有限公司 The massive video abstraction generating method of algorithm is concentrated based on the video of cluster and H264
CN105979266B (en) * 2016-05-06 2019-01-29 西安电子科技大学 It is a kind of based on intra-frame trunk and the worst time-domain information fusion method of time slot
US20180121733A1 (en) * 2016-10-27 2018-05-03 Microsoft Technology Licensing, Llc Reducing computational overhead via predictions of subjective quality of automated image sequence processing
US10445586B2 (en) * 2017-12-12 2019-10-15 Microsoft Technology Licensing, Llc Deep learning on image frames to generate a summary
CN108632668A (en) * 2018-05-04 2018-10-09 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN108632641A (en) * 2018-05-04 2018-10-09 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN108764060A (en) * 2018-05-07 2018-11-06 中国传媒大学 Video lens edge detection method based on sliding window
CN108804578B (en) * 2018-05-24 2022-06-07 南京理工大学 Unsupervised video abstraction method based on consistency segment generation
CN108830208A (en) * 2018-06-08 2018-11-16 Oppo广东移动通信有限公司 Method for processing video frequency and device, electronic equipment, computer readable storage medium
CN108833942A (en) * 2018-06-28 2018-11-16 北京达佳互联信息技术有限公司 Video cover choosing method, device, computer equipment and storage medium
CN109544503B (en) * 2018-10-15 2020-12-01 北京达佳互联信息技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN109685785A (en) * 2018-12-20 2019-04-26 上海众源网络有限公司 A kind of image quality measure method, apparatus and electronic equipment

Also Published As

Publication number Publication date
CN110996169A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110996169B (en) Method, device, electronic equipment and computer-readable storage medium for clipping video
CN106920229B (en) Automatic detection method and system for image fuzzy area
CN105404884B (en) Image analysis method
JP6655878B2 (en) Image recognition method and apparatus, program
EP3651055A1 (en) Gesture recognition method, apparatus, and device
US7643659B2 (en) Facial feature detection on mobile devices
US9042662B2 (en) Method and system for segmenting an image
US10474903B2 (en) Video segmentation using predictive models trained to provide aesthetic scores
US20120321134A1 (en) Face tracking method and device
JP2003030667A (en) Method for automatically locating eyes in image
JP2017531883A (en) Method and system for extracting main subject of image
AU2016352215A1 (en) Method and device for tracking location of human face, and electronic equipment
CN110730381A (en) Method, device, terminal and storage medium for synthesizing video based on video template
CN105550671A (en) Face recognition method and device
US20150248592A1 (en) Method and device for identifying target object in image
JP2005284348A (en) Information processor and information processing method, recording medium, and program
JP2002208014A (en) Multi-mode digital image processing method for detecting eye
CN110087143B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN110996183B (en) Video abstract generation method, device, terminal and storage medium
CN111160110A (en) Method and device for identifying anchor based on face features and voice print features
CN111695540A (en) Video frame identification method, video frame cutting device, electronic equipment and medium
KR102434397B1 (en) Real time multi-object tracking device and method by using global motion
CN111126300B (en) Human body image detection method and device, electronic equipment and readable storage medium
CN112785572A (en) Image quality evaluation method, device and computer readable storage medium
CN113963149A (en) Medical bill picture fuzzy judgment method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant