CN110996169A - Method, device, electronic equipment and computer-readable storage medium for clipping video - Google Patents

Info

Publication number
CN110996169A
CN110996169A
Authority
CN
China
Prior art keywords
video
image
score
candidate
candidate image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911115501.3A
Other languages
Chinese (zh)
Other versions
CN110996169B (en)
Inventor
李马丁
郑云飞
章佳杰
宁小东
刘建辉
于冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Reach Best Technology Co Ltd
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Reach Best Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Reach Best Technology Co Ltd filed Critical Reach Best Technology Co Ltd
Publication of CN110996169A
Application granted
Publication of CN110996169B
Current legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details
    • H04N17/004Diagnosis, testing or measuring for television systems or their details for digital television systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure relates to a method, an apparatus, an electronic device, and a computer-readable storage medium for clipping a video. The method comprises: extracting a plurality of candidate images from a target video; for each candidate image, performing evaluation analysis on the candidate image to obtain an evaluation score of the candidate image; and performing a clipping operation on the target video based on the evaluation scores of the candidate images to obtain a video clip. This optimizes the video clipping result and improves the user experience.

Description

Method, device, electronic equipment and computer-readable storage medium for clipping video
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for editing a video, an electronic device, and a computer-readable storage medium.
Background
In the related art, video clipping still depends heavily on manual editing. Existing manual editing requires unfolding a video file frame by frame and editing with frame-level precision; because the amount of video data is very large, analyzing every frame in a video consumes a great deal of labor time, making manual editing an extremely tedious task.
Meanwhile, if the goal of the clip is to select a highlight segment of the video, manual clipping cannot automatically identify, from the video content, the video frames containing highlight content from which to generate the segment. Manual clipping is therefore not intelligent enough: when a highlight segment is clipped manually, the accuracy of manually identifying highlight frames is low, which degrades the final clipping result and reduces the user experience.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, and a computer-readable storage medium for clipping a video, to solve at least the technical problems described above. The technical solution of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a method of clipping a video, the method comprising:
extracting a plurality of candidate images from a target video;
for each candidate image, carrying out evaluation analysis on the candidate image to obtain an evaluation score of the candidate image;
and intercepting the target video based on the evaluation scores of the candidate images to obtain a video clip.
Optionally, performing an intercepting operation on the target video based on the evaluation scores of the candidate images to obtain a video segment, including:
determining a target image with the highest evaluation score from the plurality of candidate images;
and taking the target image as a center, and respectively intercepting video clips with the same time length before and after the center to obtain a final video clip.
Optionally, performing an intercepting operation on the target video based on the evaluation scores of the candidate images to obtain a video segment, including:
carrying out interpolation processing on the evaluation scores of the candidate images to obtain the evaluation score of each frame of video image in the target video;
setting a sliding window with the length equal to the preset video clip duration;
moving the sliding window in the target video according to a preset step length, and determining the sum of the evaluation scores of all frame video images included in the video clip in each sliding window;
and intercepting the video segment in the sliding window with the highest sum of the evaluation scores to obtain a final video segment.
Optionally, for each candidate image, performing evaluation analysis on the candidate image to obtain an evaluation score of the candidate image, including:
for each candidate image, analyzing the candidate image from multiple dimensions, and determining scores of the candidate image in the multiple dimensions, wherein the multiple dimensions comprise at least one of the following: a definition dimension, a colorfulness dimension, and a significance dimension;
for each candidate image, determining an evaluation score of the candidate image according to scores of the candidate image in multiple dimensions and weights of the multiple dimensions.
Optionally, for each candidate image, performing evaluation analysis on the candidate image to obtain an evaluation score of the candidate image, including:
for each candidate image, analyzing the candidate image from multiple dimensions, and determining scores of the candidate image in the multiple dimensions, wherein the multiple dimensions comprise at least two of the following items: a definition dimension, a colorfulness dimension, a significance dimension, at least one dimension associated with a face;
for each candidate image, determining an evaluation score of the candidate image according to scores of the candidate image in multiple dimensions and weights of the candidate image in the multiple dimensions, wherein the scores of the multiple dimensions comprise at least two of the following items: image clarity score, image colorfulness score, image meaningfulness score, at least one score associated with a human face.
Optionally, the at least one score associated with the face comprises one or more of: a face clarity score, an eye-openness score, a mouth-openness score, a face composition score, and a face direction score.
According to a second aspect of embodiments of the present disclosure, there is provided an apparatus for clipping a video, the apparatus comprising:
the extraction module is used for extracting a plurality of candidate images from the target video;
the evaluation module is used for carrying out evaluation analysis on each candidate image so as to obtain an evaluation score of the candidate image;
and the intercepting module is used for intercepting the target video based on the respective evaluation scores of the candidate images to obtain a video clip.
Optionally, the intercepting module includes:
a first determination module, configured to determine a target image with a highest evaluation score from the plurality of candidate images;
and the first intercepting submodule is used for respectively intercepting the video clips with the same time length from front to back of the center by taking the target image as the center so as to obtain the final video clip.
Optionally, the intercepting module includes:
the interpolation module is used for carrying out interpolation processing on the evaluation scores of the candidate images to obtain the evaluation scores of each frame of video image in the target video;
the setting module is used for setting a sliding window with the length equal to the preset video clip duration;
the second determining module is used for moving the sliding windows in the target video according to a preset step length and determining the sum of the evaluation scores of all frame video images included in the video clips in each sliding window;
and the second intercepting submodule is used for intercepting the video segment in the sliding window with the highest sum of the evaluation scores to obtain a final video segment.
Optionally, the evaluation module comprises:
an analysis module configured to, for each candidate image, analyze the candidate image from a plurality of dimensions, and determine scores of the candidate image in the plurality of dimensions, where the plurality of dimensions include at least one of: a definition dimension, a colorfulness dimension, and a significance dimension;
and the evaluation score determining module is used for determining the evaluation score of each candidate image according to the scores of the candidate image in multiple dimensions and the weight of each dimension in the multiple dimensions.
Optionally, the evaluation module is configured to: for each candidate image, analyzing the candidate image from multiple dimensions, and determining scores of the candidate image in the multiple dimensions, wherein the multiple dimensions comprise at least two of the following items: a definition dimension, a colorfulness dimension, a significance dimension, at least one dimension associated with a face; for each candidate image, determining an evaluation score of the candidate image according to scores of the candidate image in multiple dimensions and weights of the candidate image in the multiple dimensions, wherein the scores of the multiple dimensions comprise at least two of the following items: image clarity score, image colorfulness score, image meaningfulness score, at least one score associated with a human face.
Optionally, the at least one score associated with the face comprises one or more of: a face clarity score, an eye-openness score, a mouth-openness score, a face composition score, and a face direction score.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to the first aspect of the present application.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect of the present application when executing the program.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
In the embodiments disclosed in the present application, the video frames in a video are evaluated automatically, and the video is clipped based on the evaluation scores to obtain the final highlight video clip. Because the target video is clipped based on the evaluation scores of the candidate images, the resulting video clip is generated automatically from the video content; and because the candidate images are only some of the video frames extracted from the video, every frame of the video does not need to be evaluated and analyzed, which saves time, so the video clip is generated quickly and with high quality.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of clipping a video in accordance with an exemplary embodiment;
FIG. 2 is a flow chart of a method for clipping video according to another embodiment of the present application;
FIG. 3 is a flow chart of a method for clipping video according to another embodiment of the present application;
FIG. 4 is a flow chart of a method for clipping video according to another embodiment of the present application;
fig. 5 is a schematic diagram of an apparatus for clipping a video according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the related art, video clipping still depends heavily on manual editing. Existing manual editing requires unfolding a video file frame by frame and editing with frame-level precision; because the amount of video data is very large, analyzing every frame in a video consumes a great deal of labor time, making manual editing an extremely tedious task.
Meanwhile, if the goal of the clip is to select a highlight segment of the video, manual clipping cannot automatically identify, from the video content, the video frames containing highlight content from which to generate the segment. Manual clipping is therefore not intelligent enough: when a highlight segment is clipped manually, the accuracy of manually identifying highlight frames is low, which degrades the final clipping result and reduces the user experience. To improve the user experience, the present application provides a method of clipping a video that automatically evaluates the video frames in a video and clips the video based on the evaluation scores to obtain the final highlight video clip.
Fig. 1 shows a flowchart of a method for editing a video according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
step S11: extracting a plurality of candidate images from a target video;
step S12: for each candidate image, carrying out evaluation analysis on the candidate image to obtain an evaluation score of the candidate image;
step S13: and intercepting the target video based on the evaluation scores of the candidate images to obtain a video clip.
In the present embodiment, the candidate images are multiple frames of video images extracted from the target video. They may be extracted at equal inter-frame intervals; for example, for a single target video, one frame of video image is extracted at a fixed interval, such as every 10 frames. Taking a target video that includes 90 frames of video images as an example, the extracted frames may be the 11th, 22nd, 33rd, 44th, 55th, 66th, 77th, and 88th frames.
Alternatively, in this embodiment, when multiple frames of video images are extracted from the target video, the target video may first be divided into multiple sub-segments, and one frame of video image may then be extracted from each sub-segment. Illustratively, for a single target video, the target video is divided into N equal sub-segments, and one frame of video image is randomly extracted from each sub-segment.
In the above embodiment, whether multiple frames are extracted at equal inter-frame intervals or the video is divided into multiple sub-segments with one frame extracted from each, the extracted frames are distributed uniformly across the target video. They can therefore represent the content of the target video more accurately, which further improves the quality and accuracy of the video clip.
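For illustration only (this sketch is not part of the original disclosure), both sampling strategies can be expressed in Python, assuming OpenCV as the decoding backend; the function name and parameters are hypothetical:

import random

import cv2  # OpenCV, assumed available for video decoding

def extract_candidates(video_path, num_candidates=8, mode="uniform"):
    # "uniform": equal inter-frame intervals; "segment": one random
    # frame from each of num_candidates equal sub-segments.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    seg_len = total // num_candidates
    if mode == "uniform":
        indices = [(i + 1) * seg_len - 1 for i in range(num_candidates)]
    else:
        indices = [i * seg_len + random.randrange(seg_len)
                   for i in range(num_candidates)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append((idx, frame))
    cap.release()
    return frames  # list of (frame_index, BGR image)

For a 90-frame video and 8 candidates, "uniform" yields 0-based indices 10, 21, ..., 87, i.e. the 11th, 22nd, ..., 88th frames of the example above.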
In this embodiment, each candidate image is evaluated and analyzed to obtain its evaluation score, and the target video is then clipped based on the evaluation scores of the plurality of candidate images to obtain a video clip. Here, evaluation analysis refers to evaluating the image quality of the candidate image; image quality includes, but is not limited to, the clarity of the image, the colorfulness of the image, the significance of the image, the clarity of any face in the image, and so on. Based on this quality evaluation, several images whose evaluation scores exceed a preset threshold may be selected to form the final video clip, the final video clip may be cut out around the single image with the highest evaluation score, or a group of consecutive images with the highest evaluation scores may be cut out as the final video clip.
In the above embodiment, the target video is intercepted based on the evaluation scores of the candidate images, so that the obtained video segment is automatically generated according to the video content, and the candidate images are partial video frames extracted from the video, and evaluation and analysis of each frame of video image in the video are not needed, so that time is saved, the speed of generating the video segment is high, and the quality is high.
Referring to fig. 2, fig. 2 is a flowchart of a method for clipping a video according to another embodiment of the present application. As shown in fig. 2, in addition to steps S11-S12, step S13 specifically includes the following steps:
step S131: determining a target image with the highest evaluation score from the plurality of candidate images;
step S132: and taking the target image as a center, and respectively intercepting video clips with the same time length before and after the center to obtain a final video clip.
In this embodiment, the target image with the highest evaluation score is determined from the plurality of candidate images, the total duration of the final video clip is determined (this may be a preset video clip duration), and video segments of equal duration are then cut out before and after the target image, with the target image as the center, to obtain the final video clip. Illustratively, continuing the equal-interval example from step S11 above: the target video includes 90 frames of video images, and the extracted frames are the 11th, 22nd, 33rd, 44th, 55th, 66th, 77th, and 88th frames. If the target image with the highest evaluation score is the 55th frame and the preset duration of the video clip to be generated spans 30 frames, then, taking the 55th frame as the center, the window extends 15 frames forward and 15 frames backward, so that the 40th through 70th frames of the target video are cut out as the final video clip.
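A minimal sketch of this center-based strategy (an illustration, not the disclosed implementation; the boundary clamping policy is an assumption):

def clip_around_best(scores, clip_len, total_frames):
    # scores: {frame_index: evaluation_score} for the candidate frames.
    # Returns (start, end) of a clip_len-frame window centered on the
    # highest-scoring candidate, clamped to the video bounds.
    center = max(scores, key=scores.get)
    start = max(0, min(center - clip_len // 2, total_frames - clip_len))
    return start, start + clip_len

# With the example above (best candidate at frame 55, a 30-frame clip in
# a 90-frame video) this yields the window (40, 70).
start, end = clip_around_best({11: 0.2, 55: 0.9, 88: 0.4}, 30, 90)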
Referring to fig. 3, fig. 3 is a flowchart of a method for clipping a video according to another embodiment of the present application. As shown in fig. 3, in addition to steps S11-S12, step S13 specifically includes the following steps:
step S133: carrying out interpolation processing on the evaluation scores of the candidate images to obtain the evaluation score of each frame of video image in the target video;
step S134: setting a sliding window with the length equal to the preset video clip duration;
step S135: moving the sliding window in the target video according to a preset step length, and determining the sum of the evaluation scores of all frame video images included in the video clip in each sliding window;
step S136: and intercepting the video segment in the sliding window with the highest sum of the evaluation scores to obtain a final video segment.
In this embodiment, first, interpolation processing is performed on the evaluation scores of the plurality of candidate images; the interpolation method used is known in the prior art and is not repeated here. The interpolation yields an evaluation score for each frame of video image in the target video.
Then, a sliding window whose length equals the preset video clip duration is set, so that the window size of the sliding window equals the preset video clip duration; the sliding window is moved across the target video by a preset step, and the sum of the evaluation scores of all video frames within the window at each position is determined. The preset step of the sliding window may be equal to or different from the preset video clip duration. For example, if the target video includes 90 frames and the preset video clip duration spans 10 frames, the window size is 10 frames, and the preset step may be 10 frames, 8 frames, or 12 frames. With a step of 10 frames, the window positions generated as the sliding window moves across the target video are: frames 1-10; frames 10-20; frames 20-30; frames 30-40; frames 40-50; frames 50-60; frames 60-70; frames 70-80; frames 80-90.
Finally, the video segment within the sliding window having the highest sum of evaluation scores is cut out as the final video clip. Continuing the example above, if the sum of the evaluation scores of frames 1-10 is the highest, frames 1-10 are cut out as the final video clip.
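The interpolation and sliding-window search might look as follows (a sketch; linear interpolation via numpy.interp is one choice for the unspecified interpolation method):

import numpy as np

def best_window(cand_indices, cand_scores, total_frames, win_len, step):
    # Interpolate candidate scores to every frame of the target video.
    per_frame = np.interp(np.arange(total_frames), cand_indices, cand_scores)
    # Prefix sums make each window's score sum an O(1) lookup.
    csum = np.concatenate(([0.0], np.cumsum(per_frame)))
    best_start, best_sum = 0, float("-inf")
    for start in range(0, total_frames - win_len + 1, step):
        window_sum = csum[start + win_len] - csum[start]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_start + win_len  # frame range of the clip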
Referring to fig. 4, fig. 4 is a flowchart of a method for clipping a video according to another embodiment of the present application. As shown in fig. 4, in addition to steps S11 and S13, step S12 includes the following steps:
s121: for each candidate image, analyzing the candidate image from multiple dimensions, and determining scores of the candidate image in the multiple dimensions, wherein the multiple dimensions comprise at least one of the following: a definition dimension, a colorfulness dimension, and a significance dimension;
s122: for each candidate image, determining an evaluation score of the candidate image according to scores of the candidate image in multiple dimensions and weights of the multiple dimensions.
For each candidate image, the candidate image may be analyzed directly from multiple dimensions to determine its scores in those dimensions. Alternatively, for each candidate video segment, at least one video frame may be extracted from the segment, and the extracted frame or frames may be analyzed from multiple dimensions, so as to determine the segment's scores in those dimensions.
For example, at least two of the following sub-scores may be determined for a candidate image, and the score of the candidate image may then be determined from the determined sub-scores and their weights.
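The weighted combination can be sketched as follows (the dimension names and weight values are illustrative assumptions; the disclosure leaves the weights open):

def evaluation_score(sub_scores, weights):
    # Weighted sum over whichever dimensions were computed.
    return sum(weights[dim] * s for dim, s in sub_scores.items())

# Hypothetical example:
score = evaluation_score(
    {"clarity": 0.8, "colorfulness": 0.6, "meaningfulness": 0.7},
    {"clarity": 0.4, "colorfulness": 0.3, "meaningfulness": 0.3},
)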
Sub-score 1: the image clarity score. The image clarity score may be determined by detecting edges in the image, for example using the Laplacian operator, and then calculating the variance of the edge-detection result. The larger the variance, the more detail the image contains, the clearer the image, and the higher the corresponding image clarity score.
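A common realization of this measure (a sketch, not the disclosed implementation) is the variance of the Laplacian response:

import cv2

def clarity_score(image_bgr):
    # Edge detection with the Laplacian operator, then the variance of
    # the response; a larger variance indicates a sharper image.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())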
Sub-score 2: the image colorfulness score. The image colorfulness score may be determined by calculating the variances and means of the U and V components in the YUV color space, taking the square root of the sum of the U and V variances as A and the square root of the sum of the squared U and V means as B, and then computing a weighted sum of A and B.
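A sketch of this computation (the chroma centering at 128 and the weight values are assumptions):

import cv2
import numpy as np

def colorfulness_score(image_bgr, w_a=1.0, w_b=0.3):
    yuv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YUV).astype(np.float64)
    u = yuv[..., 1] - 128.0  # center the chroma channels (assumption)
    v = yuv[..., 2] - 128.0
    a = np.sqrt(u.var() + v.var())              # sqrt of summed UV variances
    b = np.sqrt(u.mean() ** 2 + v.mean() ** 2)  # sqrt of summed squared UV means
    return w_a * a + w_b * b                    # weighted sum of A and B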
Sub-score 3: the image meaningfulness score. In general, if an image is too simple (e.g., easily predicted by an encoder) or too complex (e.g., very hard to predict), the image is likely to be meaningless. For example, a photographed white wall or floor is usually a meaningless, overly simple image, while a photographed patch of grass is usually a meaningless, overly complex image. Therefore, a three-dimensional feature vector can be formed from the variance and mean of the intra-frame distortion together with the colorfulness, and an existing machine-learning classification model can be used to determine the image meaningfulness score.
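One way this could be wired up (a sketch; how the intra-frame distortion map is obtained and which classifier is used are left open by the disclosure, so both are assumptions here):

import numpy as np
from sklearn.linear_model import LogisticRegression  # one possible classifier

def meaningfulness_score(distortion_map, colorfulness, model):
    # Three-dimensional feature: variance and mean of the intra-frame
    # distortion, plus the colorfulness score.
    feat = np.array([[distortion_map.var(), distortion_map.mean(), colorfulness]])
    return float(model.predict_proba(feat)[0, 1])  # P(image is meaningful)

# 'model' is assumed to be a classifier already trained on labeled images,
# e.g. model = LogisticRegression().fit(train_features, train_labels).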
Sub-score 4: the face clarity score. Edges can be detected within the face region and the variance of the edge-detection result calculated to determine the face clarity score. Alternatively, the face region can be blurred and then compared with the original face region; the larger the difference, the clearer the original face, and the face clarity score is determined accordingly. The face region here may be the rectangular region returned by face detection, a face contour region delineated from facial features, or the inner face region bounded by the two eyes and the chin (or mouth).
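The blur-difference variant could be sketched as follows (the kernel size and the use of the mean absolute difference are assumptions):

import cv2
import numpy as np

def face_clarity_score(face_bgr, ksize=(9, 9)):
    # Blur the face region and compare it with the original: the larger
    # the difference, the sharper the original face.
    blurred = cv2.GaussianBlur(face_bgr, ksize, 0)
    return float(np.mean(cv2.absdiff(face_bgr, blurred)))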
Sub-score 5: the eye-openness score. The eye-openness score may be determined by calculating, for each eye, the ratio of the distance between the upper and lower eyelids to the distance between the inner and outer eye corners, denoted R1 and R2 for the two eyes. If R1 is close to R2, the two eyes are considered equally open, and the sum of R1 and R2 is taken as the eye-openness score; otherwise, an expression with one eye open and one eye closed is considered possible, and 2 × max(R1, R2) is taken as the eye-openness score. Furthermore, if both R1 and R2 are small, i.e., both eyes are closed, the eye-openness score may be penalized, for example by setting it to zero or a negative value.
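A sketch of this rule (the closeness tolerance and the closed-eye threshold are assumptions; R1 and R2 would come from any facial-landmark detector):

def eye_openness_score(r1, r2, close_thresh=0.12, tol=0.25):
    # r1, r2: eyelid distance / eye-corner distance for the two eyes.
    if r1 < close_thresh and r2 < close_thresh:
        return 0.0                    # both eyes closed: penalized score
    if abs(r1 - r2) <= tol * max(r1, r2):
        return r1 + r2                # both eyes open to a similar degree
    return 2.0 * max(r1, r2)          # one-eye-open expression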
Sub-score 6: the mouth-openness score. The mouth-openness score can be determined by calculating the angles ∠BAC and ∠ABC formed between the line connecting the two mouth corners A and B and the midpoint C of the lower lip; the larger the angles, the higher the mouth-openness score.
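The two angles can be computed from the three landmark points as follows (a sketch; returning their sum in radians is an assumption about how the angles are combined):

import math

def mouth_openness_score(a, b, c):
    # a, b: the two mouth corners; c: midpoint of the lower lip, as (x, y).
    def angle_at(p, q, r):  # angle at vertex p between rays p->q and p->r
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (
            math.hypot(*v1) * math.hypot(*v2))
        return math.acos(max(-1.0, min(1.0, cos)))
    return angle_at(a, b, c) + angle_at(b, a, c)  # ∠BAC + ∠ABC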
Sub-score 7: the face composition score. The face composition score can be determined by computing the centroid of the polygon formed by connecting the center points of the faces and comparing its distance to an ideal composition centroid (e.g., upper center for a portrait-orientation image, or upper left or upper right for a landscape-orientation image). The closer the distance, the higher the face composition score.
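A sketch of the composition check (using the mean of the face centers in place of the exact polygon centroid, and a hypothetical normalization scale):

import math

def composition_score(face_centers, ideal_center, scale):
    # face_centers: list of (x, y) face center points; ideal_center: the
    # assumed ideal composition centroid for the image orientation.
    cx = sum(p[0] for p in face_centers) / len(face_centers)
    cy = sum(p[1] for p in face_centers) / len(face_centers)
    dist = math.hypot(cx - ideal_center[0], cy - ideal_center[1])
    return max(0.0, 1.0 - dist / scale)  # closer centroid -> higher score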
Sub-score 8: the face direction score. The face direction score is determined by calculating the direction of the face (e.g., head raised/lowered, turned left/right, tilted left/right). For example, the face direction score is higher when the head-lowering angle is within an appropriate range, when the left/right turn is within a certain range, or when the head tilt is within a certain range.
It should be understood that sub-scores 4 to 8 may be determined for a candidate image when the candidate image contains a face.
In addition, for a candidate video segment, the score may also be adjusted according to the stability of the video, for example whether the video shakes or whether the scene switches frequently.
Based on the same inventive concept, an embodiment of the present application provides an apparatus for editing video. Referring to fig. 5, fig. 5 is a schematic diagram of an apparatus for clipping a video according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
an extracting module 501, configured to extract a plurality of candidate images from a target video;
an evaluation module 502, configured to perform evaluation analysis on each candidate image to obtain an evaluation score of the candidate image;
an intercepting module 503, configured to perform an intercepting operation on the target video based on the evaluation scores of the multiple candidate images, so as to obtain a video segment.
Optionally, the intercepting module includes:
a first determination module, configured to determine a target image with a highest evaluation score from the plurality of candidate images;
and the first intercepting submodule is used for respectively intercepting the video clips with the same time length from front to back of the center by taking the target image as the center so as to obtain the final video clip.
Optionally, the intercepting module includes:
the interpolation module is used for carrying out interpolation processing on the evaluation scores of the candidate images to obtain the evaluation scores of each frame of video image in the target video;
the setting module is used for setting a sliding window with the length equal to the preset video clip duration;
the second determining module is used for moving the sliding windows in the target video according to a preset step length and determining the sum of the evaluation scores of all frame video images included in the video clips in each sliding window;
and the second intercepting submodule is used for intercepting the video segment in the sliding window with the highest sum of the evaluation scores to obtain a final video segment.
Optionally, the evaluation module comprises:
an analysis module configured to, for each candidate image, analyze the candidate image from a plurality of dimensions, and determine scores of the candidate image in the plurality of dimensions, where the plurality of dimensions include at least one of: a definition dimension, a colorfulness dimension, and a significance dimension;
and the evaluation score determining module is used for determining the evaluation score of each candidate image according to the scores of the candidate image in multiple dimensions and the weight of each dimension in the multiple dimensions.
Optionally, the evaluation module is configured to: for each candidate image, analyzing the candidate image from multiple dimensions, and determining scores of the candidate image in the multiple dimensions, wherein the multiple dimensions comprise at least two of the following items: a definition dimension, a colorfulness dimension, a significance dimension, at least one dimension associated with a face; for each candidate image, determining an evaluation score of the candidate image according to scores of the candidate image in multiple dimensions and weights of the candidate image in the multiple dimensions, wherein the scores of the multiple dimensions comprise at least two of the following items: image clarity score, image colorfulness score, image meaningfulness score, at least one score associated with a human face.
Optionally, the at least one score associated with the face comprises one or more of: a face clarity score, an eye-openness score, a mouth-openness score, a face composition score, and a face direction score.
Based on the same inventive concept, another embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the method according to any of the above-mentioned embodiments of the present application.
Based on the same inventive concept, another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and running on the processor, and when the processor executes the computer program, the electronic device implements the steps of the method according to any of the above embodiments of the present application.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method, the apparatus, the electronic device, and the computer-readable storage medium for clipping video provided by the present application are introduced in detail above, and a specific example is applied in the present application to explain the principles and embodiments of the present application, and the description of the above embodiment is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method of clipping a video, the method comprising:
extracting a plurality of candidate images from a target video;
for each candidate image, carrying out evaluation analysis on the candidate image to obtain an evaluation score of the candidate image;
and intercepting the target video based on the evaluation scores of the candidate images to obtain a video clip.
2. The method of claim 1, wherein performing a clipping operation on the target video based on the evaluation scores of the candidate images to obtain a video clip comprises:
determining a target image with the highest evaluation score from the plurality of candidate images;
and taking the target image as a center, and respectively intercepting video clips with the same time length before and after the center to obtain a final video clip.
3. The method of claim 1, wherein performing a clipping operation on the target video based on the evaluation scores of the candidate images to obtain a video clip comprises:
carrying out interpolation processing on the evaluation scores of the candidate images to obtain the evaluation score of each frame of video image in the target video;
setting a sliding window with the length equal to the preset video clip duration;
moving the sliding window in the target video according to a preset step length, and determining the sum of the evaluation scores of all frame video images included in the video clip in each sliding window;
and intercepting the video segment in the sliding window with the highest sum of the evaluation scores to obtain a final video segment.
4. The method of claim 1, wherein for each candidate image, performing an evaluation analysis on the candidate image to obtain an evaluation score of the candidate image comprises:
for each candidate image, analyzing the candidate image from multiple dimensions, and determining scores of the candidate image in the multiple dimensions, wherein the multiple dimensions comprise at least one of the following: a definition dimension, a colorfulness dimension, and a significance dimension;
for each candidate image, determining an evaluation score of the candidate image according to scores of the candidate image in multiple dimensions and weights of the multiple dimensions.
5. The method of claim 1, wherein for each candidate image, performing an evaluation analysis on the candidate image to obtain an evaluation score of the candidate image comprises:
for each candidate image, analyzing the candidate image from multiple dimensions, and determining scores of the candidate image in the multiple dimensions, wherein the multiple dimensions comprise at least two of the following items: a definition dimension, a colorfulness dimension, a significance dimension, at least one dimension associated with a face;
for each candidate image, determining an evaluation score of the candidate image according to scores of the candidate image in multiple dimensions and weights of the candidate image in the multiple dimensions, wherein the scores of the multiple dimensions comprise at least two of the following items: image clarity score, image colorfulness score, image meaningfulness score, at least one score associated with a human face.
6. The method of claim 5, wherein the at least one score associated with the face comprises one or more of: a face clarity score, an eye-openness score, a mouth-openness score, a face composition score, and a face direction score.
7. An apparatus for clipping video, the apparatus comprising:
the extraction module is used for extracting a plurality of candidate images from the target video;
the evaluation module is used for carrying out evaluation analysis on each candidate image so as to obtain an evaluation score of the candidate image;
and the intercepting module is used for intercepting the target video based on the respective evaluation scores of the candidate images to obtain a video clip.
8. The apparatus of claim 7, wherein the intercept module comprises:
a first determination module, configured to determine a target image with a highest evaluation score from the plurality of candidate images;
and the first intercepting submodule is used for respectively intercepting the video clips with the same time length from front to back of the center by taking the target image as the center so as to obtain the final video clip.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any of claims 1-6.
CN201911115501.3A 2019-07-12 2019-11-14 Method, device, electronic equipment and computer-readable storage medium for clipping video Active CN110996169B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019106320311 2019-07-12
CN201910632031 2019-07-12

Publications (2)

Publication Number Publication Date
CN110996169A true CN110996169A (en) 2020-04-10
CN110996169B CN110996169B (en) 2022-03-01

Family

ID=70084601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911115501.3A Active CN110996169B (en) 2019-07-12 2019-11-14 Method, device, electronic equipment and computer-readable storage medium for clipping video

Country Status (1)

Country Link
CN (1) CN110996169B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111541943A (en) * 2020-06-19 2020-08-14 腾讯科技(深圳)有限公司 Video processing method, video operation method, device, storage medium and equipment
CN111818363A (en) * 2020-07-10 2020-10-23 携程计算机技术(上海)有限公司 Short video extraction method, system, device and storage medium
CN111814840A (en) * 2020-06-17 2020-10-23 恒睿(重庆)人工智能技术研究院有限公司 Method, system, equipment and medium for evaluating quality of face image
CN111918122A (en) * 2020-07-28 2020-11-10 北京大米科技有限公司 Video processing method and device, electronic equipment and readable storage medium
CN112532897A (en) * 2020-11-25 2021-03-19 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and computer readable storage medium
CN112770061A (en) * 2020-12-16 2021-05-07 影石创新科技股份有限公司 Video editing method, system, electronic device and storage medium
CN113709560A (en) * 2021-03-31 2021-11-26 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium
US20220067383A1 (en) * 2020-08-25 2022-03-03 Beijing Xiaomi Pinecone Electronics Co., Ltd. Method and apparatus for video clip extraction, and storage medium
CN114286081A (en) * 2021-12-22 2022-04-05 携程旅游信息技术(上海)有限公司 Video quality evaluation method, apparatus, and medium
WO2022262766A1 (en) * 2021-06-18 2022-12-22 影石创新科技股份有限公司 Automatic clipping method and device, camera, and computer readable storage medium
CN115734007A (en) * 2022-09-22 2023-03-03 北京国际云转播科技有限公司 Video editing method, device, medium and video processing system

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807198A (en) * 2010-01-08 2010-08-18 中国科学院软件研究所 Video abstraction generating method based on sketch
CN104598921A (en) * 2014-12-31 2015-05-06 乐视网信息技术(北京)股份有限公司 Video preview selecting method and device
CN105357594A (en) * 2015-11-19 2016-02-24 南京云创大数据科技股份有限公司 Massive video abstraction generation method based on cluster and H264 video concentration algorithm
CN105979266A (en) * 2016-05-06 2016-09-28 西安电子科技大学 Interframe relevance and time slot worst based time domain information fusion method
US20170094293A1 (en) * 2015-09-30 2017-03-30 Apple Inc. Configurable motion estimation search systems and methods
US20180121733A1 (en) * 2016-10-27 2018-05-03 Microsoft Technology Licensing, Llc Reducing computational overhead via predictions of subjective quality of automated image sequence processing
US10074015B1 (en) * 2015-04-13 2018-09-11 Google Llc Methods, systems, and media for generating a summarized video with video thumbnails
CN108632641A (en) * 2018-05-04 2018-10-09 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN108632668A (en) * 2018-05-04 2018-10-09 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN108764060A (en) * 2018-05-07 2018-11-06 中国传媒大学 Video lens edge detection method based on sliding window
CN108804578A (en) * 2018-05-24 2018-11-13 南京理工大学 The unsupervised video summarization method generated based on consistency segment
CN108833942A (en) * 2018-06-28 2018-11-16 北京达佳互联信息技术有限公司 Video cover choosing method, device, computer equipment and storage medium
CN108830208A (en) * 2018-06-08 2018-11-16 Oppo广东移动通信有限公司 Method for processing video frequency and device, electronic equipment, computer readable storage medium
CN109544503A (en) * 2018-10-15 2019-03-29 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium
CN109685785A (en) * 2018-12-20 2019-04-26 上海众源网络有限公司 A kind of image quality measure method, apparatus and electronic equipment
US20190180109A1 (en) * 2017-12-12 2019-06-13 Microsoft Technology Licensing, Llc Deep learning on image frames to generate a summary

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807198A (en) * 2010-01-08 2010-08-18 中国科学院软件研究所 Video abstraction generating method based on sketch
CN104598921A (en) * 2014-12-31 2015-05-06 乐视网信息技术(北京)股份有限公司 Video preview selecting method and device
US10074015B1 (en) * 2015-04-13 2018-09-11 Google Llc Methods, systems, and media for generating a summarized video with video thumbnails
US20170094293A1 (en) * 2015-09-30 2017-03-30 Apple Inc. Configurable motion estimation search systems and methods
CN105357594A (en) * 2015-11-19 2016-02-24 南京云创大数据科技股份有限公司 Massive video abstraction generation method based on cluster and H264 video concentration algorithm
CN105979266A (en) * 2016-05-06 2016-09-28 西安电子科技大学 Interframe relevance and time slot worst based time domain information fusion method
US20180121733A1 (en) * 2016-10-27 2018-05-03 Microsoft Technology Licensing, Llc Reducing computational overhead via predictions of subjective quality of automated image sequence processing
US20190180109A1 (en) * 2017-12-12 2019-06-13 Microsoft Technology Licensing, Llc Deep learning on image frames to generate a summary
CN108632668A (en) * 2018-05-04 2018-10-09 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN108632641A (en) * 2018-05-04 2018-10-09 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN108764060A (en) * 2018-05-07 2018-11-06 中国传媒大学 Video lens edge detection method based on sliding window
CN108804578A (en) * 2018-05-24 2018-11-13 南京理工大学 The unsupervised video summarization method generated based on consistency segment
CN108830208A (en) * 2018-06-08 2018-11-16 Oppo广东移动通信有限公司 Method for processing video frequency and device, electronic equipment, computer readable storage medium
CN108833942A (en) * 2018-06-28 2018-11-16 北京达佳互联信息技术有限公司 Video cover choosing method, device, computer equipment and storage medium
CN109544503A (en) * 2018-10-15 2019-03-29 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium
CN109685785A (en) * 2018-12-20 2019-04-26 上海众源网络有限公司 A kind of image quality measure method, apparatus and electronic equipment

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814840A (en) * 2020-06-17 2020-10-23 恒睿(重庆)人工智能技术研究院有限公司 Method, system, equipment and medium for evaluating quality of face image
CN111541943A (en) * 2020-06-19 2020-08-14 腾讯科技(深圳)有限公司 Video processing method, video operation method, device, storage medium and equipment
CN111541943B (en) * 2020-06-19 2020-10-16 腾讯科技(深圳)有限公司 Video processing method, video operation method, device, storage medium and equipment
CN111818363A (en) * 2020-07-10 2020-10-23 携程计算机技术(上海)有限公司 Short video extraction method, system, device and storage medium
CN111918122A (en) * 2020-07-28 2020-11-10 北京大米科技有限公司 Video processing method and device, electronic equipment and readable storage medium
US20220067383A1 (en) * 2020-08-25 2022-03-03 Beijing Xiaomi Pinecone Electronics Co., Ltd. Method and apparatus for video clip extraction, and storage medium
JP2022037876A (en) * 2020-08-25 2022-03-09 ペキン シャオミ パインコーン エレクトロニクス カンパニー, リミテッド Video clip extraction method, video clip extraction device, and storage medium
JP7292325B2 (en) 2020-08-25 2023-06-16 ペキン シャオミ パインコーン エレクトロニクス カンパニー, リミテッド Video clip extraction method, video clip extraction device and storage medium
US11900682B2 (en) * 2020-08-25 2024-02-13 Beijing Xiaomi Pinecone Electronics Co., Ltd. Method and apparatus for video clip extraction, and storage medium
CN112532897A (en) * 2020-11-25 2021-03-19 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and computer readable storage medium
CN112770061A (en) * 2020-12-16 2021-05-07 影石创新科技股份有限公司 Video editing method, system, electronic device and storage medium
CN113709560A (en) * 2021-03-31 2021-11-26 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium
CN113709560B (en) * 2021-03-31 2024-01-02 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium
WO2022262766A1 (en) * 2021-06-18 2022-12-22 影石创新科技股份有限公司 Automatic clipping method and device, camera, and computer readable storage medium
CN114286081A (en) * 2021-12-22 2022-04-05 携程旅游信息技术(上海)有限公司 Video quality evaluation method, apparatus, and medium
CN115734007A (en) * 2022-09-22 2023-03-03 北京国际云转播科技有限公司 Video editing method, device, medium and video processing system
CN115734007B (en) * 2022-09-22 2023-09-01 北京国际云转播科技有限公司 Video editing method, device, medium and video processing system

Also Published As

Publication number Publication date
CN110996169B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN110996169B (en) Method, device, electronic equipment and computer-readable storage medium for clipping video
CN106920229B (en) Automatic detection method and system for image fuzzy area
KR102275452B1 (en) Method for tracking image in real time considering both color and shape at the same time and apparatus therefor
JP6655878B2 (en) Image recognition method and apparatus, program
CN105404884B (en) Image analysis method
EP3651055A1 (en) Gesture recognition method, apparatus, and device
WO2020107716A1 (en) Target image segmentation method and apparatus, and device
US10474903B2 (en) Video segmentation using predictive models trained to provide aesthetic scores
US9042662B2 (en) Method and system for segmenting an image
US20120321134A1 (en) Face tracking method and device
CN105426828B (en) Method for detecting human face, apparatus and system
JP2017531883A (en) Method and system for extracting main subject of image
JP2003030667A (en) Method for automatically locating eyes in image
US20150248592A1 (en) Method and device for identifying target object in image
CN110730381A (en) Method, device, terminal and storage medium for synthesizing video based on video template
JP2005284348A (en) Information processor and information processing method, recording medium, and program
JP2002208014A (en) Multi-mode digital image processing method for detecting eye
CN110996183B (en) Video abstract generation method, device, terminal and storage medium
KR102434397B1 (en) Real time multi-object tracking device and method by using global motion
CN111126300B (en) Human body image detection method and device, electronic equipment and readable storage medium
CN113963149A (en) Medical bill picture fuzzy judgment method, system, equipment and medium
KR20140134549A (en) Apparatus and Method for extracting peak image in continuously photographed image
EP2887261A2 (en) Information processing device, information processing method, and program
JP2021111228A (en) Learning device, learning method, and program
JP6467817B2 (en) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant