CN114666656A - Video clipping method, video clipping device, electronic equipment and computer readable medium - Google Patents

Video clipping method, video clipping device, electronic equipment and computer readable medium

Info

Publication number
CN114666656A
CN114666656A (application CN202210255943.3A)
Authority
CN
China
Prior art keywords
video
image
image frame
clipped
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210255943.3A
Other languages
Chinese (zh)
Inventor
周芳汝
杨玫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202210255943.3A priority Critical patent/CN114666656A/en
Publication of CN114666656A publication Critical patent/CN114666656A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008: involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44016: involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 21/4402: involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440281: by altering the temporal resolution, e.g. by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiments of the disclosure provide a video clipping method, a video clipping device, electronic equipment and a computer readable medium, wherein the method comprises the following steps: acquiring a video to be clipped having a video duration T1 and determining a target video duration T2; performing feature extraction on the image frames of the video to be clipped to obtain image features of the image frames; determining image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2 and the image features of the image frames; and deleting the image frames to be deleted from the video to be clipped so as to clip the video to be clipped. The video clipping method, the video clipping device, the electronic equipment and the computer readable medium provided by the embodiments of the disclosure can realize automatic and accurate clipping of videos and reduce the consumption of labor and time cost.

Description

Video clipping method, video clipping device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video editing method and apparatus, an electronic device, and a computer-readable medium.
Background
With the arrival of the information age, the number of video users has grown rapidly, video on various platforms has grown explosively, and video clipping has become increasingly important. However, manually clipping a large amount of shot material requires a great deal of manpower and time.
Therefore, a new video clipping method, apparatus, electronic device, and computer readable medium are needed.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a video clipping method, an apparatus, an electronic device, and a computer-readable medium, which can implement automatic clipping of a video and reduce labor and time cost consumption.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, a video clipping method is provided, which includes: acquiring a video to be clipped having a video duration T1 and determining a target video duration T2; performing feature extraction on the image frames of the video to be clipped to obtain image features of the image frames; determining image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2 and the image features of the image frames; and deleting the image frames to be deleted from the video to be clipped so as to clip the video to be clipped.
In an exemplary embodiment of the present disclosure, performing feature extraction on the image frames of the video to be clipped to obtain the image features of the image frames includes: processing the (t-s)-th to (t+s)-th image frames of the video to be clipped through an encoder to obtain the image feature of the t-th image frame of the video to be clipped, where 0 < t < N, N is the total number of image frames of the video to be clipped, and s > 0.
In an exemplary embodiment of the present disclosure, determining the image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2 and the image features of the image frames includes: obtaining an importance score s_t of the t-th image frame according to the image features of the (t-m)-th to (t+m)-th image frames in the video to be clipped, where m > 0; determining the number n of frames to be deleted according to the difference between the video duration T1 and the target video duration T2 and the frame rate of the video to be clipped; dividing the image frames of the video to be clipped into n intervals; and determining the image frame with the smallest importance score in each interval as an image frame to be deleted.
In an exemplary embodiment of the present disclosure, obtaining the importance score s_t of the t-th image frame according to the image features of the (t-m)-th to (t+m)-th image frames in the video to be clipped includes computing:

s_t = Σ_{i=-m}^{m} w_i · Dist(F_t, F_{t+i})  (1)

where F_{t+i} is the image feature of the (t+i)-th image frame, w_i is the weight of the i-th image frame, and Dist is a distance function.
In an exemplary embodiment of the present disclosure, the weight w_i is given by formula (2). [Formula (2) appears only as an image in the original publication; per the accompanying description, it assigns larger weights w_i to image frames near the t-th image frame and smaller weights to frames farther away.]
in an exemplary embodiment of the disclosure, the video duration T of the video to be clipped is determined according to the video duration T1The target video duration T2And the image characteristics of the image frames determining the image frames to be deleted in the image frames comprises: dividing the video to be clipped into I video segments, wherein I is an integer greater than 1; determining an importance score of each image frame in the ith video segment; determining the importance score of the ith video segment according to the average value of the importance scores of all the image frames in the ith video segment; sorting the video clips in a descending order according to the importance scores of the video clips to obtain a sorting result; according to the video time length T1The target video duration T2With the duration of each video segment, willDetermining the first q video clips in the sequencing result as reserved clips; according to the video time lengths of the first q video segments in the sequencing result and the target video time length T2Determining a first image frame in the first q video segments in the sequencing result according to the image characteristics of the image frames in the first q video segments in the sequencing result; and determining a first image frame in the first q video segments in the sequencing result and image frames in the (q + 1) th to the I-th video segments in the sequencing result as the image frame to be deleted.
In an exemplary embodiment of the present disclosure, determining the first q video segments in the sorting result as retained segments according to the video duration T1, the target video duration T2 and the duration of each video segment includes selecting q such that:

Σ_{j=1}^{q} n'_j ≥ N' and Σ_{j=1}^{q-1} n'_j < N'  (5)

where N' = T2 × fps, fps is the frame rate of the video to be clipped, and n'_j is the number of image frames of the j-th video segment in the sorting result.
In an exemplary embodiment of the present disclosure, acquiring the video to be clipped includes: acquiring K video segments to be clipped, where K is an integer greater than 2, the K video segments to be clipped include L sorted segments and K-L segments to be sorted, and 1 ≤ L ≤ K-2; determining the segment feature of each video segment to be clipped according to the image features of the image frames in that video segment; determining correlation scores between the L-th sorted segment and the K-L segments to be sorted according to the distances between the segment feature of the L-th sorted segment and the segment features of the K-L segments to be sorted; determining the segment to be sorted with the highest correlation score with the L-th sorted segment among the K-L segments to be sorted as the (L+1)-th sorted segment; adding one to L and returning to the above steps until L = K-1, obtaining K sorted segments; and synthesizing the K sorted segments in their order to obtain the video to be clipped.
According to a second aspect of the embodiments of the present disclosure, a video clipping device is provided, which includes: a video acquisition module for acquiring a video to be clipped having a video duration T1 and determining a target video duration T2; a feature extraction module for performing feature extraction on the image frames of the video to be clipped to obtain image features of the image frames; an image frame positioning module for determining image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2 and the image features of the image frames; and a video clipping module for deleting the image frames to be deleted from the video to be clipped so as to clip the video to be clipped.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, which includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the video clipping methods described above.
According to a fourth aspect of embodiments of the present disclosure, a computer-readable medium is proposed, on which a computer program is stored, which program, when executed by a processor, implements a video clipping method as defined in any of the above.
According to the video clipping method, the video clipping device, the electronic equipment and the computer readable medium provided by some embodiments of the present disclosure, feature extraction is performed on the image frames of a video to be clipped to obtain image features of the image frames, so that importance scores of different image frames or video segments can be evaluated based on the image features of the image frames; a certain number of image frames to be deleted are determined from among the image frames based on the image features of the image frames, the video duration T1 of the video to be clipped and the target video duration T2; and the image frames to be deleted are deleted from the video to be clipped so as to clip the video to be clipped. Automatic and accurate clipping of videos can thus be realized while reducing the consumption of labor and time cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a system block diagram illustrating a video clipping method and apparatus according to an example embodiment.
FIG. 2 is a flow diagram illustrating a method of video clipping in accordance with an exemplary embodiment.
FIG. 3 is a flowchart illustrating a method of video clipping in accordance with another exemplary embodiment.
Fig. 4 is a flowchart illustrating a video clipping method according to yet another exemplary embodiment.
FIG. 5 is a flowchart illustrating a video clipping method according to yet another exemplary embodiment.
Fig. 6(a) is a schematic diagram illustrating importance scores of image frames according to an exemplary embodiment.
FIG. 6(b) is a schematic diagram illustrating importance scores for video segments according to an example embodiment.
Fig. 7 is a flowchart illustrating training of an unsupervised network in accordance with an exemplary embodiment.
FIG. 8 is a flowchart illustrating a video clipping method according to yet another exemplary embodiment.
FIG. 9 is a block diagram illustrating a video clipping device according to an example embodiment.
Fig. 10 schematically illustrates a block diagram of an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The drawings are merely schematic illustrations of the present invention, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and steps, nor do they necessarily have to be performed in the order described. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The following detailed description of exemplary embodiments of the invention refers to the accompanying drawings.
FIG. 1 is a system block diagram illustrating a video clipping method and apparatus according to an example embodiment.
In the system 100 of video clipping methods and apparatus, the server 105 may be a server providing various services, such as a background management server (for example only) providing support over the network 104 for a video clipping system operated by users with terminal devices 101, 102, 103. The backend management server may analyze and otherwise process data such as the received video clip request, and feed back a processing result (e.g., a clipped video — just an example) to the terminal device.
The server 105 may be a single physical server, or may be composed of a plurality of servers. For example, one part of the server 105 may serve as a video clipping task submitting system in the present disclosure, e.g., for obtaining a task to execute a video clipping command; and another part of the server 105 may serve as a video clipping system in the present disclosure, for: acquiring a video to be clipped having a video duration T1 and determining a target video duration T2; performing feature extraction on the image frames of the video to be clipped to obtain image features of the image frames; determining image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2 and the image features of the image frames; and deleting the image frames to be deleted from the video to be clipped so as to clip the video to be clipped.
FIG. 2 is a flowchart illustrating a method of video clipping in accordance with an exemplary embodiment. The video clipping method provided by the embodiments of the present disclosure may be executed by any electronic device with computing processing capability, such as the terminal devices 101, 102, and 103 and/or the server 105, and in the following embodiments, the server executes the method as an example for illustration, but the present disclosure is not limited thereto. The video clipping method provided by the embodiment of the present disclosure may include steps S202 to S208.
As shown in fig. 2, in step S202, a video to be clipped having a video duration T1 is acquired, and a target video duration T2 is determined.
In the embodiment of the present disclosure, for example, a video clip request may be received, where the video clip request includes the video to be clipped and the target video duration T2. The target video duration T2 is the desired duration of the video after the video to be clipped is clipped, where T1 > T2. In particular, the video clip request may include video segments to be clipped; after the video clip request is obtained, the video to be clipped may be synthesized from the video segments to be clipped, for which reference may be made to the embodiment shown in fig. 5.
In step S204, feature extraction is performed on the image frames of the video to be clipped, so as to obtain image features of the image frames.
In the embodiment of the disclosure, the image feature of each frame of image (i.e., each image frame) in the video to be clipped can be obtained using self-supervised learning.
In an exemplary embodiment, the (t-s)-th to (t+s)-th image frames of the video to be clipped can be processed through an encoder to obtain the image feature of the t-th image frame of the video to be clipped, where 0 < t < N, N is the total number of image frames of the video to be clipped, and s > 0. In this embodiment, for the t-th image frame of the video to be clipped, the encoder combines the image information of the preceding s frames and the following s frames, making full use of the information of the frames around the t-th image frame to obtain an image feature that can sufficiently represent the t-th image frame.
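As a concrete illustration, the following is a minimal sketch of such a windowed encoder, assuming a small per-frame CNN followed by a GRU; the layer sizes, the CNN/GRU choices and the class name WindowEncoder are illustrative assumptions, not the patent's actual network.

```python
# A minimal sketch (not the patent's actual network) of an encoder that maps a
# (2s+1)-frame window onto a feature for the centre (t-th) frame.
import torch
import torch.nn as nn

class WindowEncoder(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Per-frame convolutional embedding: 3xHxW -> 64-d vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64),
        )
        # RNN module so the centre-frame feature carries neighbouring-frame context.
        self.rnn = nn.GRU(input_size=64, hidden_size=feat_dim, batch_first=True)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, 2s+1, 3, H, W) -> centre-frame feature: (batch, feat_dim)
        b, w, c, h, ww = window.shape
        per_frame = self.cnn(window.reshape(b * w, c, h, ww)).reshape(b, w, -1)
        outputs, _ = self.rnn(per_frame)
        return outputs[:, w // 2, :]  # feature aligned with the t-th (centre) frame

encoder = WindowEncoder()
frames = torch.randn(1, 5, 3, 64, 64)  # s = 2 -> a 5-frame window
feature = encoder(frames)              # F_t in R^128
```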
In step S206, image frames to be deleted are determined from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2 and the image features of the image frames.
In the embodiment of the disclosure, the importance score of each video segment or each image frame in the video to be clipped can be calculated from the image features, and the image frames to be deleted are then determined based on those importance scores. The video segments can be obtained, for example, by clustering the image frames of the video to be clipped, or by splitting the video at segmentation points found with difference- or gradient-based methods.
In step S208, deleting image frames to be deleted in the video to be clipped to clip the video to be clipped.
After the video to be clipped is clipped, a target clipped video having the target video duration T2 is obtained.
According to the video clipping method provided by the embodiment of the disclosure, feature extraction is performed on the image frames of the video to be clipped to obtain image features of the image frames, so that importance scores of different image frames or video segments can be evaluated based on the image features of the image frames; a certain number of image frames to be deleted are determined from among the image frames based on the image features of the image frames, the video duration T1 of the video to be clipped and the target video duration T2; and the image frames to be deleted are deleted from the video to be clipped so as to clip the video to be clipped. Automatic and accurate clipping of videos can thus be realized while reducing the consumption of labor and time cost.
FIG. 3 is a flow chart illustrating a method of video clipping in accordance with another exemplary embodiment. Step S206 of the embodiment of fig. 2 may include steps S302 to S308.
As shown in fig. 3, in step S302, the importance score s_t of the t-th image frame is obtained according to the image features of the (t-m)-th to (t+m)-th image frames in the video to be clipped, where m > 0.

In the embodiment of the present disclosure, the importance score s_t of the t-th image frame may be expressed by the following formula (1):

s_t = Σ_{i=-m}^{m} w_i · Dist(F_t, F_{t+i})  (1)

where F_{t+i} is the image feature of the (t+i)-th image frame and Dist represents a distance function, such as the Euclidean distance. w_i denotes the weight: the weight is somewhat larger when computing the feature distance to frames near the t-th image frame, and correspondingly smaller for frames farther from the t-th frame. For example, w_i can be determined by formula (2). [Formula (2) appears only as an image in the original publication; per this description, it assigns larger weights to nearby frames and smaller weights to farther frames.]

The importance score s_t expresses that when the t-th image frame differs greatly from the image frames near it, its importance score s_t is large; conversely, when the t-th image frame is similar to its neighboring image frames, its importance score s_t is low.
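To make formula (1) concrete, the following is a minimal sketch in Python. Since formula (2) is not reproduced in the text, normalized inverse-offset weights are used purely as a stand-in with the described property (nearer frames weigh more); the function name importance_score and the border clamping are likewise illustrative assumptions.

```python
# A sketch of formula (1) under an assumed weighting scheme standing in for
# formula (2): nearer frames receive larger weights.
import numpy as np

def importance_score(features: np.ndarray, t: int, m: int) -> float:
    """features: (N, d) array of per-frame image features F_0..F_{N-1}."""
    offsets = [i for i in range(-m, m + 1) if i != 0]
    weights = np.array([1.0 / abs(i) for i in offsets])
    weights /= weights.sum()                       # assumed normalisation
    score = 0.0
    for w, i in zip(weights, offsets):
        j = min(max(t + i, 0), len(features) - 1)  # clamp at the video borders
        score += w * np.linalg.norm(features[t] - features[j])  # Euclidean Dist
    return score
```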
In step S304, the number n of frames to be deleted is determined according to the difference between the video duration T1 and the target video duration T2 and the frame rate of the video to be clipped.

In the embodiment of the present disclosure, denoting the frame rate of the video to be clipped as fps, the number n of frames to be deleted can be calculated by the following formula (3):

n = (T1 - T2) × fps  (3)
in step S306, the image frames of the video to be clipped are divided into n sections.
Wherein, the total frame number N of the image frames of the video to be edited is T1Xfps, then each of the video to be edited can be determined
Figure BDA0003548472350000082
Each image frame is a section.
In step S308, the image frame with the smallest importance score in each interval is determined as the image frame to be deleted.
Each interval has an image frame to be deleted, and n intervals can determine n image frames to be deleted. Fig. 6(a) is a schematic diagram illustrating importance scores of image frames according to an exemplary embodiment. As shown in fig. 6(a), the horizontal axis represents 400 image frames of the video to be clipped (i.e., the total number of frames N is 400), and the vertical axis represents the importance score of each image frame. Wherein, the dots in fig. 6(a) represent the image frames to be deleted in the video to be clipped.
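A minimal sketch of steps S304 to S308 follows; the function name frames_to_delete and the rounding choices are illustrative assumptions.

```python
# A sketch of steps S304-S308: n = (T1 - T2) * fps frames are deleted, one per
# interval of floor(N/n) frames, always the frame with the lowest importance score.
import numpy as np

def frames_to_delete(scores: np.ndarray, t1: float, t2: float, fps: float) -> list[int]:
    n = int(round((t1 - t2) * fps))  # formula (3): number of frames to delete
    N = len(scores)                  # total frames, N = T1 * fps
    step = N // n                    # interval length floor(N/n)
    deleted = []
    for k in range(n):
        lo, hi = k * step, min((k + 1) * step, N)
        deleted.append(lo + int(np.argmin(scores[lo:hi])))
    return deleted
```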
Fig. 4 is a flowchart illustrating a video clipping method according to yet another exemplary embodiment. Step S206 of the embodiment of fig. 2 may include steps S402 to S414.
As shown in fig. 4, in step S402, the video to be clipped is divided into I video segments, where I is an integer greater than 1.
In the embodiment of the disclosure, the image frames belonging to the same category may be divided into the same video segment by using a clustering method. For another example, a difference method, a gradient method, etc. may be used to find the segmentation point of the video segment, and the video to be edited is divided into different video segments at the segmentation point.
In step S404, the importance score of each image frame in the ith video segment is determined.
The calculation method of the importance score of each image frame is shown in formula (1).
In step S406, the importance score of the ith video segment is determined according to the average of the importance scores of the image frames in the ith video segment.
For the I video segments {clip_1, …, clip_I}, where the i-th video segment clip_i has n_i frames, the importance scores of the image frames in clip_i may be denoted s^i_1, …, s^i_{n_i}, and the importance score S_i of the i-th video segment can be expressed by the following formula (4):

S_i = (1/n_i) · Σ_{j=1}^{n_i} s^i_j  (4)
In step S408, the video segments are sorted in descending order according to the importance scores of the video segments, and a sorting result is obtained.
The higher the importance score of a video segment, the greater the picture change of the image frames in the segment, and the more highlight-worthy the segment. Sorting the I video segments {clip_1, …, clip_I} in descending order of their importance scores S_1, …, S_I yields clip'_1, …, clip'_I, whose corresponding importance scores S'_1, …, S'_I satisfy S'_1 ≥ S'_2 ≥ … ≥ S'_I, and the sorted video segments have n'_1, …, n'_I image frames respectively.
In step S410, the first q video segments in the sorting result are determined as retained segments according to the video duration T1, the target video duration T2 and the duration of each video segment.

The value of q can be determined by the following formula (5):

Σ_{j=1}^{q} n'_j ≥ N' and Σ_{j=1}^{q-1} n'_j < N'  (5)

where N' = T2 × fps is the total number of frames of the clipped video, fps is the frame rate of the video to be clipped, and n'_j is the number of image frames of the j-th video segment in the sorting result. FIG. 6(b) is a schematic diagram illustrating importance scores of video segments according to an example embodiment. As shown in fig. 6(b), each horizontal line (solid or dotted) represents a video segment of the video to be clipped, and the vertical axis represents the importance score of each video segment. The solid lines represent the retained segments of the video to be clipped, and the dotted lines represent the video segments to be deleted. A retained segment is a video segment in which there are no image frames to be deleted. The total number of image frames of the retained segments may be expressed as N_m = Σ_{j=1}^{q} n'_j.
In step S412, first image frames in the first q video segments in the sorting result are determined according to the video durations of the first q video segments in the sorting result, the target video duration T2 and the image features of the image frames in the first q video segments in the sorting result.

When N_m > N', a method similar to the embodiment shown in fig. 3 for determining image frames to be deleted may further be adopted to determine the image frames to be deleted from the first q video segments in the sorting result as the first image frames. For example, for the first q video segments of the sorting result, whose total number of image frames is N_m: first, the importance score of each image frame in the first q video segments is determined, and the number of frames to be deleted from the first q video segments is determined as N_m - N'; the first q video segments are then divided into N_m - N' intervals; and in each of the N_m - N' intervals, the image frame with the smallest importance score is determined as a first image frame of the first q video segments.

In step S414, the first image frames in the first q video segments in the sorting result and the image frames in the (q+1)-th to I-th video segments in the sorting result are determined as the image frames to be deleted.
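The following is a minimal sketch of this segment-level branch (steps S402 to S414), assuming segment boundaries are supplied by the caller; the helper name clip_by_segments and the tie-breaking details are illustrative.

```python
# A sketch of steps S402-S414: score each segment by the mean frame score
# (formula (4)), keep the top-q segments per formula (5), then, if the kept
# segments still exceed N' frames, trim single frames inside them as in fig. 3.
import numpy as np

def clip_by_segments(scores: np.ndarray, bounds: list[tuple[int, int]], n_target: int):
    seg_scores = [scores[a:b].mean() for a, b in bounds]   # formula (4)
    order = np.argsort(seg_scores)[::-1]                   # descending sort
    kept, total = [], 0
    for idx in order:                                      # formula (5): smallest q reaching N'
        kept.append(int(idx))
        total += bounds[idx][1] - bounds[idx][0]
        if total >= n_target:
            break
    kept_frames = np.concatenate([np.arange(*bounds[i]) for i in sorted(kept)])
    extra = len(kept_frames) - n_target                    # N_m - N'
    if extra > 0:                                          # per-interval frame deletion
        step = len(kept_frames) // extra
        drop = {k * step + int(np.argmin(scores[kept_frames[k * step:(k + 1) * step]]))
                for k in range(extra)}
        kept_frames = np.array([f for j, f in enumerate(kept_frames) if j not in drop])
    return kept_frames                                     # indices of frames to retain
```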
FIG. 5 is a flowchart illustrating a method of video clipping in accordance with yet another exemplary embodiment. Step S202 of the embodiment of fig. 2 may include steps S502 to S512.
As shown in fig. 5, in step S502, K video segments to be clipped are obtained, where K is an integer greater than 2, the K video segments to be clipped include L sequenced segments and K-L segments to be sequenced, and L is greater than or equal to 1 and less than or equal to K-2. Wherein the L sorted segments are sorted in a specified order.
The K video segments to be clipped may, for example, be included in the video clip request. The K video segments to be clipped can be respectively denoted V_1, V_2, …, V_K.
In step S504, segment characteristics of each video segment to be clipped are determined according to image characteristics of image frames in each video segment to be clipped.
Suppose the i-th video segment V_i to be clipped has n' image frames, which can be recorded as {I^i_1, …, I^i_{n'}}, where I^i_j represents the j-th image frame of the i-th video segment to be clipped, and the image feature of the j-th image frame of the i-th video segment to be clipped is denoted F^i_j. The segment feature F_i of the i-th video segment to be clipped can be determined according to the following formula (6):

F_i = (1/n') · Σ_{j=1}^{n'} F^i_j  (6)
In step S506, relevance scores of the lth sorted segment and the K-L segments to be sorted are determined according to distances between the segment features of the lth sorted segment and the segment features of the K-L segments to be sorted.
In the embodiment of the present disclosure, the correlation score between the L-th sorted segment and the i-th segment to be sorted may be determined from the distance between their segment features according to the following formula (7), where i is an integer greater than 0 and not greater than K-L:

S_{Li} = -Dist(F_L, F_i)  (7)

where F_L is the segment feature of the L-th sorted segment and Dist represents a distance function. It is assumed here that the 1st video segment to be clipped among the K video segments to be clipped is the designated beginning segment.
In step S508, the segment to be sorted with the highest relevance score to the L-th sorted segment among the K-L segments to be sorted is determined as the L + 1-th sorted segment.
In step S510, L is increased by one and steps S506 to S510 described above are repeated until L = K-1, obtaining K sorted segments.

When L = K-1, the remaining one segment to be sorted can directly be taken as the K-th sorted segment, thereby obtaining K sorted segments.
In step S512, the K sorted segments are synthesized according to the order of the K sorted segments, so as to obtain the video to be clipped.
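A minimal sketch of steps S502 to S512 follows: segment features are mean-pooled frame features (formula (6)), correlation is the negative feature distance (formula (7)), and the playing order is grown greedily from a given first segment; the function name order_segments and the random data are illustrative.

```python
# A sketch of steps S502-S512: greedy ordering of segments by feature correlation.
import numpy as np

def order_segments(seg_feats: list[np.ndarray], first: int = 0) -> list[int]:
    ordered = [first]
    remaining = set(range(len(seg_feats))) - {first}
    while remaining:
        last = seg_feats[ordered[-1]]
        # highest correlation score == smallest feature distance (formula (7))
        nxt = min(remaining, key=lambda i: np.linalg.norm(last - seg_feats[i]))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered  # concatenate the segments in this order to form the video

# Usage with mean-pooled per-frame features (formula (6)) on random stand-in data:
feats = [frames.mean(axis=0) for frames in
         (np.random.rand(30, 128), np.random.rand(40, 128), np.random.rand(25, 128))]
print(order_segments(feats, first=0))
```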
In still another exemplary embodiment of the present disclosure, the video clipping method may include the following four steps: in the step 1, the image characteristics of each image frame in the video to be edited are obtained based on self-supervision learning; in the step 2, judging the video clipping mode, wherein the video clipping mode comprises clipping the image frame of the video to be clipped (step 3) and clipping the video segment in the video to be clipped (step 4); in step 3, clipping image frames in a video to be clipped; in step 4, a video segment in the video is clipped.
Specifically, in step 1, the feature of each frame of image in the video is obtained based on self-supervised learning. In order to clip the video fully automatically, the feature of each frame of image in the video can be extracted using a self-supervised network, whose training flowchart is shown in fig. 7.
An encoder can be constructed to extract the features of the image frames of the video to be clipped, where the features of adjacent image frames are required to have high similarity. An RNN module may be introduced into the encoder so that the extracted image features carry information from the preceding and following frames. Denote the t-th image frame as I_t; the t-th image frame, together with the preceding s image frames and the following s image frames, is input into the encoder at the same time, and the encoder outputs the image feature F_t ∈ R^n of the t-th image frame.
Further, as shown in fig. 7, the self-supervised learning network may further include a decoder to supervise the encoding so that accurate features F_t are generated. The image feature F_t of the t-th image frame is input into the decoder, which outputs Y_t of the same size as I_t; the distance between I_t and Y_t is calculated as the loss of the self-supervised network. Through self-supervised learning, the encoder can thus obtain the feature F_t of the t-th image frame I_t, which combines the information of the frames before and after the t-th frame and can sufficiently represent I_t.
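A minimal training sketch of this scheme follows, reusing the illustrative WindowEncoder from the earlier sketch; the decoder architecture, the optimiser settings and the stand-in data loader are all assumptions, not the patent's actual configuration.

```python
# A sketch of the fig. 7 self-supervision: the decoder reconstructs the centre
# frame I_t from F_t, and the reconstruction distance is the loss.
import torch
import torch.nn as nn

decoder = nn.Sequential(            # F_t (128-d) -> Y_t with I_t's size (3x64x64)
    nn.Linear(128, 3 * 64 * 64),
    nn.Unflatten(1, (3, 64, 64)),
)
encoder = WindowEncoder()           # illustrative encoder sketched earlier
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

# Stand-in for a real dataloader of (batch, 2s+1, 3, H, W) frame windows.
loader = [torch.randn(4, 5, 3, 64, 64) for _ in range(10)]

for windows in loader:
    f_t = encoder(windows)                      # feature of the centre frame
    y_t = decoder(f_t)                          # reconstruction Y_t, same size as I_t
    i_t = windows[:, windows.shape[1] // 2]     # the centre frame I_t
    loss = torch.norm(y_t - i_t)                # distance between I_t and Y_t as the loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```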
Specifically, in step 2, two clipping manners may be adopted for the video to be clipped: clipping image frames of the video to be clipped, and clipping video segments of the video to be clipped. For the video duration T1 of the video to be clipped and the target video duration T2, when the ratio of T2 to T1 reaches a threshold θ, the first clipping manner (clipping image frames of the video to be clipped) is adopted; when the ratio of T2 to T1 is below θ, the second clipping manner (clipping video segments of the video to be clipped) is adopted, where 0 < θ ≤ 1. [The exact inequalities appear only as equation images in the original publication.]
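A small sketch of this decision follows; since the exact inequalities are equation images in the original, the condition on the retained ratio T2/T1 is inferred from context, and the default θ is an arbitrary illustration.

```python
# A sketch of the step-2 clipping-mode decision, assuming the threshold applies
# to the retained ratio T2/T1 (inferred; the original equations are images).
def choose_mode(t1: float, t2: float, theta: float = 0.5) -> str:
    assert 0 < theta <= 1 and t1 > t2 > 0
    return "clip_frames" if t2 / t1 >= theta else "clip_segments"
```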
When the video clip request includes the video segments to be clipped, the video segments can be sequenced and combined into a complete video (the combined video is the video to be clipped), and then the clipping mode is judged. For the ordering of a group of video segments to be edited, 2 ordering modes can be adopted. The first sorting mode: determining the sequence of the video segments to be clipped according to a user instruction; the second sort mode: the L sorted video segments to be clipped (i.e. the sorted segments) are determined according to the user instruction, and the next playing order is determined according to the algorithm, which can specifically refer to the embodiment shown in fig. 5.
For example, the K video segments to be clipped may be denoted V_1, V_2, …, V_K, where the a-th video segment V_a to be clipped is the segment specified by the user to be played first. The i-th video segment V_i to be clipped has n frames, i.e., V_i = {I^i_1, …, I^i_n}, where I^i_j represents the j-th image frame of the i-th video segment to be clipped. The self-supervised network is used to extract the feature F^i_j of I^i_j, and the feature F_i of the video segment V_i to be clipped is calculated as:

F_i = (1/n) · Σ_{j=1}^{n} F^i_j  (8)
For every two video segments V_i and V_j to be clipped, the correlation score S_ij of the two video segments can be obtained by calculating the distance between their features:

S_ij = -Dist(F_i, F_j)  (9)
Where Dist represents a distance function.
For the user-specified starting video segment V_a, the video segment to be clipped with the highest correlation score can be taken as the video segment to be played next. Then, for any video segment V_i to be clipped, the index j* of the next played video segment V_{j*} in {V_1, V_2, …, V_K} is calculated as:

j* = argmax_j S_ij, with j = 1, …, K and j ≠ i  (10)
A group of video segments to be clipped is sorted in the above manner and synthesized into a complete video, and the clipping manner is then determined according to the video duration.
Specifically, in step 3, when the ratio of T2 to T1 reaches the threshold θ, single image frames in the video to be clipped can be clipped.
In step 1, the image feature F_t of each image frame I_t in the video to be clipped has been extracted. For the t-th image frame, its importance score s_t can be calculated using the image features of the 2m image frames before and after it; see formula (1) and the related description.
The importance score s_t of the t-th image frame is larger when the t-th image frame differs more from nearby images; conversely, when the t-th image frame is similar to nearby images, its importance score is lower.
Denoting the frame rate of the video to be clipped as fps, the number n of video frames to be deleted after clipping is n = (T1 - T2) × fps.
In this manner, the number n of video frames to be deleted is a relatively small value; that is, the video can be clipped while keeping all segments of the video, with no segment deleted entirely. To ensure that the n deleted image frames do not form a complete segment (it is possible that the importance scores of all the frames in one segment are low), the image frames may be clipped by interval-based frame deletion: with the total frame number N = T1 × fps of the video to be clipped, the interval ⌊N/n⌋ is calculated, and from every ⌊N/n⌋ frames the one image with the lowest importance score is selected for deletion, deleting n frames in total.
Specifically, in step 4, when the ratio of T2 to T1 is below the threshold θ, video segments in the video to be clipped can be clipped.
Firstly, dividing a video to be edited into I video segments, for example, using a clustering method to take image frames belonging to the same category as one segment; or using difference, gradient and other methods to find the segmentation points of the segments, and dividing the video into different segments at the segmentation points.
With the total frame number N = T1 × fps of the video, the video is split into I video segments {clip_1, …, clip_I}, where the i-th video segment clip_i has n_i frames, i.e., Σ_{i=1}^{I} n_i = N. The image features {F^i_1, …, F^i_{n_i}} of the image frames in clip_i are extracted, and the importance score of each image frame, {s^i_1, …, s^i_{n_i}}, is calculated according to formula (1).
The average of the importance scores of the image frames in clip_i is taken as the importance score of the current segment, i.e., the importance score S_i of the i-th segment clip_i; see formula (4).
The higher the importance score of a video segment, the greater the picture change of the image frames in the video segment, and the more highlight-worthy the segment. The duration of the clipped video is T2, and its number of frames is N' = T2 × fps. The I video segments {clip_1, …, clip_I} are sorted in descending order of their importance scores S_1, …, S_I to obtain clip'_1, …, clip'_I, with corresponding importance scores S'_1, …, S'_I satisfying S'_1 ≥ S'_2 ≥ … ≥ S'_I, and the sorted segments have n'_1, …, n'_I frames respectively.
The first q video segments are selected as the retained segments such that they satisfy formula (5).
Fig. 6(b) shows the clipping result of the video segment, the abscissa is the image frame in the video to be clipped, the ordinate is the importance score of the video segment, and the line segment represents the score of the video segment. Where the solid line segment is the retained video segment and the dashed line segment is the video segment to be deleted with a lower score, q being 10 in this example.
The selected q video segments are spliced in the order of the video segments in the original video to form a complete video v_q, whose number of frames is N_m = Σ_{j=1}^{q} n'_j. When N_m > N', the video-frame clipping method in step 3 may need to be applied to v_q, so that the number of frames after clipping goes from N_m to N'.
Further, FIG. 8 is a flowchart illustrating a video clipping method according to yet another exemplary embodiment. The video clipping method of the disclosed embodiment may include steps S802 to S814.
In step S802, a user input is received, and a video clip request is determined based on the information input by the user. The video clip request may include the complete video to clip and the target video duration T2, the video to clip having a video duration T1. In another embodiment, the video clip request may include a video segment to be clipped.
In step S804, it is determined whether the video is a complete video to be clipped, if so, S810 is executed, otherwise, S806 is executed.
In step S806, feature extraction is performed on image frames in the video segment to be clipped.
In step S808, the video segments to be clipped are sorted, and the video segments to be clipped are synthesized into a complete video to be clipped according to the sorting result.
In step S810, it is judged whether the ratio of T2 to T1 reaches the threshold θ; if so, S812 is executed, otherwise S814 is executed.
In step S812, the image frames in the video to be clipped are clipped. See step 3 of the above example.
In step S814, a video segment in the video to be clipped is clipped. See step 4 of the above example.
The present application provides a video clipping method based on self-supervised learning. After a user uploads a complete video to be clipped or a group of video segments to be clipped, the user only needs to specify the desired target video duration, and the video can then be clipped fully automatically. The whole clipping process requires no manual participation: intelligent clipping is completed by retaining the highlight segments or frames of the video to be clipped, based on the image feature of each image frame and the calculated importance scores.
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments may be implemented as a computer program executed by a central processing unit (CPU). When the computer program is executed by the CPU, it performs the above-described functions defined by the methods provided by the present disclosure. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
FIG. 9 is a block diagram illustrating a video clipping device according to an example embodiment. Referring to fig. 9, a video clipping apparatus 90 provided by an embodiment of the present disclosure may include: a video acquisition module 902, a feature extraction module 904, an image frame positioning module 906, and a video clipping module 908.
In the video clipping device 90, the video acquisition module 902 may be configured to acquire a video to be clipped having a video duration T1 and determine a target video duration T2.
The feature extraction module 904 can be configured to perform feature extraction on the image frames of the video to be clipped, so as to obtain image features of the image frames.
The image frame positioning module 906 may be configured to determine image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2 and the image features of the image frames.
The video clipping module 908 may be configured to delete the to-be-deleted image frames in the to-be-clipped video to clip the to-be-clipped video.
According to the video clipping device provided by the embodiment of the disclosure, feature extraction is performed on the image frames of the video to be clipped to obtain image features of the image frames, so that importance scores of different image frames or video segments can be evaluated based on the image features of the image frames; a certain number of image frames to be deleted are determined from among the image frames based on the image features of the image frames, the video duration T1 of the video to be clipped and the target video duration T2; and the image frames to be deleted are deleted from the video to be clipped so as to clip the video to be clipped. Automatic and accurate clipping of videos can thus be realized while reducing the consumption of labor and time cost.
In an exemplary embodiment, the feature extraction module 904 can be configured to: process the (t-s)-th to (t+s)-th image frames of the video to be clipped through an encoder to obtain the image feature of the t-th image frame of the video to be clipped, where 0 < t < N, N is the total number of image frames of the video to be clipped, and s > 0.
In an exemplary embodiment, the image frame positioning module 906 may include: a first image frame score calculation unit, configured to obtain the importance score s_t of the t-th image frame according to the image features of the (t-m)-th to (t+m)-th image frames in the video to be clipped, where m > 0; a deletion frame number determination unit, configured to determine the number n of frames to be deleted according to the difference between the video duration T1 and the target video duration T2 and the frame rate of the video to be clipped; an interval dividing unit, configured to divide the image frames of the video to be clipped into n intervals; and a first deleted frame determination unit, configured to determine the image frame with the smallest importance score in each interval as an image frame to be deleted.
In an exemplary embodiment, the first image frame score calculation unit may compute formula (1):

s_t = Σ_{i=-m}^{m} w_i · Dist(F_t, F_{t+i})  (1)

where F_{t+i} is the image feature of the (t+i)-th image frame and w_i is the weight of the i-th image frame.
In an exemplary embodiment, in the video clipping device 90, the weight w_i may further be determined by formula (2) described above.
in an exemplary embodiment, the image frame positioning module 906 may include: the segment dividing unit can be used for dividing the video to be clipped into I video segments, wherein I is an integer larger than 1; a second image frame score calculating unit operable to determine an importance score of each image frame in the ith video segment; the segment score calculating unit is used for determining the importance score of the ith video segment according to the average value of the importance scores of the image frames in the ith video segment; the segment sorting unit is used for sorting the video segments in a descending order according to the importance scores of the video segments to obtain a sorting result; a reserved segment determining unit operable to determine a segment based onSaid video time length T1The target video duration T2Determining the first q video clips in the sequencing result as reserved clips according to the duration of each video clip; a first image frame determining unit, configured to determine the video duration of the first q video segments in the sorting result and the target video duration T2Determining a first image frame in the first q video segments in the sequencing result according to the image characteristics of the image frames in the first q video segments in the sequencing result; and the second deleted frame determining unit may be configured to determine, as the image frame to be deleted, a first image frame in the first q video segments in the sorting result and image frames in the (q + 1) th to I-th video segments in the sorting result.
In an exemplary embodiment, the retained segment determination unit may determine q by formula (5):

Σ_{j=1}^{q} n'_j ≥ N' and Σ_{j=1}^{q-1} n'_j < N'  (5)

where N' = T2 × fps, fps is the frame rate of the video to be clipped, and n'_j is the number of image frames of the j-th video segment in the sorting result.
In an exemplary embodiment, the video acquisition module 902 may include: a video segment acquisition unit, which can be used to acquire K video segments to be clipped, where K is an integer greater than 2, the K video segments to be clipped include L sorted segments and K-L segments to be sorted, and 1 ≤ L ≤ K-2; a segment feature determination unit, which can be used to determine the segment feature of each video segment to be clipped according to the image features of the image frames in that video segment; a segment correlation calculation unit, which can be used to determine the correlation scores between the L-th sorted segment and the K-L segments to be sorted according to the distances between the segment feature of the L-th sorted segment and the segment features of the K-L segments to be sorted; a single segment sorting unit, which can be used to determine the segment to be sorted with the highest correlation score with the L-th sorted segment among the K-L segments to be sorted as the (L+1)-th sorted segment; a segment ordering unit, which can be used to increase L by one and return to the above steps until L = K-1, obtaining K sorted segments; and a segment synthesis unit, which can be used to synthesize the K sorted segments in their order to obtain the video to be clipped.
An electronic device 1000 according to this embodiment of the invention is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 10, the electronic device 1000 is in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, and a bus 1030 that couples various system components including the memory unit 1020 and the processing unit 1010.
Wherein the storage unit stores program code that is executable by the processing unit 1010 to cause the processing unit 1010 to perform steps according to various exemplary embodiments of the present invention as described in the "exemplary methods" section above in this specification. For example, the processing unit 1010 may perform the steps as shown in fig. 2 or fig. 3 or fig. 4 or fig. 5 or fig. 8.
The storage unit 1020 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 10201 and/or a cache memory unit 10202, and may further include a read-only memory unit (ROM) 10203.
The memory unit 1020 may also include a program/utility 10204 having a set (at least one) of program modules 10205, such program modules 10205 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may be any one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and a local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 1100 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary method" of this description, when said program product is run on said terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing devices may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to external computing devices (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (11)

1. A video clipping method, comprising:
acquiring a video to be clipped and a video duration T1 of the video to be clipped, and determining a target video duration T2;
performing feature extraction on the image frames of the video to be clipped to obtain image features of the image frames;
determining image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2, and the image features of the image frames;
deleting the image frames to be deleted in the video to be clipped so as to clip the video to be clipped.
2. The method of claim 1, wherein performing feature extraction on the image frames of the video to be clipped to obtain the image features of the image frames comprises:
processing the (t-s)-th to (t+s)-th image frames in the video to be clipped through an encoder to obtain the image feature of the t-th image frame in the video to be clipped;
wherein 0 < t < N, N is the total number of image frames of the video to be clipped, and s > 0.
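For illustration, a sketch of the sliding-window feature extraction of claim 2, where `encoder` stands for any clip encoder mapping a short frame window to one feature vector (e.g., a 3D CNN); clamping the window at the video boundaries is our assumption, since the claim leaves boundary handling open:

```python
import numpy as np

def windowed_features(frames, encoder, s=2):
    # Image feature of frame t = encoding of the window [t-s, t+s],
    # clamped to the valid frame index range.
    n = len(frames)
    feats = []
    for t in range(n):
        lo, hi = max(0, t - s), min(n, t + s + 1)
        feats.append(encoder(frames[lo:hi]))
    return np.stack(feats)
```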
3. The method of claim 1, wherein determining the image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2, and the image features of the image frames comprises:
obtaining an importance score s_t of the t-th image frame according to the image features of the (t-m)-th to (t+m)-th image frames in the video to be clipped, where m > 0;
determining the number n of image frames to be deleted according to the difference between the video duration T1 and the target video duration T2 and the frame rate of the video to be clipped;
dividing the image frames of the video to be clipped into n intervals; and
determining the image frame with the minimum importance score in each interval as an image frame to be deleted.
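A sketch of the interval-based deletion of claim 3; it assumes n equals the duration surplus times the frame rate and that the intervals are of roughly equal length, both consistent with the claim text:

```python
import numpy as np

def frames_to_delete(scores, t1, t2, fps):
    n = int(round((t1 - t2) * fps))                         # number of frames to delete
    bounds = np.linspace(0, len(scores), n + 1, dtype=int)  # n roughly equal intervals
    doomed = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        if b > a:                                            # skip empty intervals
            doomed.append(a + int(np.argmin(scores[a:b])))   # least important frame
    return doomed
```

Deleting one minimum-score frame per interval spreads the cuts evenly across the video instead of removing a contiguous block.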
4. The method of claim 3, wherein obtaining the importance score s_t of the t-th image frame according to the image features of the (t-m)-th to (t+m)-th image frames in the video to be clipped comprises computing:

[equation image FDA0003548472340000011 — not reproduced in this text: s_t is computed from the image features F_{t+i}, i = -m, ..., m, and the weights w_i]

wherein F_{t+i} is the image feature of the (t+i)-th image frame, and w_i is the weight of the i-th image frame.
5. The method of claim 4, wherein:

[equation image FDA0003548472340000021 — not reproduced in this text: it specifies the weights w_i]
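Since the published formulas of claims 4 and 5 are equation images not reproduced in this text, the following sketch only illustrates one plausible form of such a score: a weighted sum of feature distances between frame t and its neighbours. Both the distance-based form and the weight map `w` are assumptions:

```python
import numpy as np

def importance_score(features, t, m, w):
    # w maps an offset i in [-m, m] to a weight w_i (e.g., decaying with |i|).
    total = 0.0
    for i in range(-m, m + 1):
        u = t + i
        if i == 0 or u < 0 or u >= len(features):
            continue
        total += w[i] * float(np.linalg.norm(features[t] - features[u]))
    return total
```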
6. The method of claim 1, wherein determining the image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2, and the image features of the image frames comprises:
dividing the video to be clipped into I video segments, where I is an integer greater than 1;
determining an importance score of each image frame in the i-th video segment;
determining the importance score of the i-th video segment as the average of the importance scores of the image frames in the i-th video segment;
sorting the video segments in descending order of their importance scores to obtain a sorting result;
determining the first q video segments in the sorting result as reserved segments according to the video duration T1, the target video duration T2, and the duration of each video segment;
determining first image frames in the first q video segments in the sorting result according to the video durations of the first q video segments in the sorting result, the target video duration T2, and the image features of the image frames in the first q video segments in the sorting result; and
determining the first image frames in the first q video segments in the sorting result and the image frames in the (q+1)-th to I-th video segments in the sorting result as the image frames to be deleted.
7. The method of claim 6, wherein determining the first q video segments in the sorting result as reserved segments according to the video duration T1, the target video duration T2, and the duration of each video segment comprises selecting q such that

$\sum_{j=1}^{q} n_j \geq N'$ and $\sum_{j=1}^{q-1} n_j < N'$,

wherein N' = T2 × fps, fps is the frame rate of the video to be clipped, and n_j is the number of image frames of the j-th video segment in the sorting result.
8. The method of claim 1, wherein obtaining the video to be clipped comprises:
acquiring K video segments to be clipped, where K is an integer greater than 2, the K video segments to be clipped comprise L sorted segments and K-L segments to be sorted, and 1 ≤ L ≤ K-2;
determining a segment feature of each video segment to be clipped according to the image features of the image frames in that video segment;
determining relevance scores between the L-th sorted segment and the K-L segments to be sorted according to the distances between the segment feature of the L-th sorted segment and the segment features of the K-L segments to be sorted;
determining the segment to be sorted that has the highest relevance score with the L-th sorted segment as the (L+1)-th sorted segment;
incrementing L by one and returning to the relevance-score determining step until L equals K-1, thereby obtaining K sorted segments; and
synthesizing the K sorted segments in their sorted order to obtain the video to be clipped.
9. A video clipping apparatus, comprising:
a video acquisition module, configured to acquire a video to be clipped and a video duration T1 of the video to be clipped, and to determine a target video duration T2;
a feature extraction module, configured to perform feature extraction on the image frames of the video to be clipped to obtain image features of the image frames;
an image frame positioning module, configured to determine image frames to be deleted from among the image frames according to the video duration T1 of the video to be clipped, the target video duration T2, and the image features of the image frames; and
a video clipping module, configured to delete the image frames to be deleted from the video to be clipped so as to clip the video to be clipped.
10. An electronic device, comprising:
at least one processor;
storage means for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-8.
11. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, carries out the method of any one of claims 1-8.
CN202210255943.3A 2022-03-15 2022-03-15 Video clipping method, video clipping device, electronic equipment and computer readable medium Pending CN114666656A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210255943.3A CN114666656A (en) 2022-03-15 2022-03-15 Video clipping method, video clipping device, electronic equipment and computer readable medium


Publications (1)

Publication Number Publication Date
CN114666656A true CN114666656A (en) 2022-06-24

Family ID: 82029739




Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110915224A (en) * 2018-08-01 2020-03-24 深圳市大疆创新科技有限公司 Video editing method, device, equipment and storage medium
CN109977895A (en) * 2019-04-02 2019-07-05 重庆理工大学 A kind of wild animal video object detection method based on multi-characteristic fusion
CN112153462A (en) * 2019-06-26 2020-12-29 腾讯科技(深圳)有限公司 Video processing method, device, terminal and storage medium
CN110505519A (en) * 2019-08-14 2019-11-26 咪咕文化科技有限公司 Video editing method, electronic equipment and storage medium
WO2021208255A1 (en) * 2020-04-15 2021-10-21 上海摩象网络科技有限公司 Video clip marking method and device, and handheld camera
US20220067383A1 (en) * 2020-08-25 2022-03-03 Beijing Xiaomi Pinecone Electronics Co., Ltd. Method and apparatus for video clip extraction, and storage medium
CN112182299A (en) * 2020-09-25 2021-01-05 北京字节跳动网络技术有限公司 Method, device, equipment and medium for acquiring highlight segments in video
CN112532897A (en) * 2020-11-25 2021-03-19 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and computer readable storage medium
CN113542865A (en) * 2020-12-25 2021-10-22 腾讯科技(深圳)有限公司 Video editing method, device and storage medium
CN112866683A (en) * 2021-01-07 2021-05-28 中国科学技术大学 Quality evaluation method based on video preprocessing and transcoding
CN113301430A (en) * 2021-07-27 2021-08-24 腾讯科技(深圳)有限公司 Video clipping method, video clipping device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAO Xiaoli; GAO Yong: "Multi-level mutual information entropy extraction algorithm for video key frames under the CUDA framework", Journal of University of Electronic Science and Technology of China, no. 05, 30 September 2018 (2018-09-30) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866839A (en) * 2022-07-11 2022-08-05 深圳市鼎合丰科技有限公司 Video editing software system based on repeated frame image merging
CN114866839B (en) * 2022-07-11 2022-10-25 深圳市鼎合丰科技有限公司 Video editing software system based on repeated frame image merging


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination