CN114666657B - Video editing method and device, electronic equipment and storage medium

Info

Publication number
CN114666657B
Authority
CN
China
Prior art keywords
video, target, target video, clips, similarity
Legal status
Active
Application number
CN202210269564.XA
Other languages
Chinese (zh)
Other versions
CN114666657A
Inventor
杨宜坚
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210269564.XA
Publication of CN114666657A
Application granted
Publication of CN114666657B


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265: Mixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The disclosure relates to a video editing method and device, an electronic device and a storage medium, which can improve the efficiency with which an electronic device edits a plurality of video clips into a complete video. The method includes: acquiring a reference video and m target video clips, and extracting n reference video clips from the reference video; determining the similarity between each of the m target video clips and each of the n reference video clips; determining, according to the similarities, p target video clips and p reference video clips in one-to-one correspondence from the m target video clips and the n reference video clips; and determining the arrangement order of the p target video clips according to the arrangement order of the corresponding p reference video clips in the reference video, and clipping the p target video clips based on that arrangement order to obtain the target video.

Description

Video editing method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of network technologies, and in particular to a video editing method and device, an electronic device and a storage medium.
Background
Currently, a user object can record moments of daily life by shooting short videos (video clips). After shooting a plurality of short video materials, the user object can splice them through a video editing application program to obtain a micro-log (vlog), and then publish the vlog to a short video platform for sharing.
However, in the above method, after shooting a plurality of short video materials, the user object needs to clip the video clips manually to obtain the spliced vlog. This requires the user object to have some video editing experience; a user object without such experience may not know how to select suitable video clips or in what order to combine them, and therefore cannot obtain a vlog with a good editing effect. As a result, the efficiency with which the electronic device clips a plurality of video clips to obtain a vlog is low.
Disclosure of Invention
The disclosure provides a video editing method and device, an electronic device and a storage medium, which can improve the efficiency with which an electronic device edits a plurality of video clips into a complete video. The technical scheme of the present disclosure is as follows:
According to a first aspect of the present disclosure, there is provided a video editing method, including: acquiring a reference video and m target video clips, and extracting n reference video clips from the reference video, where m and n are positive integers; determining the similarity between each of the m target video clips and each of the n reference video clips; determining, according to the similarities, p target video clips and p reference video clips in one-to-one correspondence from the m target video clips and the n reference video clips, where each of the p target video clips corresponds to one of the p reference video clips and p is the minimum of m and n; and determining the arrangement order of the p target video clips according to the arrangement order of the corresponding p reference video clips in the reference video, and clipping the p target video clips based on that arrangement order to obtain the target video.
As can be seen from the above, when a plurality of video clips need to be edited into one composite video, a reference video and the m target video clips to be synthesized are acquired, and n reference video clips are extracted from the reference video; the similarity between each of the m target video clips and each of the n reference video clips is then determined; according to the similarities, p target video clips and p reference video clips in one-to-one correspondence are determined from the m target video clips and the n reference video clips; finally, the arrangement order of the p target video clips is determined from the arrangement order of the corresponding p reference video clips in the reference video, and the p target video clips are clipped based on that order to obtain the target video. In this implementation, the electronic device can automatically clip the acquired reference video and the m target video clips into a composite video; the user object only needs to trigger the clipping, rather than manually clipping the video clips step by step, to obtain the spliced composite video. Therefore, the efficiency with which the electronic device edits the plurality of video clips into a composite video can be improved.
Optionally, before extracting the n reference video clips from the reference video, the method further includes: adding n marks in the reference video according to the video content included in the n reference video clips, where each mark is used to mark the starting position of one reference video clip in the reference video. Extracting the n reference video clips from the reference video then specifically includes: extracting the n reference video clips from the reference video according to the n marks.
From the above, n marks may be added to the reference video according to the video content included in the n reference video clips, each mark indicating the starting position of one reference video clip in the reference video, so that the n reference video clips can be extracted from the reference video according to the n marks. Through this implementation, the n reference video clips can be extracted from the reference video accurately.
Optionally, determining the similarity between each of the m target video clips and each of the n reference video clips specifically includes: for each of the m target video clips and the n reference video clips, extracting multi-frame target images from the video clip to obtain an image set corresponding to the video clip; clustering all target images included in the image sets corresponding to the m target video clips and the n reference video clips to obtain a plurality of image categories; for each image set, determining the ratio of the number of images in each image category to the total number of images in that image set; and determining a feature vector for each image set according to the ratios corresponding to the image categories it contains, and determining the similarity between each of the m target video clips and each of the n reference video clips according to the feature vectors.
From the above, multi-frame target images can be extracted from each of the m target video clips and the n reference video clips to obtain an image set corresponding to each video clip, and all target images in all image sets are then clustered to obtain a plurality of image categories; for each image set, the ratio of the number of images in each image category to the total number of images in the set is determined, and the feature vector corresponding to the image set is determined from these ratios, so that the similarity between each of the m target video clips and each of the n reference video clips can be determined from the feature vectors. Through this implementation, the similarity between each target video clip and each reference video clip can be determined accurately.
Optionally, determining, according to the similarities, the p target video clips and p reference video clips in one-to-one correspondence from the m target video clips and the n reference video clips specifically includes: determining a first highest similarity from the similarities, and determining that a correspondence exists between the first target video clip and the first reference video clip corresponding to it, where the first target video clip and the first reference video clip are video clips among the m target video clips and the n reference video clips respectively; deleting all similarities corresponding to the first target video clip and the first reference video clip to obtain a first residual similarity; determining a second highest similarity from the first residual similarity, and determining that a correspondence exists between the second target video clip and the second reference video clip corresponding to it; deleting all similarities corresponding to the second target video clip and the second reference video clip from the first residual similarity to obtain a second residual similarity; and executing this step cyclically until p target video clips and p reference video clips in one-to-one correspondence are determined.
From the above, the highest similarity is determined from the similarities, a correspondence is established between the first target video clip and the first reference video clip corresponding to it, and all similarities corresponding to these two clips are deleted to obtain the remaining similarities; the highest similarity is then determined again from the remaining similarities, a correspondence is established between the second target video clip and the second reference video clip, and all similarities corresponding to these two clips are deleted again to obtain further remaining similarities. By performing these steps cyclically, p target video clips and p reference video clips in one-to-one correspondence can be determined accurately.
Optionally, the highest similarity includes a first sub-similarity and a second sub-similarity that are equal, where the first sub-similarity is the similarity between the first target video clip and the first reference video clip, and the second sub-similarity is the similarity between the first target video clip and a third reference video clip. Determining that a correspondence exists between the first target video clip corresponding to the first highest similarity and the first reference video clip specifically includes: when the second-highest similarity corresponding to the first reference video clip is smaller than the second-highest similarity corresponding to the third reference video clip, determining that a correspondence exists between the first target video clip corresponding to the first sub-similarity and the first reference video clip.
As can be seen from the above, when the highest similarity includes a first sub-similarity and a second sub-similarity that are equal, the second-highest similarity corresponding to the first reference video clip is further compared with the second-highest similarity corresponding to the third reference video clip; if the former is smaller, a correspondence is established between the first target video clip corresponding to the first sub-similarity and the first reference video clip. In this implementation, even when the highest similarity comprises two equal values, the target video clip and reference video clip that should correspond can be determined accurately.
Optionally, clipping the p target video clips based on their arrangement order to obtain the target video specifically includes: when the duration of any one of the p target video clips is longer than the target duration (the duration of the reference video clip corresponding to that target video clip), clipping that target video clip by selecting from it the partial clip with the highest similarity to the corresponding reference video clip, thereby adjusting its duration to the target duration; and clipping the p duration-adjusted target video clips based on the arrangement order of the p target video clips to obtain the target video.
From the above, when the duration of any one of the p target video clips is longer than the target duration of its corresponding reference video clip, that target video clip can be clipped by selecting from it the partial clip with the highest similarity to the corresponding reference video clip, so that its duration is adjusted to the target duration; the p duration-adjusted target video clips are then clipped based on their arrangement order to obtain the target video. This implementation provides a method for adjusting the duration of target video clips, which improves the effect of clipping the plurality of target video clips into the target video, as sketched below.
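The disclosure gives no code for this duration adjustment; the following is a minimal sketch of one plausible reading, assuming clips are frame sequences, that the reciprocal-of-Euclidean-distance similarity described in the later embodiments is reused to score candidate sub-clips, and that frame_vector is a hypothetical helper producing a clip's feature vector. All names are illustrative, not from the patent.

```python
import numpy as np

def trim_to_target_duration(target_frames, ref_vector, target_len, frame_vector):
    # Slide a window of the reference clip's length over the target clip and
    # keep the window whose feature vector is closest to the reference clip's.
    best_start, best_sim = 0, -np.inf
    for start in range(len(target_frames) - target_len + 1):
        window = target_frames[start:start + target_len]
        vec = frame_vector(window)                              # hypothetical feature extractor
        sim = 1.0 / (np.linalg.norm(vec - ref_vector) + 1e-9)   # reciprocal of Euclidean distance
        if sim > best_sim:
            best_start, best_sim = start, sim
    return target_frames[best_start:best_start + target_len]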
Optionally, clipping the p duration-adjusted target video clips based on the arrangement order of the p target video clips to obtain the target video specifically includes: sorting the p duration-adjusted target video clips based on the arrangement order of the p target video clips, and inserting a transition special effect between every two adjacent target video clips among the p sorted target video clips; and performing video synthesis processing on the p sorted target video clips and the transition special effects to obtain the target video.
From the above, a transition special effect can be inserted between every two adjacent target video clips among the p sorted target video clips, and video synthesis processing is then performed on the p sorted target video clips and the transition special effects to obtain the target video, as sketched below. Inserting a transition special effect between every two adjacent target video clips further improves the effect of clipping the plurality of target video clips into the target video.
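A sketch of this synthesis step, again treating clips as frame sequences. make_transition is a hypothetical stand-in for whatever transition special effect is used; the patent does not specify one.

```python
def compose_with_transitions(ordered_clips, make_transition):
    # Insert a transition between every two adjacent target video clips,
    # then flatten everything into a single frame sequence (the target video).
    timeline = []
    for i, clip in enumerate(ordered_clips):
        timeline.append(clip)
        if i < len(ordered_clips) - 1:          # no transition after the last clip
            timeline.append(make_transition(clip, ordered_clips[i + 1]))
    return [frame for segment in timeline for frame in segment]
```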
Optionally, acquiring the reference video and the m target video clips and extracting the n reference video clips from the reference video specifically includes: acquiring the reference video from a template library according to a selection operation of the user object, where the template library includes a plurality of reference videos, each composed of a plurality of video clips, including videos that the user object has added to the template library; and acquiring the m target video clips from the electronic device according to an input operation of the user object, where the m target video clips are video clips shot by the electronic device.
From the above, the user object may add, in advance, a plurality of reference videos composed of a plurality of video clips to the template library, so that when the m target video clips need to be clipped, the electronic device is triggered to obtain the reference video directly from the template library; after the m target video clips shot by the electronic device are acquired according to the input operation of the user object, the m target video clips are clipped. By establishing the template library in advance, the reference video can be obtained directly from it whenever the m target video clips need to be clipped, which improves the efficiency with which the electronic device clips the video clips into the composite video.
According to a second aspect of the present disclosure, there is provided a video editing apparatus, including an acquisition unit, a determining unit and a processing unit. The acquisition unit is configured to acquire a reference video and m target video clips and to extract n reference video clips from the reference video, where m and n are positive integers. The determining unit is configured to determine the similarity between each of the m target video clips and each of the n reference video clips, and is further configured to determine, according to the similarities, p target video clips and p reference video clips in one-to-one correspondence from the m target video clips and the n reference video clips, where each of the p target video clips corresponds to one of the p reference video clips and p is the minimum of m and n. The determining unit is configured to determine the arrangement order of the p target video clips according to the arrangement order of the corresponding p reference video clips in the reference video. The processing unit is configured to clip the p target video clips based on the arrangement order of the p target video clips to obtain the target video.
Optionally, the processing unit is configured to add n marks in the reference video according to the video content included in the n reference video clips, where each mark is used to mark the starting position of one reference video clip in the reference video; and the acquisition unit is configured to extract the n reference video clips from the reference video according to the n marks.
Optionally, the acquisition unit is configured to, for each of the m target video segments and the n reference video segments, extract multi-frame target images from the video segment to obtain an image set corresponding to the video segment; the processing unit is configured to cluster all target images included in the image sets corresponding to the m target video clips and the n reference video clips to obtain a plurality of image categories; the determining unit is configured to determine, for each image set, the ratio of the number of images in each image category to the total number of images in the image set; and the determining unit is configured to determine the feature vector of each image set according to the ratios corresponding to the image categories it contains, and to determine the similarity between each of the m target video segments and each of the n reference video segments according to the feature vectors.
Optionally, the determining unit is configured to determine a first highest similarity from the similarities and to determine that a correspondence exists between the first target video segment and the first reference video segment corresponding to it, where the first target video segment and the first reference video segment are video segments among the m target video segments and the n reference video segments respectively; the processing unit is configured to delete all similarities corresponding to the first target video segment and the first reference video segment to obtain a first residual similarity; and the processing unit is configured to determine a second highest similarity from the first residual similarity, determine that a correspondence exists between the second target video segment and the second reference video segment corresponding to it, delete all similarities corresponding to these two segments from the first residual similarity to obtain a second residual similarity, and execute this step cyclically until p target video segments and p reference video segments in one-to-one correspondence are determined.
Optionally, the highest similarity includes a first sub-similarity and a second sub-similarity that are equal, where the first sub-similarity is the similarity between the first target video segment and the first reference video segment, and the second sub-similarity is the similarity between the first target video segment and the third reference video segment; and the determining unit is configured to determine, when the second-highest similarity corresponding to the first reference video segment is smaller than the second-highest similarity corresponding to the third reference video segment, that a correspondence exists between the first target video segment corresponding to the first sub-similarity and the first reference video segment.
Optionally, the processing unit is configured to, when the duration of any one of the p target video clips is longer than the target duration of the reference video clip corresponding to that target video clip, clip that target video clip by selecting from it the partial clip with the highest similarity to the corresponding reference video clip, and adjust its duration to the target duration; and the processing unit is configured to clip the p duration-adjusted target video clips based on the arrangement order of the p target video clips to obtain the target video.
Optionally, the processing unit is configured to sort the p duration-adjusted target video clips based on the arrangement order of the p target video clips and to insert a transition special effect between every two adjacent target video clips among the p sorted target video clips; and the processing unit is configured to perform video synthesis processing on the p sorted target video clips and the transition special effects to obtain the target video.
Optionally, the acquisition unit is configured to acquire the reference video from the template library according to a selection operation of the user object, where the template library includes a plurality of reference videos, each composed of a plurality of video clips, including videos that the user object has added to the template library; and the acquisition unit is configured to acquire the m target video clips from the electronic device according to an input operation of the user object, where the m target video clips are video clips shot by the electronic device.
According to a third aspect of the present disclosure, there is provided an electronic device, comprising:
a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement any of the optional video editing methods of the first aspect described above.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having instructions stored thereon which, when executed by a processor of an electronic device, enable the electronic device to perform any one of the above-described optional video editing methods of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising instructions which, when executed by a processor of an electronic device, implement the optional video editing method as in any of the first aspects.
According to a sixth aspect of the present disclosure there is provided a chip comprising a processor and a communication interface, the communication interface and the processor being coupled, the processor being for running a computer program or instructions to implement the optional video editing method as in any of the first aspects.
The technical scheme provided by the disclosure at least brings the following beneficial effects:
Based on any one of the above aspects, in the present disclosure, when a plurality of video clips need to be edited into one composite video, a reference video and the m target video clips to be synthesized are acquired, and n reference video clips are extracted from the reference video; the similarity between each of the m target video clips and each of the n reference video clips is then determined; according to the similarities, p target video clips and p reference video clips in one-to-one correspondence are determined from the m target video clips and the n reference video clips; finally, the arrangement order of the p target video clips is determined from the arrangement order of the corresponding p reference video clips in the reference video, and the p target video clips are clipped based on that order to obtain the target video. In this implementation, the electronic device can automatically clip the acquired reference video and the m target video clips into a composite video; the user object does not need to clip the video clips manually step by step, but only triggers the electronic device to clip them into the spliced composite video. Therefore, the efficiency with which the electronic device edits the plurality of video clips into a composite video can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of a video editing system according to an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a video editing method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating another video editing method according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating yet another video editing method according to an embodiment of the present disclosure;
FIG. 5 is a flow chart illustrating yet another video editing method according to an embodiment of the present disclosure;
FIG. 6 is a flow chart illustrating yet another video editing method according to an embodiment of the present disclosure;
FIG. 7 is a flow chart illustrating yet another video editing method according to an embodiment of the present disclosure;
FIG. 8 is a flow chart illustrating yet another video editing method according to an embodiment of the present disclosure;
FIG. 9 is a flow chart illustrating yet another video editing method according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a video editing apparatus according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of another video editing apparatus according to an embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
First, an application scenario of the embodiments of the present disclosure will be described. In the prior art, when a user object shoots a plurality of video material clips with an electronic device and needs to clip them into a section of vlog, the user object may not know how to select suitable clips or in what order to combine them; limited by the user object's clipping skill, the plurality of video material clips cannot be clipped through a video editing application program according to the user object's own ideas. Even if the user object refers to some high-quality vlogs (vlogs that other user objects have already clipped), the user object still has to operate by itself and is not necessarily able to pick suitable clips from the plurality of video material clips. Therefore, in the prior art, for a user object without video editing experience, clipping a plurality of video material clips into a composite video on the electronic device gives a poor user experience, and the efficiency with which the electronic device clips a plurality of video clips into a complete video is low.
To solve the above problems, an embodiment of the present disclosure provides a video editing method: when a plurality of video clips need to be edited into one composite video, a reference video and the m target video clips to be synthesized are acquired, and n reference video clips are extracted from the reference video; the similarity between each of the m target video clips and each of the n reference video clips is then determined; according to the similarities, p target video clips and p reference video clips in one-to-one correspondence are determined from the m target video clips and the n reference video clips; finally, the arrangement order of the p target video clips is determined from the arrangement order of the corresponding p reference video clips in the reference video, and the p target video clips are clipped based on that order to obtain the target video. In this implementation, the electronic device can automatically clip the acquired reference video and the m target video clips into a composite video without the user object clipping the video clips manually step by step. Therefore, the efficiency with which the electronic device edits the plurality of video clips into a composite video can be improved.
The following describes, with reference to the accompanying drawings, the video editing method provided by the embodiments of the present disclosure:
Fig. 1 is a schematic diagram of a video editing system provided in an embodiment of the present disclosure. As shown in fig. 1, the video editing system may include a server 11 and clients 12 (only one client 12 is shown in fig. 1 by way of example; there may be more clients in implementation). A communication connection may be established between the server 11 and the client 12, in either a wired or a wireless manner, which is not limited in the embodiments of the present disclosure.
The server 11 is configured to interact with the client 12, and may perform clipping processing on a plurality of video clips sent by the client 12.
The client 12 is configured to edit a plurality of video clips through a video editing application program and to exchange data with the server 11.
In one implementation, the server 11 may be a server, a server cluster formed by a plurality of servers, or a cloud computing service center. The server 11 may include a processor, memory, a network interface, and the like.
In one implementation, the client 12 is used to provide voice and/or data connectivity services to a user. The client 12 may have different names such as UE side, terminal unit, terminal station, mobile station, remote terminal, mobile device, wireless communication device, vehicle user equipment, terminal agent or terminal equipment, etc.
Alternatively, the client 12 may be a handheld device, an in-vehicle device, a wearable device, or a computer with communication functions, which is not limited in any way by the embodiments of the present disclosure. For example, the handheld device may be a smart phone. The in-vehicle device may be an in-vehicle navigation system. The wearable device may be a smart bracelet. The computer may be a personal digital assistant (personal digital assistant, PDA) computer, a tablet computer, or a laptop computer (laptop computer).
The video editing method provided by the embodiments of the present disclosure may be applied to the server 11 and the client 12 in the video editing system shown in fig. 1. The electronic device to which the present disclosure relates may be the server 11 or the client 12. Taking as an example the case where the video editing method of the present disclosure is applied to the server in the process of executing a service, the video editing method provided by the embodiments of the present disclosure will be described in detail below.
Having described the application scenario and video editing system of the embodiments of the present disclosure, a detailed description of the video editing method provided by the embodiments of the present disclosure is provided below in conjunction with the video editing system shown in fig. 1.
As shown in fig. 2, a flow chart of a video editing method according to an exemplary embodiment is shown as applied to an electronic device. The video editing method may include S201-S204.
S201, acquiring a reference video and m target video clips, and extracting n reference video clips from the reference video.
Wherein m and n are positive integers.
In the embodiments of the present disclosure, when a user object needs to perform video clipping on a plurality of video clips (i.e., the m target video clips) to obtain a composite video, a composite video published by another user object on a video platform (i.e., the reference video) can be obtained as a reference, so that the plurality of video clips are clipped into a composite video with the desired video effect.
Optionally, the reference video may be a composite video published by another user object and collected in advance by the user object on the video platform, or a composite video included in a template library in the server (or a composite video stored in the electronic device).
It can be understood that the above reference video may be a composite video obtained by a user object with strong video editing capability manually clipping a plurality of video clips; therefore, through the embodiments of the present disclosure, a user object with weaker video editing capability can take the reference video as a reference and automatically clip the m target video clips into a composite video with the same video effect.
Optionally, the m target video clips may be video clips shot by the user object through a camera application program, or video clips downloaded by the user object from a network.
Optionally, the electronic device may analyze the reference video to determine the n reference video segments included in it, and clip the reference video to split it into the n reference video segments.
S202, respectively determining the corresponding similarity between each target video segment in the m target video segments and each reference video segment in the n reference video segments.
Optionally, the electronic device may analyze, through image analysis processing technology, each frame image included in each of the m target video segments and each of the n reference video segments, so as to determine the video type corresponding to each of these video segments, and then determine the similarity between each target video segment and each reference video segment according to the video types.
It can be appreciated that m×n similarities can be determined between m target video clips and n reference video clips.
S203, according to the similarity, determining p target video clips and p reference video clips with one-to-one correspondence from m target video clips and n reference video clips.
Wherein one of the p target video clips corresponds to one of the p reference video clips, and p is the minimum value of m and n.
Optionally, the electronic device may sequentially determine the p target video segments and p reference video segments in one-to-one correspondence according to the magnitudes of the determined similarities between each of the m target video segments and each of the n reference video segments.
It can be appreciated that when m is greater than n, n target video clips in one-to-one correspondence with the n reference video clips can be determined from the m target video clips (i.e., p is equal to n); when m is smaller than n, m reference video clips in one-to-one correspondence with the m target video clips can be determined from the n reference video clips (i.e., p is equal to m); and when m is equal to n, the one-to-one correspondence between the m target video clips and the n reference video clips can be determined directly (i.e., p is equal to m).
S204, according to the arrangement sequence of the p reference video clips in the reference video, determining the arrangement sequence of p target video clips corresponding to the p reference video clips, and clipping the p target video clips based on the arrangement sequence of the p target video clips to obtain the target video.
Alternatively, the arrangement order of the p reference video clips in the reference video may be understood as the play order corresponding to the p reference video clips when playing the reference video.
Since the one-to-one correspondence between the p target video clips and the p reference video clips has already been determined, after the arrangement order of the p reference video clips is determined, the arrangement order of the p target video clips can be determined according to that correspondence.
Therefore, the p target video clips can be clipped according to the arrangement sequence of the p target video clips, so that when the obtained target video is played, the p target video clips included in the target video can be sequentially played according to the arrangement sequence of the p target video clips.
The technical scheme provided by this embodiment at least brings the following beneficial effects: when a plurality of video clips need to be edited into one composite video, a reference video and the m target video clips to be synthesized are acquired, and n reference video clips are extracted from the reference video; the similarity between each of the m target video clips and each of the n reference video clips is then determined; according to the similarities, p target video clips and p reference video clips in one-to-one correspondence are determined from the m target video clips and the n reference video clips; finally, the arrangement order of the p target video clips is determined from the arrangement order of the corresponding p reference video clips in the reference video, and the p target video clips are clipped based on that order to obtain the target video. The electronic device can thus automatically clip the acquired reference video and the m target video clips into a composite video without the user object clipping the video clips manually step by step, improving the efficiency with which the electronic device edits the plurality of video clips into a composite video.
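To make the S201-S204 flow concrete, here is a minimal end-to-end sketch in Python. Every helper (extract_reference_clips, pairwise_similarity, greedy_one_to_one, concatenate) is a hypothetical stand-in for an operation described in the text, not an API defined by the disclosure; possible implementations of several helpers are sketched in the later embodiments.

```python
def auto_edit(reference_video, target_clips):
    ref_clips = extract_reference_clips(reference_video)   # S201: n reference video clips
    sim = pairwise_similarity(target_clips, ref_clips)     # S202: m x n similarity matrix
    pairs = greedy_one_to_one(sim)                         # S203: [(target_idx, ref_idx), ...], p pairs
    pairs.sort(key=lambda pair: pair[1])                   # S204: order by position in the reference video
    ordered_clips = [target_clips[t] for t, _ in pairs]
    return concatenate(ordered_clips)                      # splice into the target video
```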
In an embodiment, in conjunction with fig. 2 and as shown in fig. 3, S301 may further be included before extracting the n reference video segments from the reference video in S201, and extracting the n reference video segments from the reference video in S201 may specifically include S2011.
S301, adding n marks in the reference video according to video contents included in the n reference video clips.
Wherein each marker is used to mark the starting position of a reference video segment in the reference video.
Optionally, the electronic device may perform video analysis on the reference video to divide the reference video into n reference video segments according to the specific content displayed in each frame of image.
Further, the electronic device may further add a mark between each adjacent two of n reference video clips included in the reference video, so that a start position of each reference video clip in the reference video may be indicated by the n marks.
The mark positions corresponding to the n marks may be determined by the electronic device through video analysis of the reference video; alternatively, they may be determined by the user object according to subjective judgment, with the user object triggering the electronic device through manual operations to add the n marks in the reference video.
S2011, extracting n reference video clips from the reference video according to the n marks.
Optionally, the electronic device may split the reference video into the n reference video segments according to the n marks included in the reference video.
The technical scheme provided by this embodiment at least brings the following beneficial effects: n marks may be added to the reference video according to the video content included in the n reference video clips, each mark indicating the starting position of one reference video clip, so that the n reference video clips can be extracted from the reference video according to the n marks. Through this implementation, the n reference video clips can be extracted from the reference video accurately.
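A minimal sketch of this splitting step, assuming the reference video is available as a frame sequence and the n marks are ascending frame indices with the first mark at index 0; names are illustrative only.

```python
def extract_reference_clips(frames, marks):
    # Each mark is the starting position of one reference video clip, so a
    # clip runs from its mark to the next mark (or to the end of the video).
    boundaries = list(marks) + [len(frames)]
    return [frames[boundaries[i]:boundaries[i + 1]] for i in range(len(marks))]
```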
In an embodiment, as shown in fig. 4 in conjunction with fig. 2, the method in S202 may specifically include S401-S404.
S401, for each of the m target video segments and the n reference video segments, extracting multi-frame target images from the video segment to obtain an image set corresponding to the video segment.
Optionally, for each video segment, a frame image may be captured at every preset interval (for example, 0.5 s) over the total duration of the segment, and the captured multi-frame target images are determined as the image set corresponding to the video segment.
Optionally, according to the content displayed in each frame image included in each video segment, multi-frame target images containing important display content may be determined from all the frame images of the video segment, so as to obtain the image set corresponding to the video segment.
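A sketch of the fixed-interval sampling variant using OpenCV, assuming the clip is readable from a file path; the 0.5 s interval follows the example in the text.

```python
import cv2

def sample_frames(path, step_seconds=0.5):
    # Capture one frame every step_seconds to build the clip's image set.
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0       # fall back if FPS metadata is missing
    step = max(int(round(fps * step_seconds)), 1)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```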
S402, clustering all target images included in a plurality of image sets corresponding to m target video clips and n reference video clips to obtain a plurality of image categories.
Optionally, after obtaining a plurality of image sets corresponding to each video segment in the m target video segments and the n reference video segments, clustering all target images included in the plurality of image sets by using an image clustering algorithm to obtain a plurality of image categories.
It should be noted that, the number of categories corresponding to the plurality of image categories may be a numerical value determined by the user object, and the number of categories corresponding to the plurality of image categories may be flexibly adjusted according to subjective intention of the user object.
S403, determining the ratio of the number of images corresponding to each image category to the total number of images corresponding to each image set.
Optionally, after all target images included in the image sets are clustered into a plurality of image categories, it is further necessary to determine, for each image set, the number of images belonging to each image category.
Therefore, for each image set, the ratio of the number of images in each image category to the total number of images in the image set can be calculated, to determine the proportion that each image category occupies in the image set.
S404, determining a feature vector corresponding to each image set according to a plurality of ratios corresponding to a plurality of image categories included in each image set, and determining a similarity corresponding to each target video segment in the m target video segments and each reference video segment in the n reference video segments according to the feature vector corresponding to each image set.
Optionally, after the proportion of each image category in each image set is determined, the feature vector corresponding to the image set may be determined from these proportions.
Further, after the feature vector corresponding to each image set is determined, the similarity between each of the m target video segments and each of the n reference video segments can be obtained by computing over the feature vectors.
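In formula form, consistent with the worked example that follows: if u and v are the feature vectors of a target video segment and a reference video segment over K image categories (K = 6 in the example below), then

```latex
d(u, v) = \sqrt{\sum_{k=1}^{K} \left( u_k - v_k \right)^2},
\qquad
\operatorname{sim}(u, v) = \frac{1}{d(u, v)}
```

Note that the reciprocal is undefined when d(u, v) = 0; the embodiments do not state how identical feature vectors are handled, so an implementation would need a small epsilon or a cap.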
For example, suppose that after all target images included in the image sets are clustered, all images fall into 6 image categories, and a certain video segment contains images of 4 of these categories, with 10, 20, 30 and 40 images respectively (100 images in total); the feature vector corresponding to that video segment is then [0.1, 0.2, 0.3, 0.4, 0, 0]. In the same way, the feature vector corresponding to each of the m target video segments and the n reference video segments can be determined. Further, for one target video segment among the m target video segments and one reference video segment among the n reference video segments, the Euclidean distance between their two feature vectors can be calculated, and the reciprocal of the Euclidean distance is determined as the similarity between the two video segments.
For example, if the feature vector corresponding to a target video segment is [0.1, 0.2, 0.3, 0.4, 0, 0] and the feature vector corresponding to a reference video segment is [0.1, 0.2, 0.4, 0.4, 0, 0], the Euclidean distance is 0.1 and the similarity is therefore 10.
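A runnable sketch of S401-S404 under stated assumptions: each image set is a NumPy array whose rows are flattened frames, k-means stands in for the unspecified image clustering algorithm, the category count of 6 mirrors the worked example, and a small epsilon guards the reciprocal against zero distance. All names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def similarity_matrix(target_sets, ref_sets, num_categories=6):
    # S402: cluster every sampled frame from every clip into image categories.
    all_sets = list(target_sets) + list(ref_sets)
    all_images = np.vstack(all_sets)
    labels = KMeans(n_clusters=num_categories, n_init=10).fit_predict(all_images)

    # S403: per clip, the ratio of images in each category to the set's total.
    vectors, start = [], 0
    for s in all_sets:
        clip_labels = labels[start:start + len(s)]
        start += len(s)
        counts = np.bincount(clip_labels, minlength=num_categories)
        vectors.append(counts / len(s))

    # S404: similarity = reciprocal of the Euclidean distance between ratio vectors.
    tgt, ref = vectors[:len(all_sets) - len(ref_sets)], vectors[len(all_sets) - len(ref_sets):]
    return np.array([[1.0 / (np.linalg.norm(t - r) + 1e-9) for r in ref] for t in tgt])
```

The m x n matrix this returns is exactly what the matching step of the next embodiment consumes.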
The technical scheme provided by this embodiment at least brings the following beneficial effects: multi-frame target images are extracted from each of the m target video segments and the n reference video segments to obtain an image set for each video segment, all target images in all image sets are clustered into a plurality of image categories, the ratio of the number of images in each image category to the total number of images in each image set is determined, and the feature vector of each image set is determined from these ratios, so that the similarity between each target video segment and each reference video segment can be determined accurately from the feature vectors.
In an embodiment, as shown in fig. 5 in connection with fig. 2, the method in S203 may specifically include S501-S503.
S501, determining a first highest similarity from the similarities, and determining that a corresponding relationship exists between a first target video segment corresponding to the first highest similarity and a first reference video segment.
The first target video segment and the first reference video segment are video segments among the m target video segments and the n reference video segments, respectively.
For example, assume there are 5 target video segments in total (i.e., m is 5) and 4 reference video segments (i.e., n is 4), and that the similarity between each of the 5 target video segments and each of the 4 reference video segments has been determined through the above steps, as shown in Table 1 below:
list one
Similarity degree Reference video clip 1 Reference video clip 2 Reference video clip 3 Reference video clip 4
Target video clip 1 3.33 5 10 2.5
Target video clip 2 2.5 3.33 1.67 8
Target video clip 3 3.33 2 2.5 8
Target video clip 4 1.25 1.11 5 2
Target video clip 5 5 2 1.43 2.5
As can be seen from Table 1, the highest similarity is 10, and it is unique; it can therefore be determined that a correspondence exists between target video clip 1 and reference video clip 3, which correspond to this highest similarity.
S502, deleting all the similarities corresponding to the first target video segment and the first reference video segment from the similarities to obtain a first residual similarity.
Further, all similarities corresponding to target video clip 1 (i.e., the first target video segment) and reference video clip 3 (i.e., the first reference video segment) are deleted from Table 1, i.e., the row corresponding to target video clip 1 and the column corresponding to reference video clip 3 are deleted, to obtain the first residual similarity, as shown in Table 2:
Table 2

| Similarity          | Reference video clip 1 | Reference video clip 2 | Reference video clip 4 |
|---------------------|------------------------|------------------------|------------------------|
| Target video clip 2 | 2.5                    | 3.33                   | 8                      |
| Target video clip 3 | 3.33                   | 2                      | 8                      |
| Target video clip 4 | 1.25                   | 1.11                   | 2                      |
| Target video clip 5 | 5                      | 2                      | 2.5                    |
S503, determining a second highest similarity from the first residual similarity, and determining that a corresponding relationship exists between a second target video segment corresponding to the second highest similarity and a second reference video segment; deleting all the similarities corresponding to the second target video segment and the second reference video segment from the first residual similarity to obtain a second residual similarity; and circularly executing the step until p target video clips and p reference video clips with one-to-one correspondence are determined.
Optionally, a highest similarity is then determined from the first remaining similarities shown in Table 2, and it is determined that a correspondence exists between the second target video segment (here, target video clip 3) and the second reference video segment (reference video clip 4) corresponding to that similarity.

Further, all similarities corresponding to the second target video segment (target video clip 3) and the second reference video segment (reference video clip 4) are deleted from Table 2, yielding the second remaining similarities.
These steps are executed cyclically until 4 target video clips and 4 reference video clips having a one-to-one correspondence are determined.
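The matching loop of S501-S503 can be sketched as a greedy assignment on the similarity matrix. The sketch below (0-based indices, NumPy assumed) takes the highest remaining entry, records the pair, and masks out its row and column; it is an illustration of the described procedure, not the patented implementation, and it leaves tie handling to the next embodiment (see the tie-break sketch further below):

```python
import numpy as np

def greedy_match(sim):
    """Greedy one-to-one matching on a similarity matrix (rows: target
    clips, columns: reference clips): repeatedly take the highest remaining
    similarity, record the pair, and mask out its row and column, until
    p = min(m, n) pairs are found. np.argmax resolves equal maxima to the
    first occurrence; the tie rule of the next embodiment refines this."""
    sim = sim.astype(float).copy()
    pairs = []
    for _ in range(min(sim.shape)):
        r, c = np.unravel_index(np.argmax(sim), sim.shape)
        pairs.append((r, c))
        sim[r, :] = -np.inf  # the target clip is now matched
        sim[:, c] = -np.inf  # the reference clip is now matched
    return pairs

# Hypothetical 3x2 similarity matrix (3 target clips, 2 reference clips).
sim = np.array([[1.0, 4.0],
                [2.0, 3.0],
                [5.0, 0.5]])
print(greedy_match(sim))  # [(2, 0), (0, 1)] -> p = min(3, 2) = 2 pairs
```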
The technical scheme provided by this embodiment brings at least the following beneficial effects: the highest similarity is determined from the similarities, a correspondence is established between the first target video segment and the first reference video segment to which it corresponds, and all similarities corresponding to those two segments are deleted to obtain the remaining similarities; the highest similarity is then determined again from the remaining similarities, a correspondence is established between the second target video segment and the second reference video segment to which it corresponds, and all similarities corresponding to those two segments are deleted in turn. By performing these steps cyclically, p target video clips and p reference video clips having a one-to-one correspondence can be determined accurately.
In one embodiment, the first highest similarity may include a first sub-similarity and a second sub-similarity that are equal, where the first sub-similarity is the similarity corresponding to the first target video segment and the first reference video segment, and the second sub-similarity is the similarity corresponding to the first target video segment and a third reference video segment. In this case, referring to fig. 5 and fig. 6, the method of determining, in S501, that a correspondence exists between the first target video segment corresponding to the first highest similarity and the first reference video segment may specifically include S5011.
S5011, in a case where the second-highest similarity corresponding to the first reference video segment is smaller than the second-highest similarity corresponding to the third reference video segment, determining that a correspondence exists between the first target video segment corresponding to the first sub-similarity and the first reference video segment.
Optionally, when two highest similarities are equal, the second-highest similarity corresponding to the first reference video segment must further be compared with the second-highest similarity corresponding to the third reference video segment, in order to determine which reference video segment has a correspondence with the first target video segment.
For example, as shown in Table 2, the highest similarity is 8, but it occurs twice: the similarity corresponding to target video clip 2 and reference video clip 4, and the similarity corresponding to target video clip 3 and reference video clip 4, are both the highest. The second-highest similarity corresponding to target video clip 2 and reference video clip 4 (5) is then compared with the second-highest similarity corresponding to target video clip 3 and reference video clip 4 (3.33), and a correspondence is determined between reference video clip 4 and the target video clip with the smaller second-highest similarity, namely target video clip 3.

It should be noted that reference video clip 4 is assigned to the clip with the smaller second-highest similarity because a smaller second-highest similarity indicates that target video clip 3's ties to the other reference video clips (i.e., reference video clips 1 and 2) are weaker, making it less suitable to be matched with any other reference video clip, whereas target video clip 2, with the larger second-highest similarity, still has a good alternative match.
Further, all similarities corresponding to target video clip 3 and reference video clip 4 are deleted from Table 2 to obtain the second remaining similarities, as shown in Table 3:
Table 3

| Similarity          | Reference video clip 1 | Reference video clip 2 |
|---------------------|------------------------|------------------------|
| Target video clip 2 | 2.5                    | 3.33                   |
| Target video clip 4 | 1.25                   | 1.11                   |
| Target video clip 5 | 5                      | 2                      |
Proceeding in the same way, the correspondences between the 5 target video clips and the 4 reference video clips are finally obtained: target video clip 1 corresponds to reference video clip 3, target video clip 3 corresponds to reference video clip 4, target video clip 5 corresponds to reference video clip 1, and target video clip 2 corresponds to reference video clip 2.
The technical scheme provided by this embodiment brings at least the following beneficial effects: when the highest similarity includes a first sub-similarity and a second sub-similarity that are equal, the second-highest similarity corresponding to the first reference video segment can further be compared with the second-highest similarity corresponding to the third reference video segment, so that when the former is smaller, a correspondence is determined between the first target video segment corresponding to the first sub-similarity and the first reference video segment. In this way, even when the highest similarity comprises two equal values, the target video segment and reference video segment that correspond to each other can be determined accurately.
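Read literally, the tie rule compares each tied candidate's second-highest similarity and awards the contested reference clip to the candidate with the smaller value. Below is a minimal sketch under that reading, taking the second-highest similarities as given (as the text does with 5 and 3.33); the function name and input layout are illustrative assumptions:

```python
def break_tie(candidates):
    """Among tied target clips competing for the same reference clip, award
    the reference clip to the candidate whose second-highest similarity is
    smaller: its ties to the other reference clips are weaker, so it has no
    good alternative, while the other candidate still does."""
    return min(candidates, key=lambda c: c[1])[0]

# The text's example: target video clips 2 and 3 both reach similarity 8
# with reference video clip 4; their second-highest similarities are taken
# as 5 and 3.33, as stated in the text.
print(break_tie([("target video clip 2", 5), ("target video clip 3", 3.33)]))
# -> target video clip 3
```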
In an embodiment, referring to fig. 2, as shown in fig. 7, the method of "clipping p target video clips based on the arrangement sequence of p target video clips to obtain a target video" in S204 may specifically include S2041-S2042.
S2041, when the duration of any one of the p target video clips is longer than the target duration (the duration of the reference video clip corresponding to that target video clip), performing clipping processing on that target video clip: selecting from it the partial clip with the highest similarity to the corresponding reference video clip, thereby adjusting the duration of that target video clip to the target duration.
Optionally, when the duration of a target video segment is longer than the target duration of its corresponding reference video segment, the target video segment may first be trimmed so that the partial clip with the highest similarity to the corresponding reference video segment is selected from it, yielding a video clip whose duration equals the target duration.

Similarly, each of the p target video clips can be trimmed in this way to obtain a video clip with the same duration as its corresponding reference video clip.
S2042, based on the arrangement sequence of the p target video clips, clipping the p target video clips with the video duration adjusted, and obtaining the target video.
Further, after each target video clip (or each over-long target video clip) among the p target video clips has been trimmed to its target duration, the target video is obtained by performing clipping processing on the p duration-trimmed target video clips based on the arrangement order of the p target video clips.
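One plausible way to realize S2041, reusing `segment_feature_vector` and `similarity` from the earlier sketch, is to slide a window of the target duration (expressed in frames) over the target clip and keep the window whose feature vector is closest to the reference clip's; the helper below and its parameter names are illustrative assumptions:

```python
import numpy as np  # reuses segment_feature_vector and similarity from the
                    # earlier sketch

def best_subclip(target_frame_cats, ref_vec, window, num_categories):
    """Slide a window of `window` frames (the target duration in frames)
    over the target clip's per-frame category labels and return the
    (start, end) indices of the window whose feature vector is most similar
    to the reference clip's feature vector `ref_vec`."""
    best_range, best_sim = None, -1.0
    for start in range(len(target_frame_cats) - window + 1):
        vec = segment_feature_vector(
            target_frame_cats[start:start + window], num_categories)
        s = similarity(vec, ref_vec)
        if s > best_sim:
            best_range, best_sim = (start, start + window), s
    return best_range
```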
The technical scheme provided by this embodiment brings at least the following beneficial effects: when the duration of any one of the p target video clips is longer than the target duration of its corresponding reference video clip, that target video clip can be trimmed so that the partial clip with the highest similarity to the corresponding reference video clip is selected from it and its duration is adjusted to the target duration; the p duration-adjusted target video clips are then clipped based on their arrangement order to obtain the target video. This provides a method for adjusting the duration of target video clips and improves the effect of clipping multiple target video clips into the target video.
In an embodiment, as shown in fig. 8 in connection with fig. 7, the method in S2042 may specifically include S601-S602.
S601, sorting the p target video clips with the video time length adjusted based on the arrangement sequence of the p target video clips, and inserting a transition special effect between every two adjacent target video clips in the p sorted target video clips.
S602, video synthesis processing is carried out on the p sorted target video clips and the transition special effects, and target videos are obtained.
Optionally, in the process of clipping the p duration-adjusted target video clips, a transition special effect can be inserted between every two adjacent target video clips, and video synthesis processing is then performed on the p target video clips and the transition special effects to obtain a target video that includes the transition special effects, which improves the video effect of the resulting composite video.
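As a self-contained illustration of S601-S602 (the patent does not specify any particular transition), a linear crossfade between adjacent clips can stand in for the transition special effect; frames are modeled here as float NumPy arrays and clips as Python lists of frames, both of which are assumptions of the sketch:

```python
import numpy as np

def crossfade(clip_a, clip_b, overlap):
    """Join two clips (Python lists of HxWx3 float frames) with a linear
    crossfade over `overlap` frames, one simple transition special effect."""
    faded = []
    for i in range(overlap):
        alpha = (i + 1) / (overlap + 1)  # blend weight ramps from clip A to clip B
        faded.append((1 - alpha) * clip_a[len(clip_a) - overlap + i]
                     + alpha * clip_b[i])
    return clip_a[:-overlap] + faded + clip_b[overlap:]

def synthesize(clips, overlap=8):
    """Concatenate the ordered clips, inserting a crossfade between every
    two adjacent clips (S601), and return the synthesized video (S602)."""
    video = clips[0]
    for nxt in clips[1:]:
        video = crossfade(video, nxt, overlap)
    return video
```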
The technical scheme provided by this embodiment brings at least the following beneficial effects: a transition special effect can be inserted between every two adjacent target video clips in the p sorted target video clips, and video synthesis processing is then performed on the p sorted target video clips and the transition special effects to obtain the target video. Inserting a transition special effect between every two adjacent target video clips further improves the effect of clipping multiple target video clips into the target video.
In an embodiment, as shown in fig. 9 in conjunction with fig. 2, the method in S201 may specifically include S701-S702.
S701, acquiring a reference video from a template library according to the selection operation of the user object.
The template library comprises a plurality of reference videos, where the plurality of reference videos are videos formed from a plurality of video clips, and the plurality of reference videos include videos added to the template library by user objects.
Optionally, the user object may select a composite video of good quality from the network in advance and add it to the template library, so that it can later be used as a reference video when the user object clips video clips. The template library may be a template library corresponding to the user object and visible only to that user object.
Alternatively, the template library may be a template library in a network, in which case all user objects may acquire a composite video from the template library as a reference video.
S702, acquiring m target video clips from the electronic equipment according to input operation of a user object.
The m target video clips are video clips shot by the electronic equipment.
Further, after determining the reference video, the user object may select a plurality of video clips as m target video clips to be synthesized, so that the electronic device may clip the m target video clips according to the reference video to obtain the target video.
The technical scheme provided by this embodiment brings at least the following beneficial effects: the user object can add reference videos formed from multiple video clips to the template library in advance, so that when m target video clips need to be clipped, the electronic device is triggered to obtain a reference video directly from the template library; after the m target video clips shot by the electronic device are obtained according to the input operation of the user object, the m target video clips are clipped. Because the template library is established in advance, a reference video can be obtained from it directly whenever m target video clips need to be clipped, which improves the efficiency with which the electronic device clips video clips into a composite video.
As can be seen from the above examples, the present disclosure can be used in a video editing application to obtain a composite video by clipping a plurality of videos to be synthesized based on a reference video. The user object uploads video material (video clips) it has shot and at the same time selects a reference video to imitate, triggering the electronic device to select suitable video clips from the uploaded material and to clip and synthesize them according to the arrangement order of the corresponding video clips in the reference video, thereby obtaining a composite video of higher quality.
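Tying the sketches together, the overall flow described so far can be summarized as below; `segment_feature_vector`, `similarity` and `greedy_match` are the earlier illustrative helpers, the function name `plan_edit` is an assumption, and frame category labels are assumed to come from the upstream clustering step:

```python
import numpy as np  # ties together the earlier illustrative helpers:
                    # segment_feature_vector, similarity, greedy_match

def plan_edit(target_clips_cats, ref_clips_cats, num_categories):
    """Sketch of the disclosure's flow: build feature vectors per clip,
    score all target/reference pairs, match them greedily one-to-one, then
    order the matched target clips by their reference clips' positions in
    the reference video."""
    t_vecs = [segment_feature_vector(c, num_categories) for c in target_clips_cats]
    r_vecs = [segment_feature_vector(c, num_categories) for c in ref_clips_cats]
    sim = np.array([[similarity(t, r) for r in r_vecs] for t in t_vecs])
    pairs = greedy_match(sim)       # (target index, reference index) pairs
    pairs.sort(key=lambda p: p[1])  # arrange by order within the reference video
    return [t for t, _ in pairs]    # target clip indices in splicing order
```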
It will be appreciated that the above method may be implemented by a video editing apparatus. The video editing device comprises a hardware structure and/or a software module corresponding to each function for realizing the functions. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present disclosure.
The embodiments of the present disclosure may divide the functional modules of the video editing apparatus and the like according to the above method examples, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present disclosure, the division of the modules is merely a logic function division, and other division manners may be implemented in actual practice.
Fig. 10 is a schematic diagram illustrating a structure of a video clip apparatus according to an exemplary embodiment. Referring to fig. 10, the video clip apparatus 100 may include: an acquisition unit 1001, a determination unit 1002, and a processing unit 1003.
An obtaining unit 1001 configured to perform obtaining a reference video and m target video clips, and extract n reference video clips from the reference video; m and n are positive integers; for example, the acquisition unit 1001 may be used to perform the steps in step 201 in fig. 2.
A determining unit 1002 configured to perform determining a corresponding similarity between each of the m target video clips and each of the n reference video clips, respectively; for example, the determining unit 1002 may be configured to perform the steps of step 202 in fig. 2.
A determining unit 1002 further configured to perform determining p target video clips and p reference video clips having a one-to-one correspondence from the m target video clips and the n reference video clips according to the similarity; one of the p target video clips corresponds to one of the p reference video clips, and p is the minimum value of m and n; for example, the determining unit 1002 may be adapted to perform the steps of step 203 in fig. 2.
A determining unit 1002 further configured to perform determining an arrangement order of p target video clips corresponding to the p reference video clips according to an arrangement order of the p reference video clips in the reference video; for example, the determining unit 1002 may be configured to perform the steps of step 204 in fig. 2.
The processing unit 1003 is configured to perform clipping processing on the p target video clips based on the arrangement order of the p target video clips, resulting in a target video. For example, the processing unit 1003 may be configured to perform the steps of step 204 in fig. 2.
Optionally, the processing unit 1003 is configured to perform adding n marks in the reference video according to the video content included in the n reference video clips; each mark is used for marking the starting position of a reference video segment in the reference video; for example, the processing unit 1003 may be used to perform the steps of step 301 in fig. 3.
The obtaining unit 1001 is configured to perform extraction of n reference video clips from the reference video according to n marks. For example, the acquisition unit 1001 may be used to perform the steps in step 2011 in fig. 3.
Optionally, the obtaining unit 1001 is configured to perform, for each of the m target video segments and the n reference video segments, extracting a multi-frame target image from each video segment, so as to obtain an image set corresponding to each video segment; for example, the acquisition unit 1001 may be used to perform the steps in step 401 in fig. 4.
A processing unit 1003 configured to perform clustering processing on all target images included in a plurality of image sets corresponding to the m target video clips and the n reference video clips, to obtain a plurality of image categories; for example, the processing unit 1003 may be used to perform the steps in step 402 in fig. 4.
A determining unit 1002 configured to perform determining a ratio of the number of images corresponding to each image category to the total number of images corresponding to the image sets, which are included in each image set; for example, the determining unit 1002 may be configured to perform the steps in step 403 in fig. 4.
The determining unit 1002 is configured to determine a feature vector corresponding to each image set according to a plurality of ratios corresponding to a plurality of image categories included in each image set, and determine a corresponding similarity between each of the m target video clips and each of the n reference video clips according to the feature vector corresponding to each image set. For example, the determining unit 1002 may be configured to perform the steps of step 404 in fig. 4.
Optionally, the determining unit 1002 is configured to determine a first highest similarity from the similarities, and determine that a correspondence exists between a first target video segment corresponding to the first highest similarity and a first reference video segment; the first target video segments and the first reference video segments are video segments in m target video segments and n reference video segments respectively; for example, the determining unit 1002 may be adapted to perform the steps of step 501 in fig. 5.
A processing unit 1003 configured to delete all the similarities corresponding to the first target video segment and the first reference video segment from the similarities, to obtain a first remaining similarity; for example, the processing unit 1003 may be used to perform the steps in step 502 in fig. 5.
A processing unit 1003 configured to determine a second highest similarity from the first remaining similarities, and determine that a correspondence exists between a second target video segment corresponding to the second highest similarity and a second reference video segment; deleting all the similarities corresponding to the second target video segment and the second reference video segment from the first residual similarity to obtain a second residual similarity; and circularly executing the step until p target video clips and p reference video clips with one-to-one correspondence are determined. For example, the processing unit 1003 may be used to perform the steps of step 503 in fig. 5.
Optionally, the highest similarity includes a first sub-similarity and a second sub-similarity, where the first sub-similarity is equal to the second sub-similarity, the first sub-similarity is a similarity corresponding to the first target video segment and the first reference video segment, and the second sub-similarity is a similarity corresponding to the first target video segment and the third reference video segment; the determining unit 1002 is configured to determine that a correspondence exists between the first target video segment corresponding to the first sub-similarity and the first reference video segment, if it is determined that the second high similarity corresponding to the first reference video segment is smaller than the second high similarity corresponding to the third reference video segment. For example, the determination unit 1002 may be configured to perform the steps in step 5011 in fig. 6.
Optionally, the processing unit 1003 is configured to, when the duration of any one of the p target video clips is greater than the target duration of the reference video clip corresponding to that target video clip, perform clipping processing on that target video clip, select from it the partial clip with the highest similarity to the corresponding reference video clip, and adjust the duration of that target video clip to the target duration; for example, the processing unit 1003 may be configured to perform the steps in step 2041 in fig. 7.
And a processing unit 1003 configured to execute clipping processing on the p target video clips with the adjusted video duration based on the arrangement order of the p target video clips, to obtain a target video. For example, the processing unit 1003 may be configured to perform the steps of step 2042 in fig. 7.
Optionally, the processing unit 1003 is configured to perform sorting of the p target video clips with the adjusted video duration based on the arrangement sequence of the p target video clips, and insert a transition special effect between every two adjacent target video clips in the p sorted target video clips; for example, the processing unit 1003 may be used to perform the steps in step 601 in fig. 8.
The processing unit 1003 is configured to perform video synthesis processing on the p sorted target video clips and the transition special effects, so as to obtain a target video. For example, the processing unit 1003 may be used to perform the steps of step 602 in fig. 8.
Optionally, the obtaining unit 1001 is configured to obtain the reference video from the template library according to the selection operation of the user object; the template library comprises a plurality of reference videos, the plurality of reference videos are videos formed from a plurality of video clips, and the plurality of reference videos include videos added to the template library by user objects; for example, the acquisition unit 1001 may be used to perform the steps in step 701 in fig. 9.
An acquisition unit 1001 configured to perform an input operation according to a user object, acquiring m target video clips from an electronic device; the m target video clips are video clips shot by the electronic equipment. For example, the acquisition unit 1001 may be used to perform the steps in step 702 in fig. 9.
As above, the embodiments of the present disclosure may divide functional modules of an electronic device according to the above-described method examples. The integrated modules may be implemented in hardware or in software functional modules. In addition, it should be further noted that the division of the modules in the embodiments of the present disclosure is merely a logic function division, and other division manners may be implemented in practice. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated in one processing module.
With respect to the video clip apparatus in the above-described embodiment, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment regarding the method, and will not be described in detail herein.
Fig. 11 is a schematic structural view of a video editing apparatus 60 provided in the present disclosure. As shown in fig. 11, the video clip apparatus 60 may include at least one processor 601 and a memory 603 for storing instructions executable by the processor 601. Wherein the processor 601 is configured to execute instructions in the memory 603 to implement the video clip method in the above-described embodiments.
In addition, video clip device 60 may also include a communication bus 602 and at least one communication interface 604.
Processor 601 may be a GPU, a micro-processing unit, an ASIC, or one or more integrated circuits for controlling program execution in the presently disclosed aspects. The communication bus 602 may include a pathway to transfer information between the aforementioned components.
The communication interface 604 uses any transceiver-like device for communicating with other devices or communication networks, such as ethernet, radio access network (radio access network, RAN), wireless local area network (wireless local area networks, WLAN), etc.
The memory 603 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be stand-alone and connected to the processing unit by a bus. The memory may also be integrated with the processing unit as a volatile storage medium in the GPU.
The memory 603 is used for storing instructions for executing the disclosed aspects, and is controlled by the processor 601 for execution. The processor 601 is operative to execute instructions stored in the memory 603 to implement the functions in the methods of the present disclosure.
In a particular implementation, as one embodiment, processor 601 may include one or more GPUs, such as GPU0 and GPU1 in fig. 11.
In a specific implementation, as an embodiment, video clip device 60 may include multiple processors, such as processor 601 and processor 607 in FIG. 11. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In a specific implementation, as an embodiment, the video clip apparatus 60 may further include an output device 605 and an input device 606. The output device 605 communicates with the processor 601 and may display information in a variety of ways. For example, the output device 605 may be a liquid crystal display (liquid crystal display, LCD), a light emitting diode (light emitting diode, LED) display device, a Cathode Ray Tube (CRT) display device, or a projector (projector), or the like. The input device 606 is in communication with the processor 601 and may accept user input in a variety of ways. For example, the input device 606 may be a mouse, a keyboard, a touch screen device, a sensing device, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is not limiting of video clip device 60 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
The present disclosure also provides a computer-readable storage medium having instructions stored thereon that, when executed by a processor of an electronic device, enable the electronic device to perform the video editing method provided by the embodiments of the present disclosure described above.
The disclosed embodiments also provide a computer program product comprising instructions that, when executed by a processor of an electronic device, implement the video editing method provided by the disclosed embodiments.
The disclosed embodiment also provides a communication system, as shown in fig. 1, which includes a server 11 and a client 12. The server 11 and the client 12 are respectively configured to execute the corresponding steps in the foregoing embodiments of the present disclosure, so that the communication system solves the technical problems solved by the embodiments of the present disclosure, and achieves the technical effects achieved by the embodiments of the present disclosure, which are not described herein again.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. A method of video editing, the method comprising:
acquiring a reference video and m target video clips, and extracting n reference video clips from the reference video; m and n are positive integers;
extracting multi-frame target images from each video segment aiming at each video segment in the m target video segments and the n reference video segments to obtain an image set corresponding to each video segment;
clustering all target images included in a plurality of image sets corresponding to the m target video clips and the n reference video clips to obtain a plurality of image categories;
determining the ratio of the number of images corresponding to each image category to the total number of images corresponding to each image set, wherein the ratio is included in each image set;
determining a feature vector corresponding to each image set according to a plurality of ratios corresponding to the plurality of image categories included in each image set, and determining a similarity corresponding to each target video segment of the m target video segments and each reference video segment of the n reference video segments according to the feature vector corresponding to each image set;
According to the similarity, determining p target video clips and p reference video clips with one-to-one correspondence from the m target video clips and the n reference video clips; one target video segment of the p target video segments corresponds to one reference video segment of the p reference video segments, and p is the minimum value of m and n;
according to the arrangement sequence of the p reference video clips in the reference video, determining the arrangement sequence of the p target video clips corresponding to the p reference video clips, and clipping the p target video clips based on the arrangement sequence of the p target video clips to obtain a target video.
2. The method of claim 1, wherein prior to extracting n reference video segments from the reference video, the method further comprises:
adding n marks in the reference video according to video contents included in the n reference video fragments; each of the marks is used for marking the starting position of a reference video segment in the reference video;
the extracting n reference video clips from the reference video includes:
And extracting the n reference video fragments from the reference video according to the n marks.
3. The method according to claim 1, wherein the determining p target video segments and p reference video segments having a one-to-one correspondence from the m target video segments and the n reference video segments according to the similarity includes:
determining a first highest similarity from the similarities, and determining that a corresponding relationship exists between a first target video segment corresponding to the first highest similarity and a first reference video segment; the first target video segment and the first reference video segment are video segments in the m target video segments and the n reference video segments respectively;
deleting all the similarities corresponding to the first target video segment and the first reference video segment from the similarities to obtain a first residual similarity;
determining a second highest similarity from the first residual similarity, and determining that a corresponding relationship exists between a second target video segment corresponding to the second highest similarity and a second reference video segment; deleting all the similarities corresponding to the second target video segment and the second reference video segment from the first residual similarity to obtain a second residual similarity; and circularly executing the step until p target video clips and p reference video clips with one-to-one correspondence are determined.
4. The method of claim 3, wherein the highest degree of similarity comprises a first sub-degree of similarity and a second sub-degree of similarity, the first sub-degree of similarity being the degree of similarity corresponding to the first target video segment and the first reference video segment, the second sub-degree of similarity being the degree of similarity corresponding to the first target video segment and the third reference video segment, the first sub-degree of similarity being equal to the second sub-degree of similarity;
the determining that a corresponding relationship exists between the first target video segment corresponding to the first highest similarity and the first reference video segment includes:
and under the condition that the second high similarity corresponding to the first reference video segment is smaller than the second high similarity corresponding to the third reference video segment, determining that a corresponding relationship exists between the first target video segment corresponding to the first sub-similarity and the first reference video segment.
5. The method according to claim 1, wherein the clipping the p target video clips based on the arrangement order of the p target video clips to obtain a target video includes:
when the duration of any one of the p target video clips is longer than the target duration of the reference video clip corresponding to the any one target video clip, clipping the any one target video clip, selecting from the any one target video clip a partial clip with the highest similarity to the corresponding reference video clip, and adjusting the duration of the any one target video clip to the target duration;
And clipping the p target video clips with the video duration adjusted based on the arrangement sequence of the p target video clips to obtain the target video.
6. The method according to claim 5, wherein the clipping the p target video clips with the adjusted video duration based on the arrangement order of the p target video clips to obtain the target video includes:
sorting the p target video clips with the video duration adjusted based on the arrangement sequence of the p target video clips, and inserting a transition special effect between every two adjacent target video clips in the sorted p target video clips;
and carrying out video synthesis processing on the p sequenced target video clips and the transition special effect to obtain the target video.
7. The method of claim 1, wherein the obtaining the reference video and the m target video segments, and extracting n reference video segments from the reference video, comprises:
acquiring the reference video from a template library according to the selection operation of the user object; the template library comprises a plurality of reference videos, the plurality of reference videos are videos formed from a plurality of video clips, and the plurality of reference videos comprise videos added into the template library by user objects;
Acquiring the m target video clips from the electronic equipment according to the input operation of the user object; the m target video clips are video clips shot by the electronic equipment.
8. A video editing apparatus, comprising:
an acquisition unit configured to perform acquisition of a reference video and m target video clips, and extract n reference video clips from the reference video; m and n are positive integers;
the acquisition unit is configured to execute the steps of extracting multi-frame target images from each video segment aiming at each video segment in the m target video segments and the n reference video segments, and obtaining an image set corresponding to each video segment;
the processing unit is configured to perform clustering processing on all target images included in a plurality of image sets corresponding to the m target video clips and the n reference video clips, so as to obtain a plurality of image categories;
the determining unit is configured to determine a ratio of the number of images corresponding to each image category to the total number of images corresponding to the image set, wherein the ratio is included in each image set;
the determining unit is configured to determine a feature vector corresponding to each image set according to a plurality of ratios corresponding to the plurality of image categories included in each image set, and determine a similarity corresponding to each of the m target video clips and each of the n reference video clips according to the feature vector corresponding to each image set;
The determining unit is further configured to determine p target video clips and p reference video clips with a one-to-one correspondence from the m target video clips and the n reference video clips according to the similarity; one target video segment of the p target video segments corresponds to one reference video segment of the p reference video segments, and p is the minimum value of m and n;
the determining unit is further configured to determine an arrangement order of the p target video clips corresponding to the p reference video clips according to an arrangement order of the p reference video clips in the reference video;
and the processing unit is configured to execute clipping processing on the p target video clips based on the arrangement sequence of the p target video clips to obtain target videos.
9. The video editing device according to claim 8, wherein the processing unit is configured to execute adding n marks in the reference video according to video content included in the n reference video clips; each of the marks is used for marking the starting position of a reference video segment in the reference video;
The acquisition unit is configured to perform extraction of the n reference video segments from the reference video according to the n marks.
10. The video clip apparatus according to claim 8, wherein the determining unit is configured to perform determining a first highest similarity from the similarities, and determine that there is a correspondence between a first target video segment and a first reference video segment to which the first highest similarity corresponds; the first target video segment and the first reference video segment are video segments in the m target video segments and the n reference video segments respectively;
the processing unit is configured to delete all the similarities corresponding to the first target video segment and the first reference video segment from the similarities to obtain a first residual similarity;
the processing unit is configured to determine a second highest similarity from the first remaining similarities, and determine that a corresponding relationship exists between a second target video segment corresponding to the second highest similarity and a second reference video segment; deleting all the similarities corresponding to the second target video segment and the second reference video segment from the first residual similarity to obtain a second residual similarity; and circularly executing the step until p target video clips and p reference video clips with one-to-one correspondence are determined.
11. The video editing device according to claim 10, wherein the highest similarity includes a first sub-similarity and a second sub-similarity, the first sub-similarity and the second sub-similarity being equal, the first sub-similarity being a similarity corresponding to the first target video segment and the first reference video segment, the second sub-similarity being a similarity corresponding to the first target video segment and the third reference video segment;
the determining unit is configured to determine that a correspondence exists between the first target video segment corresponding to the first sub-similarity and the first reference video segment, when it is determined that the second high similarity corresponding to the first reference video segment is smaller than the second high similarity corresponding to the third reference video segment.
12. The video clipping device according to claim 8, wherein the processing unit is configured to execute clipping processing on any one of the p target video clips when a duration of the target video clip is longer than a target duration of a reference video clip corresponding to the any one target video clip, select a portion clip with highest similarity to the corresponding reference video clip from the any one target video clip, and adjust the duration of the any one target video clip to be the target duration;
The processing unit is configured to execute clipping processing on the p target video clips with the video duration adjusted based on the arrangement sequence of the p target video clips, so as to obtain the target video.
13. The video editing device according to claim 12, wherein the processing unit is configured to perform sorting of the p target video clips with the video duration adjusted based on the arrangement order of the p target video clips, and insert a transition special effect between each adjacent two of the p target video clips after sorting;
the processing unit is configured to perform video synthesis processing on the p sequenced target video clips and the transition special effect to obtain the target video.
14. The video editing apparatus according to claim 8, wherein the acquisition unit is configured to perform acquisition of the reference video from a template library according to a selection operation of a user object; the template library comprises a plurality of reference videos, the plurality of reference videos are videos formed from a plurality of video clips, and the plurality of reference videos comprise videos added into the template library by user objects;
The acquisition unit is configured to perform input operation according to a user object, and acquire the m target video clips from the electronic equipment; the m target video clips are video clips shot by the electronic equipment.
15. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video clip method of any of claims 1-7.
16. A computer readable storage medium having instructions stored thereon, which when executed by a processor of an electronic device, cause the electronic device to perform the video editing method of any of claims 1-7.
CN202210269564.XA 2022-03-18 2022-03-18 Video editing method and device, electronic equipment and storage medium Active CN114666657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210269564.XA CN114666657B (en) 2022-03-18 2022-03-18 Video editing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114666657A CN114666657A (en) 2022-06-24
CN114666657B true CN114666657B (en) 2024-03-19

Family

ID=82029926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210269564.XA Active CN114666657B (en) 2022-03-18 2022-03-18 Video editing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114666657B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101826463B1 (en) * 2017-01-05 2018-02-07 건국대학교 산학협력단 Method and apparatus for synchronizing time line of videos
CN109688463A (en) * 2018-12-27 2019-04-26 北京字节跳动网络技术有限公司 A kind of editing video generation method, device, terminal device and storage medium
CN110598014A (en) * 2019-09-27 2019-12-20 腾讯科技(深圳)有限公司 Multimedia data processing method, device and storage medium
CN111147955A (en) * 2019-12-31 2020-05-12 咪咕视讯科技有限公司 Video playing method, server and computer readable storage medium
CN111263234A (en) * 2020-01-19 2020-06-09 腾讯科技(深圳)有限公司 Video clipping method, related device, equipment and storage medium
CN112291484A (en) * 2019-07-23 2021-01-29 腾讯科技(深圳)有限公司 Video synthesis method and device, electronic equipment and storage medium
CN112565825A (en) * 2020-12-02 2021-03-26 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and medium
CN113115055A (en) * 2021-02-24 2021-07-13 华数传媒网络有限公司 User portrait and live video file editing method based on viewing behavior
CN113163272A (en) * 2020-01-07 2021-07-23 海信集团有限公司 Video editing method, computer device and storage medium
CN113473182A (en) * 2021-09-06 2021-10-01 腾讯科技(深圳)有限公司 Video generation method and device, computer equipment and storage medium
CN113965806A (en) * 2021-10-28 2022-01-21 腾讯科技(深圳)有限公司 Video recommendation method and device and computer-readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170220869A1 (en) * 2016-02-02 2017-08-03 Verizon Patent And Licensing Inc. Automatic supercut creation and arrangement

Also Published As

Publication number Publication date
CN114666657A (en) 2022-06-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant