CN112367551B - Video editing method and device, electronic equipment and readable storage medium


Info

Publication number
CN112367551B
Authority
CN
China
Prior art keywords: video, target, track, tracks, image
Legal status: Active
Application number
CN202011198204.2A
Other languages
Chinese (zh)
Other versions
CN112367551A (en)
Inventor
芮元乐
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202011198204.2A
Publication of CN112367551A
Application granted
Publication of CN112367551B
Status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N 21/431: Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/485: End-user interface for client configuration
    • H04N 21/8106: Monomedia components involving special audio data, e.g. different tracks for different languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application discloses a video editing method and device, electronic equipment and a readable storage medium, and belongs to the field of video processing. The video editing method comprises the following steps: displaying N video editing tracks of a target video, the N video editing tracks comprising M video frame tracks, where the same video frame track is associated with object images of the same object in different video frames of the target video; receiving a first input of a user to a target track of the N video editing tracks; and processing the object image associated with the target track in response to the first input to generate a target file; wherein N and M are positive integers, and M is less than or equal to N. During video editing, the method and the device perform editing on individual objects in the video, so that the granularity of video editing is finer and more flexible.

Description

Video editing method and device, electronic equipment and readable storage medium
Technical Field
The application belongs to the technical field of video processing, and particularly relates to a video editing method and device, electronic equipment and a readable storage medium.
Background
Video editing techniques generally refer to techniques for performing editing operations on video files, such as cutting, splicing, adding text, adding pictures and adding sound effects. Currently, video editing techniques are widely used in many aspects of daily life. For example, humorous video clips are produced with video editing techniques and released on various social websites for people's daily entertainment.
Currently, a video produced by video editing techniques is usually a clip of an original video file, that is, a video clip obtained by cutting continuous video frames out of the original video file.
However, each video frame in a video file may contain multiple objects, and with the prior art an editing operation cannot be performed on an individual one of those objects, so the granularity of video editing is neither fine nor flexible enough.
Disclosure of Invention
An object of the embodiments of the present application is to provide a video editing method and apparatus, an electronic device, and a readable storage medium, which can solve the problem in the prior art that the granularity of video editing is not fine and flexible enough.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, an embodiment of the present application provides a method for video editing, where the method includes:
displaying N video editing tracks of the target video; the N video editing tracks comprise M video frame tracks, and the same video frame track is associated with object images of the same object in different video frames in the target video;
receiving a first input of a user to a target track of the N video editing tracks;
processing the object image associated with the target track in response to the first input to generate a target file;
wherein N and M are positive integers, and M is less than or equal to N.
Optionally, displaying M video frame tracks of the N video editing tracks of the target video includes:
respectively carrying out image segmentation on each video frame in the target video to obtain an object image corresponding to each object in the target video;
writing all object images corresponding to the same object into the same preset blank track according to the frame sequence numbers of the object images to obtain M video frame tracks; the frame sequence number of an object image is the position information of the video frame to which the object image belongs in the target video;
and displaying the M video frame tracks.
Optionally, the image segmentation is performed on each video frame in the target video to obtain an object image corresponding to each object in the target video, which includes:
dividing each video frame in the target video into at least one object image according to image content, and correspondingly recording the frame sequence number of each object image;
adding a label to each object image according to the image content of each object image;
and determining the object images with the same label as the object images corresponding to the same object in the target video.
Optionally, writing all object images corresponding to the same object into the same preset blank track according to the frame sequence number of the object image to obtain M video frame tracks, including:
displaying a target interface comprising T object controls; wherein each object control indicates an object in the target video;
receiving second input of a user to M object controls in the T object controls;
responding to the second input, respectively writing all object images corresponding to the M objects indicated by the M object controls into M preset blank tracks according to the frame numbers of the object images to obtain M video frame tracks;
wherein T is a positive integer, and T is greater than or equal to M.
Optionally, in a case where M is greater than 1, the displaying the M video frame tracks includes:
and sequentially displaying the M video frame tracks in the target interface in descending order of the number of object images in the M video frame tracks.
Optionally, writing all object images corresponding to the same object into the same preset blank track according to the frame sequence number of the object image to obtain M video frame tracks, including:
Scoring each object image based on aesthetic features of the object image to obtain a score of each object image;
calculating the average value of the scores of all object images corresponding to each object respectively to obtain the score of each object;
and writing all object images corresponding to the same target object into the same preset blank track according to the frame sequence numbers of the object images to obtain M video frame tracks, wherein the target object is an object with the score larger than a preset threshold value.
Optionally, the receiving a first input from the user to a target track of the N video editing tracks includes:
receiving a first input of a user on K object images on the target track; wherein K is an integer greater than 1;
the processing the object image associated with the target track to generate a target file includes:
and performing image synthesis on all object images indicated by the first input to generate a static image, a dynamic image or a video.
Optionally, the N video editing tracks further include L voice tracks; wherein L is a positive integer, L < N;
wherein the same voice track is associated with voice data of the same sound source object in the target video.
Optionally, in the case that the target track includes one video frame track and one voice track, the receiving a first input of a user to a target track of the N video editing tracks includes:
receiving a first input of a user to an object image on a video frame track in the target track and a voice track in the target track;
the processing the object image associated with the target track to generate a target file includes:
converting, from the voice data corresponding to the voice track indicated by the first input, the voice information located in the same time window as the object image indicated by the first input into text information;
and synthesizing the text information onto the object image indicated by the first input to generate a static image or a dynamic image.
In a second aspect, an embodiment of the present application provides an apparatus for video editing, including:
the display module is used for displaying N video editing tracks of the target video; the N video editing tracks comprise M video frame tracks, and the same video frame track is associated with object images of the same object in different video frames in the target video;
The first receiving module is used for receiving a first input of a user to a target track in the N video editing tracks;
the first response module is used for responding to the first input, processing the object image associated with the target track and generating a target file;
wherein N, M is a positive integer, M is less than or equal to N.
Optionally, the display module includes:
the image segmentation unit is used for respectively carrying out image segmentation on each video frame in the target video to obtain an object image corresponding to each object in the target video;
the video frame track unit is used for writing all object images corresponding to the same object into the same preset blank track according to the frame sequence numbers of the object images to obtain M video frame tracks; the frame sequence number of an object image is the position information of the video frame to which the object image belongs in the target video;
and the display unit is used for displaying the M video frame tracks.
Optionally, the image segmentation unit is specifically configured to segment each video frame in the target video into at least one object image according to image content, and record a frame sequence number of each object image correspondingly; adding a label to each object image according to the image content of each object image; and determining the object images with the same label as the object images corresponding to the same object in the target video.
Optionally, the video frame track unit is specifically configured to display a target interface including T object controls; wherein each object control indicates an object in the target video; receiving second input of a user to M object controls in the T object controls; responding to the second input, respectively writing all object images corresponding to the M objects indicated by the M object controls into M preset blank tracks according to the frame sequence numbers of the object images to obtain M video frame tracks; wherein T is a positive integer, and T is greater than or equal to M.
Optionally, in the case where M is greater than 1, the display unit is specifically configured to sequentially display the M video frame tracks in the target interface in descending order of the number of object images in the M video frame tracks.
Optionally, the video frame track unit is specifically configured to score each object image based on aesthetic features of the object image, so as to obtain a score of each object image; calculating the average value of the scores of all object images corresponding to each object respectively to obtain the score of each object; and writing all object images corresponding to the same target object into the same preset blank track according to the frame sequence numbers of the object images to obtain M video frame tracks, wherein the target object is an object with the score larger than a preset threshold value.
Optionally, the first receiving module is specifically configured to receive a first input of a user to K object images on the target track; wherein K is an integer greater than 1;
the first response module is specifically configured to perform image synthesis on all object images indicated by the first input, and generate a still image, a moving image or a video.
Optionally, the N video editing tracks further include L voice tracks; wherein L is a positive integer, L < N;
wherein the same voice track is associated with voice data of the same sound source object in the target video.
Optionally, in the case that the target track includes one video frame track and one voice track, the first receiving module is specifically configured to receive a first input of a user to an object image on the video frame track in the target track and to the voice track in the target track;
the first response module is specifically configured to convert, from the voice data corresponding to the voice track indicated by the first input, the voice information located in the same time window as the object image indicated by the first input into text information; and synthesize the text information onto the object image indicated by the first input to generate a static image or a dynamic image.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, the program or instruction implementing the steps of the method according to the first aspect when executed by the processor.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In the embodiment of the application, the object images corresponding to different objects in the target video are displayed through different video editing tracks, so that a user can conveniently operate the object images of each object. By inputting the target track in the video editing track, the object image associated with the target track can be processed to obtain the target file, so that the video editing operation of the target video is completed. The video editing process is simplified; in addition, each object in the video is edited in a refined manner in the video editing process, so that granularity of video editing is finer and more flexible.
Drawings
FIG. 1 is a flow chart of steps of a method of video editing provided by an embodiment of the present application;
FIG. 2 is a schematic illustration of a video frame track provided in an embodiment of the present application;
FIG. 3 is one of the target interface presentation schematics provided in the embodiments of the present application;
fig. 4 is a schematic diagram of a deletion object image provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a file format of a target file according to an embodiment of the present disclosure;
FIG. 6 is a schematic illustration of a video editing track provided by an embodiment of the present application;
FIG. 7 is a second exemplary illustration of a target interface provided in an embodiment of the present application;
FIG. 8 is a block diagram of an apparatus for video editing provided by an embodiment of the present application;
fig. 9 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
fig. 10 is a second schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms first, second and the like in the description and in the claims are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein; moreover, the objects identified by "first," "second," etc. generally denote a type of object and do not limit the number of objects; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
The method for editing video provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of steps of a method for video editing according to an embodiment of the present application is provided, where the method for video editing includes:
step 101: n video editing tracks of the target video are displayed.
In this step, the N video editing tracks include M video frame tracks, where the same video frame track is associated with object images of the same object in different video frames, N and M are positive integers, and M is less than or equal to N. Here, a video editing track may be a track-style control. Taking a video frame track as an example, a video editing track can be understood as a track formed by arranging and displaying the object images corresponding to the same object along a straight line. A video editing track can present object images and can also receive user input.
An object in a target video may be understood as a person, animal or article in the playing picture when the target video is played, where the same person, animal or article in different playing pictures belongs to the same object. Here, the displayed video editing tracks may include video frame tracks corresponding to all or some of the objects; that is, M is less than or equal to the number of objects in the target video, but this is not limiting. The video frame track corresponding to each object includes all of that object's images in different video frames.
Of course, the target video may include other types of data besides video frames, such as, but not limited to, voice data. Thus, when generating video editing tracks, video frame tracks may be generated only for the video frames; tracks containing other data content may also be generated from the other types of data at the same time as the video frame tracks. The target video may be one or more video files; that is, video editing may be performed on at least one video file simultaneously.
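To make this structure concrete, the following is a minimal sketch, not taken from the patent, of how the N editing tracks and their associated object images might be represented in code; all names (ObjectImage, EditTrack) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObjectImage:
    object_id: int     # which object in the target video this image belongs to
    frame_index: int   # frame sequence number: position of the source frame in the target video
    pixels: bytes      # segmented image content of the object in that frame

@dataclass
class EditTrack:
    kind: str          # "video_frame" or "voice"
    object_id: int     # the object (or sound source object) this track is associated with
    images: List[ObjectImage] = field(default_factory=list)  # arranged by frame_index

# N video editing tracks of the target video; M of them are video frame tracks,
# one per object, each holding that object's images from different video frames.
tracks: List[EditTrack] = []
```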
Preferably, the display of the video editing track may be triggered by a user. For example, after a user selects a target video, displaying a multi-track editing control; after receiving user input to the multi-track editing control, step 101 is performed. Here, the input to the multi-track editing control may be a click input, a slide input, a long press input, or the like.
Step 102: a first input by a user to a target track of the N video editing tracks is received.
In this step, whichever of the video editing tracks the user's first input acts on is the target track. That is, the target track includes some or all of the N video editing tracks. Preferably, the target track comprises at least one video frame track. The first input may be a single input such as a click input, a slide input or a long press input, or a plurality of continuous inputs such as a plurality of continuous click inputs.
Step 103: in response to the first input, processing the object image associated with the target track to generate a target file.
In this step, the generated target file includes the object images associated with the target track. For each target track, the target file may include some or all of the data content associated with that track.
In the embodiment of the application, the object images corresponding to different objects in the target video are displayed through different video editing tracks, so that a user can conveniently operate the object images of each object. By inputting the target track in the video editing track, the object image associated with the target track can be processed to obtain the target file, so that the video editing operation of the target video is completed. The video editing process is simplified; in addition, each object in the video is edited in a refined manner in the video editing process, so that granularity of video editing is finer and more flexible.
Optionally, displaying M video frame tracks of the N video editing tracks of the target video includes:
and respectively carrying out image segmentation on each video frame in the target video to obtain an object image corresponding to each object in the target video.
In this step, the image segmentation includes panoramic segmentation, instance segmentation, or semantic segmentation. Preferably, panoramic segmentation is used to segment the video frames in the target video. Preferably, when a video frame is segmented into object images, the frame position, namely the frame sequence number, of the video frame to which each object image belongs in the target video can be recorded. Specifically, performing image segmentation on each video frame in the target video to obtain an object image corresponding to each object in the target video includes: dividing each video frame in the target video into at least one object image according to the image content, and correspondingly recording the frame sequence number of each object image; adding a label to each object image according to its image content; and determining the object images with the same label as the object images corresponding to the same object in the target video. That is, panoramic segmentation is performed on each video frame in the target video to obtain a large number of object images, and the frame sequence number of each object image is recorded; the object images are then clustered and labeled to determine which object images correspond to the same object. A label here identifies an object recognizable in the video, such as a person or an animal.
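As a rough illustration of this step, the sketch below segments each frame and groups the resulting object images by label; segment_frame is a stand-in for a real panoramic segmentation model and is an assumption, not part of the patent.

```python
from collections import defaultdict

def segment_frame(frame):
    """Stand-in for a panoramic segmentation model: returns (label, object_image)
    pairs, one per object found in the frame. A real implementation would run a
    segmentation network here."""
    raise NotImplementedError

def collect_object_images(video_frames):
    """Group object images by label; images sharing a label are treated as the
    same object across different video frames."""
    images_by_object = defaultdict(list)
    for frame_index, frame in enumerate(video_frames):
        for label, object_image in segment_frame(frame):
            # Record the frame sequence number alongside each object image.
            images_by_object[label].append((frame_index, object_image))
    return images_by_object
```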
and writing all object images corresponding to the same object into the same preset blank track according to the frame sequence numbers of the object images to obtain M video frame tracks.
In this step, the frame sequence number of an object image is the position information, in the target video, of the video frame to which the object image belongs. The number of objects in the target video may be substantial, and corresponding video frame tracks may be generated for some or all of them. Referring to fig. 2, there are three different characters, i.e., three different objects, in the target video 201, and a corresponding video frame track 202 may be generated for each of them. Preferably, the M video frame tracks share the same time axis, and each object image is arranged on its video frame track according to its frame sequence number.
M video frame tracks are displayed.
In the embodiment of the application, the object images in the video frames of the target video are segmented in an image segmentation mode, so that the image contents corresponding to different objects in the video file are separated. And writing all object images corresponding to the same object into the same preset blank track to generate a video frame track, so that a user can conveniently operate aiming at the image content of the same object.
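Continuing the illustrative sketch above (again an assumption, not the patent's implementation), writing each object's images into a preset blank track by frame sequence number could look like this:

```python
def build_video_frame_tracks(images_by_object):
    """One preset blank track per object; images are placed on it in order of
    frame sequence number, so all M tracks share the target video's time axis."""
    video_frame_tracks = {}
    for label, images in images_by_object.items():
        blank_track = []  # the preset blank track for this object
        for frame_index, object_image in sorted(images, key=lambda pair: pair[0]):
            blank_track.append((frame_index, object_image))
        video_frame_tracks[label] = blank_track
    return video_frame_tracks  # M video frame tracks, keyed by object label
```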
Optionally, writing all object images corresponding to the same object into the same preset blank track according to the frame sequence number of the object image to obtain M video frame tracks, including:
displaying a target interface comprising T object controls; wherein each object control indicates an object in the target video.
In this step, T is a positive integer, and preferably T is equal to the number of objects in the target video. The object control may display an object image corresponding to the object for which it indicates. As shown in fig. 3, in the case that the number of objects in the target video is 3, the T object controls include a first object control 301, a second object control 302, and a third object control 303.
A second input by the user to M of the T object controls is received.
In this step, M is the same as M in step 101, and it will be understood that T is greater than or equal to M. The second input may be a click input, a slide input, a long press input, or the like.
And responding to the second input, respectively writing all object images corresponding to the M objects indicated by the M object controls into M preset blank tracks according to the frame numbers of the object images to obtain M video frame tracks.
In this step, a video frame track is generated only for the object control input by the user. With continued reference to fig. 3, when the user performs the second input on the first object control 301 and the second object control 302, only the first video frame track needs to be generated according to all the object images corresponding to the object indicated by the first object control 301; and generating a second video frame track according to all object images corresponding to the objects indicated by the second object control 302.
In the embodiment of the application, video frame tracks containing object images can be generated only for the objects selected by the user, according to the user's needs. This avoids generating video frame tracks for the object images corresponding to all objects, which would make the number of video frame tracks excessive and impair the user's editing experience.
Optionally, in the case where M is greater than 1, displaying M video frame tracks includes:
and sequentially displaying the M video frame tracks in the target interface according to the sequence from the large number to the small number of the object images in the M video frame tracks.
In the embodiment of the application, the video frame tracks are displayed in order of the number of object images associated with each track, so that the user can conveniently see how often each object appears on screen in the target video, and can conveniently edit the object images corresponding to objects with a high appearance rate.
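For instance, the display order might be computed as follows (an illustrative assumption, reusing video_frame_tracks from the sketch above):

```python
# Tracks with more object images (objects that appear on screen more often) come first.
display_order = sorted(video_frame_tracks.items(),
                       key=lambda item: len(item[1]),
                       reverse=True)
```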
Optionally, writing all object images corresponding to the same object into the same preset blank track according to the frame sequence number of the object image to obtain M video frame tracks, including:
and scoring each object image based on the aesthetic features of the object image to obtain the score of each object image.
In this step, the aesthetic features include the shooting parameters used when the object image was captured and the image parameters of the object image itself. The image parameters here include, but are not limited to, sharpness and composition information. A scoring rule may be preset, and an object image may be scored according to the scoring rule once its aesthetic features are acquired. The scoring rule may be set according to the user's requirements, for example, but not limited to, rules on the display effect and image definition of each element in the object sequence, such as AI (artificial intelligence) aesthetic scoring rules.
And respectively calculating the average value of the scores of all object images corresponding to each object to obtain the score of each object.
And writing all object images corresponding to the same target object into the same preset blank track according to the frame numbers of the object images to obtain M video frame tracks.
In this step, the target object is an object whose score is greater than a preset threshold. Here, since the score average value of all the object images corresponding to the target object is greater than the preset threshold, it is indicated that all the object images corresponding to the target object meet aesthetic requirements or image quality requirements.
In the embodiment of the application, each object image is scored through its aesthetic features, and a score is determined for each object. Corresponding video frame tracks are then generated only for the object images of target objects whose scores exceed the preset threshold. That is, video frame tracks are generated only for object images that meet aesthetic requirements, which avoids generating video frame tracks for the object images corresponding to all objects and thereby producing an excessive number of tracks that would impair the user's editing experience.
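A minimal sketch of this filtering step, assuming a hypothetical score_image aesthetic scorer (for example an AI aesthetic model; the patent does not specify one):

```python
def filter_objects_by_score(images_by_object, score_image, threshold):
    """Keep only target objects whose mean image score exceeds the preset
    threshold; video frame tracks are then built only for these objects."""
    target_objects = {}
    for label, images in images_by_object.items():
        scores = [score_image(image) for _, image in images]  # score each object image
        object_score = sum(scores) / len(scores)              # mean over all the object's images
        if object_score > threshold:
            target_objects[label] = images
    return target_objects
```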
Optionally, receiving a first input from a user to a target track of the N video editing tracks includes: a first input from a user to K object images on a target track is received.
The target track here is a video frame track, and K is an integer greater than 1. The first input may be a click input, a slide input, a long press input, or the like. Before making the first input on the K object images of the target track, the user may make other inputs on the object images displayed on the video frame tracks, thereby editing those object images. For example, multiple object images on the same or different video frame tracks may be selected by a long press input. Any object image on a video frame track can be deleted by a slide input; any two object images can be transposed, or synthesized into one image, by a drag input. Preferably, when the two object images are located on the same video frame track, their positions can be interchanged by the drag input; when the two object images are located on different video frame tracks, they can be combined into one image by the drag input. As shown in fig. 4, three video frame tracks are shown, namely a first video frame track 41, a second video frame track 42 and a third video frame track 43; when the object image at the first frame position of the first video frame track 41 is slid upward, that object image is deleted.
An object image may also be selected by a single click or double click input. In addition, a video frame track may be locked by double clicking an object image on that track. When a video frame track is locked and another, unlocked video frame track is edited, if an element exists at the corresponding position of the locked track, that element is displayed on the unlocked video frame track being edited.
Editing the object images on the video frame tracks through these other inputs can be understood as directly editing the target video. Preferably, the edited target video may be saved directly after the editing is finished.
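The gestures described above might be dispatched as in the sketch below; this is purely illustrative, the gesture names are assumptions, and merge_images is a hypothetical helper.

```python
def merge_images(a, b):
    """Hypothetical helper: composite two object images into one; the patent
    only states that they can be combined into one image."""
    ...

def handle_gesture(gesture, track, index, other=None):
    """Apply an editing gesture to object images, where each track is a list
    of object images."""
    if gesture == "slide":
        del track[index]  # sliding an object image deletes it from its track
    elif gesture == "drag" and other is not None:
        other_track, other_index = other
        if other_track is track:
            # Same track: the two object images exchange positions.
            track[index], track[other_index] = track[other_index], track[index]
        else:
            # Different tracks: the two object images are merged into one image.
            track[index] = merge_images(track[index], other_track[other_index])
```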
Processing the object image associated with the target track to generate a target file, including: and performing image synthesis on all object images indicated by the first input to generate a static image, a dynamic image or a video.
Here, a selection control including a plurality of options may be displayed, where each option corresponds to a file type and a target file of the corresponding type is generated according to the option selected by the user, but this is not limiting. As shown in fig. 5, when the user saves the object images after making the first input, the file format of the target file is offered through a dialog box. If the user has selected object images with a good display effect from the video frame tracks, the selected object images can be synthesized into a static image by clicking "Photo". If the user has selected a plurality of object images in the same video frame track, the selected object images can be synthesized into one dynamic image by clicking "Dynamic photo". Preferably, the dynamic image comprises a photo cover and a video clip, and when the photo cover is clicked, the video clip is played. If the user has selected a plurality of object images in the same video frame track, the selected object images can also be synthesized into a video by clicking "Video".
In the embodiment of the application, the user can edit the editing tracks, which are displayed arrayed object by object, and freely combine a selection of object images. Meanwhile, multiple file formats can be offered to the user, and the object images selected by the user can be synthesized into a static image, a dynamic image or a video, meeting the user's requirements on the file type of the target file while making the editing process more enjoyable.
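A sketch of the branching behind that dialog box, with compose_still and compose_clip as hypothetical composition helpers (the patent does not prescribe any particular library or codec):

```python
def compose_still(images):
    """Hypothetical helper: composite the selected object images into one still image."""
    ...

def compose_clip(images):
    """Hypothetical helper: encode the selected object images as a short clip."""
    ...

def export_selection(selected_images, file_type):
    """Generate a target file of the type the user picked in the save dialog."""
    if file_type == "photo":
        return compose_still(selected_images)          # one static image
    if file_type == "dynamic photo":
        cover = selected_images[0]                     # the photo cover shown first
        return (cover, compose_clip(selected_images))  # the clip plays when the cover is clicked
    if file_type == "video":
        return compose_clip(selected_images)
    raise ValueError(f"unknown file type: {file_type}")
```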
Optionally, the N video editing tracks further include L voice tracks; wherein L is a positive integer, L < N;
wherein the same voice track is associated with voice data of the same sound source object in the target video.
In this step, the target video also includes voice data, and voice tracks are generated from the voice data at the same time as the video frame tracks. For the description of the video frame tracks, refer to the foregoing; it is not repeated here. As for sound source objects: for example, in a dialogue video involving only two persons, the sound source objects are the two persons in the video.
When generating voice tracks, the voice data is first separated from the target video; the voice data is then divided into at least one single-sound-source data stream according to the different sound source objects, and a corresponding voice track is generated for each single-sound-source stream. As shown in fig. 6, three video frame tracks 602 and three voice tracks 603 may be generated from a target video 601. Of course, a target interface including a plurality of object controls may also be displayed, and different video frame tracks or voice tracks may be displayed through inputs on different object controls, where each object control indicates an object in the target video. The object here may be an object in a video frame or a sound source object in the voice data. As shown in fig. 7, the target interface includes a first object control 701, a second object control 702, a third object control 703, a fourth object control 704, a fifth object control 705 and a sixth object control 706; the objects indicated by the first object control 701, the second object control 702 and the third object control 703 are objects in video frames, while the objects indicated by the fourth object control 704, the fifth object control 705 and the sixth object control 706 are sound source objects in the voice data.
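This voice track generation can be sketched as follows, with diarize standing in for a sound-source separation or speaker diarization model (an assumption; the patent does not name a technique):

```python
from collections import defaultdict

def build_voice_tracks(voice_data, diarize):
    """diarize splits the voice data separated from the target video into
    (source_id, segment) pairs on the video's time axis."""
    voice_tracks = defaultdict(list)
    for source_id, segment in diarize(voice_data):
        voice_tracks[source_id].append(segment)  # one voice track per sound source object
    return voice_tracks                          # the L voice tracks
```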
Here, in the case where the target track includes at least one voice track and at least one video frame track, when the target file is generated, the images associated with the video frame tracks and the voice data associated with the voice tracks in the target track may be synthesized into the target file. Alternatively, the voice data associated with the voice track may be converted into text information and then synthesized with the images associated with the video frame track. For example, when synthesizing a static image from object images on different video frame tracks, if voice data on a voice track is selected at the same time, the selected voice data may be converted into text and the text added to the synthesized static image. Specifically, in the case where the target track includes one video frame track and one voice track, receiving a first input from the user to the target track of the N video editing tracks includes: receiving a first input of a user to an object image on the video frame track in the target track and to the voice track in the target track;
processing the object image associated with the target track to generate a target file includes: converting, from the voice data corresponding to the voice track indicated by the first input, the voice information located in the same time window as the object image indicated by the first input into text information; and synthesizing the text information onto the object image indicated by the first input to generate a static image or a dynamic image.
Here, the video frame track and the voice track have the same time axis, namely the time axis of the target video. For example, if a plurality of continuous object images on a video frame track appear between 1 minute 12 seconds and 1 minute 50 seconds of the target video, then the interval from 1 minute 12 seconds to 1 minute 50 seconds is the time window of those object images, and the voice information determined in the voice data corresponding to the voice track is the voice information between 1 minute 12 seconds and 1 minute 50 seconds. Of course, for a single object image on a video frame track, the time window in which it is located is determined by the frame sequence number of the video frame to which it belongs.
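The time-window matching might be sketched as below; transcribe (speech-to-text) and draw_text (text overlay) are assumed helpers, and voice segments are assumed to carry start/end timestamps on the shared time axis.

```python
def caption_object_image(object_image, frame_index, fps,
                         voice_segments, transcribe, draw_text):
    """Convert the voice information located in the same time window as the
    object image into text, then synthesize the text onto the image."""
    t = frame_index / fps                              # timestamp implied by the frame sequence number
    in_window = [s for s in voice_segments if s.start <= t <= s.end]
    text = " ".join(transcribe(s) for s in in_window)  # voice information -> text information
    return draw_text(object_image, text)               # captioned static (or dynamic) image
```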
Of course, it is also possible to edit only voice tracks and generate a target file containing only the voice data associated with those tracks. For example, a single voice track or multiple voice tracks may be selected, voice synthesis or splitting performed, and the synthesized or split voice file saved. A custom soundtrack may be generated, with corresponding lyrics generated in combination with text. The voice data associated with the selected voice tracks can also be converted into text and saved as a text file. If a speaker is marked on the voice data associated with each voice track, the speakers' names can be extracted and arranged in order to generate a corresponding dialogue record.
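Such a dialogue record could be assembled roughly as follows (again assuming transcribe and per-segment start timestamps; the speaker names come from the marks on each voice track):

```python
def dialogue_record(voice_tracks, speaker_names, transcribe):
    """Interleave the labeled voice tracks in time order, prefixing each
    converted line with the name of its speaker."""
    lines = []
    for source_id, segments in voice_tracks.items():
        for segment in segments:
            lines.append((segment.start,
                          f"{speaker_names[source_id]}: {transcribe(segment)}"))
    lines.sort(key=lambda pair: pair[0])
    return "\n".join(text for _, text in lines)
```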
According to the embodiments of the application, the object images corresponding to different objects and the voice data corresponding to different sound source objects in the target video are displayed through different video editing tracks, so that a user can conveniently operate on the object images of each object and the voice data of each sound source object. Through an input on a target track among the video editing tracks, the object images and voice data associated with the target track can be processed to obtain a target file, completing the video editing operation on the target video. Each object and each sound source object in the video is thus edited in a refined manner during video editing, so that the granularity of video editing is finer and more flexible.
It should be noted that, in the video editing method provided in the embodiment of the present application, the execution subject may be a video editing device, or a control module of the video editing device for executing the video editing method. In the embodiment of the present application, a method for executing video editing by a video editing device is taken as an example, and the video editing device provided in the embodiment of the present application is described.
As shown in fig. 8, the embodiment of the present application further provides a video editing apparatus, where the apparatus includes:
A display module 801, configured to display N video editing tracks of a target video; the N video editing tracks comprise M video frame tracks, and the same video frame track is associated with object images of the same object in different video frames in the target video;
a first receiving module 802, configured to receive a first input from a user to a target track of the N video editing tracks;
a first response module 803, configured to process, in response to a first input, an object image associated with a target track, and generate a target file;
wherein N, M is a positive integer, M is less than or equal to N.
Optionally, the display module 801 includes:
the image segmentation unit is used for respectively carrying out image segmentation on each video frame in the target video to obtain an object image corresponding to each object in the target video;
the video frame track unit is used for writing all object images corresponding to the same object into the same preset blank track according to the frame sequence numbers of the object images to obtain M video frame tracks; the frame sequence number of an object image is the position information of the video frame to which the object image belongs in the target video;
and the display unit is used for displaying the M video frame tracks.
Optionally, the image segmentation unit is specifically configured to segment each video frame in the target video into at least one object image according to the image content, and record a frame number of each object image correspondingly; adding a label to each object image according to the image content of each object image; and determining the object images with the same label as the object images corresponding to the same object in the target video.
Optionally, the video frame track unit is specifically configured to display a target interface including T object controls; wherein each object control indicates an object in the target video; receiving second input of a user to M object controls in the T object controls; responding to the second input, respectively writing all object images corresponding to the M objects indicated by the M object controls into M preset blank tracks according to the frame sequence numbers of the object images to obtain M video frame tracks; wherein T is a positive integer, and T is greater than or equal to M.
Optionally, in the case where M is greater than 1, the display unit is specifically configured to sequentially display the M video frame tracks in the target interface in descending order of the number of object images in the M video frame tracks.
Optionally, the video frame track unit is specifically configured to score each object image based on aesthetic features of the object image, so as to obtain a score of each object image; calculating the average value of the scores of all object images corresponding to each object respectively to obtain the score of each object; and writing all object images corresponding to the same target object into the same preset blank track according to the frame sequence numbers of the object images to obtain M video frame tracks, wherein the target object is an object with the score larger than a preset threshold value.
Optionally, a first receiving module 802 is specifically configured to receive a first input of K object images on the target track from a user; wherein K is an integer greater than 1;
the first response module 803 is specifically configured to perform image synthesis on all object images indicated by the first input, and generate a still image, a moving image or a video.
Optionally, the N video editing tracks further include L voice tracks; wherein L is a positive integer, L < N;
wherein the same voice track is associated with voice data of the same sound source object in the target video.
Optionally, in the case that the target track includes one video frame track and one voice track, the first receiving module 802 is specifically configured to receive a first input of a user on an object image on the video frame track in the target track and the voice track in the target track;
the first response module 803 is specifically configured to convert, from the voice data corresponding to the voice track indicated by the first input, voice information in the same time window as the object image indicated by the first input into text information; and synthesizing the text information onto the object image indicated by the first input to generate a static image or a dynamic image.
In the embodiment of the application, the object images corresponding to different objects in the target video are displayed through different video editing tracks, so that a user can conveniently operate the object images of each object. By inputting the target track in the video editing track, the object image associated with the target track can be processed to obtain the target file, so that the video editing operation of the target video is completed. The video editing process is simplified; in addition, each object in the video is edited in a refined manner in the video editing process, so that granularity of video editing is finer and more flexible.
The video editing device in the embodiment of the application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a cell phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (ultra-mobile personal computer, UMPC), netbook or personal digital assistant (personal digital assistant, PDA), etc., and the non-mobile electronic device may be a server, network attached storage (Network Attached Storage, NAS), personal computer (personal computer, PC), television (TV), teller machine or self-service machine, etc., and the embodiments of the present application are not limited in particular.
The video editing device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The video editing apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiment of fig. 1, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 9, the embodiment of the present application further provides an electronic device 900, including a processor 901, a memory 902, and a program or an instruction stored in the memory 902 and capable of being executed on the processor 901, where the program or the instruction implements each process of the method embodiment of video editing described above when executed by the processor 901, and the same technical effects can be achieved, and for avoiding repetition, a description is omitted herein.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 10 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: radio frequency unit 1001, network module 1002, audio output unit 1003, input unit 1004, sensor 1005, display unit 1006, user input unit 1007, interface unit 1008, memory 1009, and processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may also include a power source (e.g., a battery) for powering the various components, which may be logically connected to the processor 1010 through a power management system so as to perform functions such as managing charging, discharging, and power consumption. The electronic device structure shown in fig. 10 does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, which is not described in detail here.
A display unit 1006 for displaying N video editing tracks of the target video; the N video editing tracks comprise M video frame tracks, and the same video frame track is associated with object images of the same object in different video frames in the target video.
A user input unit 1007 for receiving a first input of a user to a target track of the N video editing tracks.
And a processor 1010 for processing the object image associated with the target track in response to the first input to generate a target file.
Wherein N, M is a positive integer, M is less than or equal to N.
In the embodiment of the application, the object images corresponding to different objects in the target video are displayed through different video editing tracks, so that a user can conveniently operate the object images of each object. By inputting the target track in the video editing track, the object image associated with the target track can be processed to obtain the target file, so that the video editing operation of the target video is completed. The video editing process is simplified; in addition, each object in the video is edited in a refined manner in the video editing process, so that granularity of video editing is finer and more flexible.
It should be understood that in the embodiment of the present application, the input unit 1004 may include a graphics processor (Graphics Processing Unit, GPU) 10041 and a microphone 10042, and the graphics processor 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 can include two portions, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein. Memory 1009 may be used to store software programs as well as various data including, but not limited to, application programs and an operating system. The processor 1010 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1010.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the method embodiment of video editing, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium such as a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running a program or an instruction, implementing each process of the video editing method embodiment, and achieving the same technical effect, so as to avoid repetition, and no further description is provided here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in the reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware, but in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), including several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to these embodiments, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (9)

1. A method of video editing, the method comprising:
displaying N video editing tracks of the target video; the N video editing tracks comprise M video frame tracks, and the same video frame track is associated with object images of the same object in different video frames in the target video;
receiving a first input of a user to a target track of the N video editing tracks;
processing the object image associated with the target track in response to the first input to generate a target file;
wherein N and M are positive integers, and M is less than or equal to N;
wherein the displaying of M video frame tracks of the N video editing tracks of the target video comprises:
respectively carrying out image segmentation on each video frame in the target video to obtain an object image corresponding to each object in the target video;
scoring each object image based on aesthetic features of the object image to obtain a score of each object image;
calculating the average of the scores of all object images corresponding to each object to obtain the score of each object;
writing all object images corresponding to the same target object into the same preset blank track according to the frame sequence numbers of the object images to obtain the M video frame tracks, wherein the target object is an object whose score is greater than a preset threshold, and the frame sequence number of an object image is the position, in the target video, of the video frame to which the object image belongs; and
displaying the M video frame tracks.
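For illustration only, the following minimal Python sketch walks through the track-construction pipeline recited in claim 1. The helpers segment_objects() (assumed to yield a stable object identifier per detected object) and aesthetic_score(), and the 0.6 threshold, are assumptions introduced here, not names or values from the patent.

    from collections import defaultdict

    def build_video_frame_tracks(frames, score_threshold=0.6):
        """One possible reading of claim 1: one track per high-scoring object."""
        # Segment every video frame into per-object images, recording the
        # frame sequence number of each object image as it is produced.
        images_by_object = defaultdict(list)   # object id -> [(frame_no, image)]
        for frame_no, frame in enumerate(frames):
            for object_id, object_image in segment_objects(frame):
                images_by_object[object_id].append((frame_no, object_image))

        # Score each object image on its aesthetic features, then average the
        # image scores of each object to obtain the object's score.
        object_score = {
            obj: sum(aesthetic_score(img) for _, img in items) / len(items)
            for obj, items in images_by_object.items()
        }

        # Write the images of every object whose score exceeds the preset
        # threshold into its own blank track, ordered by frame sequence number.
        tracks = {}
        for obj, items in images_by_object.items():
            if object_score[obj] > score_threshold:
                tracks[obj] = [img for _, img in sorted(items, key=lambda t: t[0])]
        return tracks

Keeping each track's images in frame-number order is what allows a single track to stand for one object across the whole target video.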
2. The method according to claim 1, wherein the performing of image segmentation on each video frame in the target video to obtain an object image corresponding to each object in the target video includes:
dividing each video frame in the target video into at least one object image according to image content, and correspondingly recording the frame sequence number of each object image;
adding a label to each object image according to the image content of each object image;
and determining the object images with the same label as the object images corresponding to the same object in the target video.
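A hedged sketch of the claim-2 grouping, assuming stand-in functions segment_frame() (content-based segmentation of one frame) and classify_label() (content-based labeling); neither name comes from the patent.

    from collections import defaultdict

    def group_object_images(frames):
        groups = defaultdict(list)              # label -> [(frame_no, image)]
        for frame_no, frame in enumerate(frames):
            # Divide the frame into object images by image content and record
            # each object image's frame sequence number.
            for object_image in segment_frame(frame):
                # Label each object image according to its image content.
                label = classify_label(object_image)
                groups[label].append((frame_no, object_image))
        # Object images carrying the same label are treated as images of the
        # same object in the target video.
        return groups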
3. The method of claim 1, wherein, in the case where M is greater than 1, the displaying of the M video frame tracks comprises:
and sequentially displaying the M video frame tracks in the target interface in descending order of the number of object images in the M video frame tracks.
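Assuming tracks maps an object to its list of object images, as in the sketch under claim 1, the claim-3 ordering reduces to a single descending sort by image count:

    display_order = sorted(tracks.values(), key=len, reverse=True)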
4. The method of claim 1, wherein the receiving of a first input by a user to a target track of the N video editing tracks comprises:
receiving a first input of a user on K object images on the target track; wherein K is an integer greater than 1;
The processing of the object image associated with the target track to generate a target file includes:
and performing image synthesis on all object images indicated by the first input to generate a static image, a dynamic image or a video.
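One way to realize the claim-4 synthesis for the dynamic-image case, sketched with Pillow; the GIF container, frame duration, and output path are illustrative choices, not requirements of the claim.

    from PIL import Image

    def synthesize_dynamic_image(selected_images, out_path="target.gif"):
        # The K object images indicated by the first input become the frames
        # of one dynamic image (an animated GIF here).
        frames = [img.convert("RGB") for img in selected_images]
        frames[0].save(
            out_path,
            save_all=True,              # write a multi-frame file
            append_images=frames[1:],   # remaining selected images as frames
            duration=100,               # milliseconds per frame (illustrative)
            loop=0,                     # loop indefinitely
        )

Generating a static image or a video would follow the same pattern with a different compositing step and container.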
5. The method of claim 1, wherein the N video editing tracks further comprise L voice tracks, wherein L is a positive integer and L < N;
wherein the same voice track is associated with voice data of the same sound source object in the target video.
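One possible in-memory shape for such a voice track, offered only as a sketch; the field names and the (start, end, audio) segment layout are assumptions, not taken from the patent.

    from dataclasses import dataclass, field

    @dataclass
    class VoiceTrack:
        source_id: str                 # identifies one sound source object
        # Each segment is (start_seconds, end_seconds, audio_clip).
        segments: list = field(default_factory=list)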
6. The method of claim 5, wherein, in the case where the target track comprises one video frame track and one voice track, the receiving of a first input by a user to a target track of the N video editing tracks comprises:
receiving a first input of a user to an object image on a video frame track in the target track and a voice track in the target track;
the processing of the object image associated with the target track to generate a target file includes:
converting, into text information, the voice information in the voice data corresponding to the voice track indicated by the first input that is located in the same time window as the object image indicated by the first input; and
synthesizing the text information onto the object image indicated by the first input to generate a static image or a dynamic image.
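A sketch of the claim-6 flow using the VoiceTrack shape sketched under claim 5; transcribe() stands in for any speech-to-text engine, and the caption position and colour are illustrative.

    from PIL import ImageDraw

    def caption_object_image(object_image, voice_track, window_start, window_end):
        # Gather the voice segments that overlap the time window of the
        # object image indicated by the first input.
        clips = [audio for start, end, audio in voice_track.segments
                 if start < window_end and end > window_start]
        text = " ".join(transcribe(clip) for clip in clips)

        # Synthesize the text information onto the object image.
        captioned = object_image.convert("RGB")
        ImageDraw.Draw(captioned).text((10, 10), text, fill="white")
        return captioned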
7. An apparatus for video editing, the apparatus comprising:
the display module is used for displaying N video editing tracks of the target video; the N video editing tracks comprise M video frame tracks, and the same video frame track is associated with object images of the same object in different video frames in the target video;
the first receiving module is used for receiving a first input of a user to a target track in the N video editing tracks;
the first response module is used for responding to the first input, processing the object image associated with the target track and generating a target file;
wherein N and M are positive integers, and M is less than or equal to N;
the display module includes:
the image segmentation unit is used for respectively carrying out image segmentation on each video frame in the target video to obtain an object image corresponding to each object in the target video;
the video frame track unit is used for writing all object images corresponding to the same object into the same preset blank track according to the frame sequence numbers of the object images to obtain M video frame tracks; the frame serial number of the object image is the position information of the video frame to which the object image belongs in the target video;
A display unit for displaying the M video frame tracks;
the apparatus is further configured to: score each object image based on aesthetic features of the object image to obtain a score of each object image; calculate the average of the scores of all object images corresponding to each object to obtain the score of each object; and write all object images corresponding to the same target object into the same preset blank track according to the frame sequence numbers of the object images to obtain the M video frame tracks, wherein the target object is an object whose score is greater than a preset threshold.
8. An electronic device comprising a processor, a memory, and a program or instruction stored on the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the method of video editing according to any one of claims 1-6.
9. A readable storage medium, characterized in that it has stored thereon a program or instructions which, when executed by a processor, implement the steps of the method of video editing according to any one of claims 1-6.
CN202011198204.2A 2020-10-30 2020-10-30 Video editing method and device, electronic equipment and readable storage medium Active CN112367551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011198204.2A CN112367551B (en) 2020-10-30 2020-10-30 Video editing method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011198204.2A CN112367551B (en) 2020-10-30 2020-10-30 Video editing method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112367551A CN112367551A (en) 2021-02-12
CN112367551B true CN112367551B (en) 2023-06-16

Family

ID=74513187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011198204.2A Active CN112367551B (en) 2020-10-30 2020-10-30 Video editing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112367551B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113207025B (en) * 2021-04-30 2023-03-28 北京字跳网络技术有限公司 Video processing method and device, electronic equipment and storage medium
CN113473204B (en) * 2021-05-31 2023-10-13 北京达佳互联信息技术有限公司 Information display method and device, electronic equipment and storage medium
CN113518187B (en) * 2021-07-13 2024-01-09 北京达佳互联信息技术有限公司 Video editing method and device
CN113786605B (en) * 2021-08-23 2024-03-22 咪咕文化科技有限公司 Video processing method, apparatus and computer readable storage medium
CN113905255B (en) * 2021-09-28 2022-08-02 腾讯科技(深圳)有限公司 Media data editing method, media data packaging method and related equipment
CN113992866B (en) * 2021-11-01 2024-03-26 上海哔哩哔哩科技有限公司 Video production method and device
CN114302009A (en) * 2021-12-06 2022-04-08 维沃移动通信有限公司 Video processing method, video processing device, electronic equipment and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI635482B (en) * 2017-03-20 2018-09-11 李宗盛 Instant editing multi-track electronic device and processing method
CN111836100B (en) * 2019-04-16 2023-03-31 阿里巴巴集团控股有限公司 Method, apparatus, device and storage medium for creating clip track data
CN111835985B (en) * 2019-04-16 2023-05-26 阿里巴巴集团控股有限公司 Video editing method, device, apparatus and storage medium
CN111526242B (en) * 2020-04-30 2021-09-07 维沃移动通信有限公司 Audio processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN112367551A (en) 2021-02-12

Similar Documents

Publication Publication Date Title
CN112367551B (en) Video editing method and device, electronic equipment and readable storage medium
CN109819313B (en) Video processing method, device and storage medium
US10659499B2 (en) Providing selectable content items in communications
WO2022001593A1 (en) Video generation method and apparatus, storage medium and computer device
JP7529236B2 (en) INTERACTIVE INFORMATION PROCESSING METHOD, DEVICE, APPARATUS, AND MEDIUM
US8645121B2 (en) Language translation of visual and audio input
US20200242296A1 (en) Text description generating method and device, mobile terminal and storage medium
CN111612873B (en) GIF picture generation method and device and electronic equipment
JP2011217197A (en) Electronic apparatus, reproduction control system, reproduction control method, and program thereof
US20140164371A1 (en) Extraction of media portions in association with correlated input
CN110781328A (en) Video generation method, system, device and storage medium based on voice recognition
JP2008148121A (en) Motion picture summary automatic generation apparatus and method, and computer program
US10943371B1 (en) Customizing soundtracks and hairstyles in modifiable videos of multimedia messaging application
US20180143741A1 (en) Intelligent graphical feature generation for user content
CN113806570A (en) Image generation method and generation device, electronic device and storage medium
EP3912086A2 (en) Systems and methods for providing personalized videos
US20150130816A1 (en) Computer-implemented methods and systems for creating multimedia animation presentations
WO2024108981A1 (en) Video editing method and apparatus
JP2013054417A (en) Program, server and terminal for tagging content
Tarvainen et al. Film mood and its quantitative determinants in different types of scenes
JP6603929B1 (en) Movie editing server and program
CN111160051A (en) Data processing method and device, electronic equipment and storage medium
WO2023011300A1 (en) Method and apparatus for recording facial expression of video viewer
CN114125149A (en) Video playing method, device, system, electronic equipment and storage medium
CN113709521A (en) System for automatically matching background according to video content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant