CN111526427A - Video generation method and device and electronic equipment - Google Patents

Video generation method and device and electronic equipment

Info

Publication number
CN111526427A
CN111526427A (application CN202010363670.5A; granted as CN111526427B)
Authority
CN
China
Prior art keywords
target
template
checkpoint
input
audio
Prior art date
Legal status
Granted
Application number
CN202010363670.5A
Other languages
Chinese (zh)
Other versions
CN111526427B (en)
Inventor
Ma Xingtao (马兴涛)
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202010363670.5A
Publication of CN111526427A
Application granted
Publication of CN111526427B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application provides a video generation method, a video generation device, and an electronic device, and belongs to the field of communication technology. The method comprises the following steps: receiving a first input of a user on a target audio; in response to the first input, displaying N checkpoint template identifications, each checkpoint template identification indicating one checkpoint template segment of a target checkpoint template, the target checkpoint template being generated based on the target audio, N being a positive integer; receiving a second input of the user on a target object; in response to the second input, synthesizing the target object with the checkpoint template segment indicated by at least one checkpoint template identification to generate a target video; wherein the target object includes at least one of an image and a video. With this scheme, a checkpoint template can be customized and the user can make a video from the customized checkpoint template; the operation is simple and convenient, time is saved, and the efficiency of making both the checkpoint template and the video is improved.

Description

Video generation method and device and electronic equipment
Technical Field
The application belongs to the technical field of communication, and particularly relates to a video generation method and device and electronic equipment.
Background
With the development of short-video shooting, more and more users enjoy shooting and sharing short videos. Video templates are widely popular: when shooting a video with a mobile phone, a good video template saves a great deal of post-editing. The special-effect checkpoint template is one of the most popular. Common short-video shooting Application programs (APPs) provide built-in checkpoint templates, so that a user only needs to upload pictures or videos and select a favorite checkpoint template to generate a short video. However, the built-in checkpoint templates in an APP all contain preset background music and matching preset dynamic effects; a preset checkpoint template may not be the one the user needs, and the video effect the user requires may not match the video effect the preset checkpoint template can achieve.
Disclosure of Invention
The embodiments of the present application aim to provide a video generation method, a video generation device, and an electronic device, which can solve the problem that the video effect required by a user does not match the video effect achievable by a preset checkpoint template.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a video generation method, including:
receiving a first input of a target audio by a user;
in response to the first input, displaying N checkpoint template identifications, each checkpoint template identification indicating a checkpoint template segment of a target checkpoint template, the target checkpoint template generated based on the target audio, N being a positive integer;
receiving a second input of the target object by the user;
in response to the second input, synthesizing the target object with a checkpoint template segment indicated by at least one checkpoint template identifier to generate a target video;
wherein the target object includes at least one of an image and a video.
In a second aspect, an embodiment of the present application provides a video generating apparatus, including:
the first receiving module is used for receiving a first input of a user to the target audio;
a first response module to display, in response to the first input, N checkpoint template identifications, each checkpoint template identification indicating a checkpoint template segment of a target checkpoint template, the target checkpoint template generated based on the target audio, N being a positive integer;
the second receiving module is used for receiving a second input of the target object by the user;
a second response module, configured to, in response to the second input, synthesize the target object with a checkpoint template segment indicated by at least one checkpoint template identifier, and generate a target video;
wherein the target object includes at least one of an image and a video.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiments of the present application, N checkpoint template identifications are displayed in response to a user's first input on a target audio, each checkpoint template identification indicating one checkpoint template segment of a target checkpoint template; in response to the user's second input on a target object, the target object is synthesized with the checkpoint template segment indicated by at least one checkpoint template identification to generate a target video. The target checkpoint template can thus be generated from the target audio the user selects, and the target video can be generated from the generated target checkpoint template and the target object the user selects, achieving the video effect of the target video the user requires.
Drawings
Fig. 1 is a schematic flow chart of a video generation method according to an embodiment of the present application;
FIG. 2 is a first schematic diagram of checkpoint template production according to an embodiment of the present application;
FIG. 3 is a second schematic diagram of checkpoint template production according to an embodiment of the present application;
FIG. 4 is a third schematic diagram of checkpoint template production according to an embodiment of the present application;
FIG. 5 is a fourth schematic diagram of checkpoint template production according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a target video production according to an embodiment of the present application;
FIG. 7 is a second schematic diagram of target video production according to the embodiment of the present application;
FIG. 8 is a third schematic diagram of target video production according to an embodiment of the present application;
FIG. 9 is a fourth schematic diagram of target video production according to an embodiment of the present application;
FIG. 10 is a fifth schematic diagram of target video production according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a video generation apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and the like in the description and in the claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and succeeding objects.
The video generation method, the video generation device, and the electronic device provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
As shown in fig. 1, an embodiment of the present application provides a video generation method, including:
step 101, receiving a first input of a target audio by a user.
In step 101, the target audio may be local music or network music. The first input is used to select audio; the selected audio includes the target audio, which is the audio used to custom-generate a target checkpoint template. The target checkpoint template is a template in MPEG-4 (MP4) format that is generated by using an Artificial Intelligence (AI) technique to mark one or more checkpoints in the target audio according to its audio information and divide the target audio into a plurality of segments.
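The description leaves the AI technique unspecified. As a minimal, hedged sketch of the beat-analysis step, the open-source librosa library (an assumption; the patent names no library) can locate beat times that could serve as checkpoint candidates:

```python
# Hedged sketch of checkpoint detection from the target audio. The
# patent's AI technique is unspecified; plain beat tracking with the
# third-party librosa library stands in for it here.
import librosa

def detect_checkpoints(audio_path: str) -> list[float]:
    """Return candidate checkpoint times (in seconds) for the target audio."""
    y, sr = librosa.load(audio_path)                      # decode to waveform
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)  # locate beats
    return librosa.frames_to_time(beat_frames, sr=sr).tolist()
```

Segment boundaries of the target checkpoint template would then be drawn from these times (see the beat-grouping sketch later in this description).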
It should be noted that the first input is an input by the user on the target audio, such as a click, press, or slide; the first input may also be a first operation, which is not specifically limited here.
Optionally, before step 101, the method further includes:
displaying an audio track of a first audio and a track box located on the audio track;
receiving a third input of the user on the track box;
in response to the third input, updating the track box to a first position of the audio track;
and determining the audio segment of the region framed by the track box at the first position as the target audio.
Specifically, as shown in fig. 2, before the step of displaying the audio track of the first audio and the track box on the audio track, the user may enter a checkpoint template making interface by clicking or the like. The interface includes an "add music" button 21; clicking it opens a music selection interface displaying at least one piece of music. The user may select the first audio as needed, and may select any audio segment of the first audio as the target audio, so as to make the desired target checkpoint template. The audio track is the "track" displayed after the background music's audio is stripped out; it carries audio attributes such as the timbre and volume of a music track. The track box is a selection box that frames a region of the audio track.
For example: as shown in fig. 3, the music selection interface includes music A, music B, and music C, and the playing duration of each piece of music can be displayed after it. When the user clicks the circular selection key in front of music A, the key changes to a check mark (√), and music A is selected as the first audio.
Specifically, as shown in fig. 4, after the first audio (e.g., music A) is selected, an audio track 41 of the first audio and a track box 42 on the audio track are displayed; the track box 42 may default to framing the first 15 seconds of the first audio. The third input is an input that moves the track box 42, such as a drag input or a drag operation, and is not specifically limited here. The user can move the track box 42 to the first position; the audio segment 43 framed by the track box 42 at the first position is then the target audio, i.e., the background music of the checkpoint template to be made. The time range of the audio segment 43 framed by the track box 42 is displayed above it, and the user can click the play button to audition the framed segment; cancel and restore buttons to the right of the play button ease operation. After the background music is selected, clicking the "finish" button at the upper right corner completes the selection.
It should be noted that the default position of the track box 42 after selecting the first audio can be set according to the user's requirement. The user may also set a time threshold for the track box 42, e.g., the track box 42 may frame at most a 90 s audio segment as background music.
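As a hedged sketch of this step (all names are assumptions; the 15 s default window and 90 s cap are taken from the description), mapping the track box position to the target audio reduces to clamping a fixed window to the first audio's duration:

```python
# Hypothetical sketch: mapping the track-box position to the target
# audio segment. Names are assumptions, not from the patent.
from dataclasses import dataclass

@dataclass
class TrackBox:
    start: float            # seconds from the start of the first audio
    length: float = 15.0    # default window framed by the box

def select_target_audio(box: TrackBox, audio_duration: float,
                        max_length: float = 90.0) -> tuple[float, float]:
    """Clamp the framed region to the audio and return (start, end)."""
    length = min(box.length, max_length)
    start = max(0.0, min(box.start, audio_duration - length))
    return start, start + length
```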
Step 102, in response to the first input, displaying N checkpoint template identifications, each checkpoint template identification indicating a checkpoint template segment of a target checkpoint template, the target checkpoint template being generated based on the target audio, N being a positive integer.
Specifically, in response to the first input of the user on the target audio, as shown in fig. 5, the user may click or otherwise operate the one-key "generate checkpoint template" button 51 on the current interface, whereupon N checkpoint template identifiers are displayed, each indicating one checkpoint template segment of the target checkpoint template. The target checkpoint template may be generated based on the audio information of the target audio (e.g., its note melody and/or beat); that is, the target checkpoint template can be generated from the audio information of the target audio.
It should be noted that the function of the checkpoint template identifier is to facilitate the user to select the target object content corresponding to each checkpoint template segment through an intuitive display manner.
For example: as shown in fig. 6, the N checkpoint template identifiers include a first checkpoint template identifier 61, a second checkpoint template identifier 62, a third checkpoint template identifier 63, and so on. If N is 3, the 3 checkpoint template identifiers are the first checkpoint template identifier 61, the second checkpoint template identifier 62, and the third checkpoint template identifier 63, each indicating one checkpoint template segment of the target checkpoint template; the three indicated segments jointly form the target checkpoint template. That is, the target checkpoint template is composed of three checkpoint template segments, which saves the user time in making the checkpoint template and improves efficiency. A checkpoint template segment is a template segment obtained by marking one or more checkpoints according to the audio information of the target audio and dividing the target audio into a plurality of segments.
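As an illustrative data model (the names are assumptions, not from the patent), the target checkpoint template and its N segments might be represented as:

```python
# Hypothetical data model for a target checkpoint template, inferred
# from the description: N segments jointly form the template.
from dataclasses import dataclass

@dataclass
class TemplateSegment:
    identifier: str   # checkpoint template identifier shown on screen
    start: float      # segment start within the target audio, in seconds
    duration: float   # checkpoint template segment duration

@dataclass
class CheckpointTemplate:
    audio_path: str                  # the target audio (background music)
    segments: list[TemplateSegment]  # e.g. three segments when N is 3
```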
Step 103, receiving a second input of the target object by the user, wherein the target object comprises at least one of an image and a video.
Specifically, the user may select one or more objects from a plurality of objects as the target object; the target object is determined by the user's selection operation. The user may decide the number of target objects based on the number of checkpoint template identifiers.
The second input is an input by the user on the target object, such as a click, press, or slide; the second input may also be a second operation, which is not specifically limited here.
And 104, responding to the second input, synthesizing the target object with the checkpoint template segment indicated by the at least one checkpoint template identifier, and generating a target video.
Specifically, the target object selected by the user (such as an image and/or a video from the local album) may be synthesized with the checkpoint template segment indicated by a checkpoint template identifier: the target object and the target audio are added to the checkpoint template segment according to each checkpoint in that segment, and the target video is then generated. In other words, the user can directly generate the target video using the generated target checkpoint template or checkpoint template segments, which saves the user's time in generating the video and improves efficiency.
For example: the target checkpoint template is like a folder, and each checkpoint template segment of the target checkpoint template is like a subfolder in that folder; the user fills the whole target checkpoint template by placing target objects in the subfolders, thereby forming the target video.
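Continuing the analogy, here is a hedged sketch of the synthesis step using the third-party moviepy 1.x API (an assumption; the patent names no library, and image objects would additionally need an ImageClip with a set duration):

```python
# Sketch of the synthesis step, assuming moviepy 1.x and video-only
# target objects. Each object fills one checkpoint template segment,
# and the target audio becomes the soundtrack.
from moviepy.editor import (AudioFileClip, VideoFileClip,
                            concatenate_videoclips)

def synthesize(object_paths: list[str], segment_durations: list[float],
               audio_path: str, out_path: str) -> None:
    clips = [VideoFileClip(p).subclip(0, d)     # trim to segment length
             for p, d in zip(object_paths, segment_durations)]
    video = concatenate_videoclips(clips)
    audio = AudioFileClip(audio_path).subclip(0, video.duration)
    video.set_audio(audio).write_videofile(out_path)
```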
In the above embodiment of the present application, N checkpoint template identifiers are displayed in response to the user's first input on the target audio, each checkpoint template identifier indicating one checkpoint template segment of the target checkpoint template; in response to the user's second input on the target object, the target object is synthesized with the checkpoint template segment indicated by at least one checkpoint template identifier to generate the target video.
Optionally, before step 102, the method further includes:
and under the condition that a first checkpoint template matching the audio track of the target audio exists in a preset checkpoint template set, taking the first checkpoint template as the target checkpoint template.
Optionally, before step 102, the method further includes:
under the condition that no first checkpoint template matching the audio track of the target audio exists in a preset checkpoint template set, acquiring audio information of the target audio;
and generating a target checkpoint template matched with the audio information of the target audio based on the audio information of the target audio.
Specifically, according to the audio track of the target audio, a first checkpoint template matching the audio track is searched for in a preset checkpoint template set (i.e., a database of checkpoint templates). If the first checkpoint template exists in the preset checkpoint template set, it can be used directly as the target checkpoint template, which spares the user the time of searching for it manually; the first checkpoint template matching the audio track of the target audio is obtained directly from the target audio, improving efficiency. If no first checkpoint template matching the audio track of the target audio exists in the preset checkpoint template set, the audio information of the target audio is acquired, and a target checkpoint template matching that audio information can be generated based on it; this saves the user the time of making a checkpoint template manually, since the template is generated automatically from the audio information, improving the efficiency of checkpoint template making.
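A hedged sketch of this lookup-or-generate control flow follows; matches_track, extract_audio_info, and generate_template are hypothetical helpers standing in for the unspecified matching and generation logic:

```python
# Hypothetical control flow for choosing the target checkpoint template:
# reuse a preset template that matches the audio track, otherwise
# generate one from the audio information (all helpers are assumptions).
def get_target_template(audio_path, preset_templates):
    for template in preset_templates:
        if template.matches_track(audio_path):  # assumed matching predicate
            return template                     # the first checkpoint template
    info = extract_audio_info(audio_path)       # e.g. note melody and beat
    return generate_template(info)              # assumed generator
```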
Optionally, the audio information may include note melody and beat;
generating a target checkpoint template matched with the audio information of the target audio based on the audio information of the target audio, wherein the target checkpoint template comprises:
acquiring the note melody and beat of the target audio;
and generating a target checkpoint template matching the audio information of the target audio based on the note melody and beat of the target audio.
Specifically, if the audio information includes the note melody and beat, then under the condition that no first checkpoint template matching the audio track of the target audio exists in the preset checkpoint template set, the note melody and beat of the target audio are acquired, and a target checkpoint template matching the audio information is generated automatically according to the pauses of the note melody and the beat of the target audio. For example: according to the note melody, the first eight beats may require one checkpoint every two beats, the second eight beats one checkpoint every four beats, and so on.
For example: as shown in fig. 6, the first checkpoint template segment indicated by the first checkpoint template identifier 61 spans two beats, the second checkpoint template segment indicated by the second checkpoint template identifier 62 spans two beats, the third checkpoint template segment indicated by the third checkpoint template identifier 63 spans four beats, and so on.
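A hedged sketch of turning detected beat times into segment boundaries under this example's grouping rule (the two-two-four grouping comes from the description; everything else is an assumption):

```python
# Sketch: derive checkpoint template segment boundaries from beat
# times. beats_per_segment such as [2, 2, 4] mirrors fig. 6.
def beats_to_segments(beat_times: list[float],
                      beats_per_segment: list[int]) -> list[tuple[float, float]]:
    segments, i = [], 0
    for n in beats_per_segment:
        if i + n >= len(beat_times):   # not enough beats left
            break
        segments.append((beat_times[i], beat_times[i + n]))
        i += n
    return segments
```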
Optionally, the target object comprises a video; the target object comprises M target sub-objects, wherein M is a positive integer;
the step 103 includes:
receiving second input of the user to the M target sub-objects;
after the step 103 and before the step 104, the method further includes:
acquiring an input sequence of the second input;
displaying an ith object identifier of an ith target sub-object in a target area associated with the ith checkpoint template identifier based on the input order, wherein the ith object identifier is used to indicate the target sub-object associated with the checkpoint template segment indicated by the ith checkpoint template identifier;
wherein i is a positive integer, and i is not more than N.
Specifically, the target object includes M target sub-objects; that is, the same video or the same image may be selected multiple times as the target object. Based on the order of the user's second input on the M target sub-objects, the object identifiers of different target sub-objects are displayed in the target areas associated with different checkpoint template identifiers; each object identifier indicates the target sub-object associated with the checkpoint template segment indicated by the corresponding checkpoint template identifier. Each candidate target sub-object can display its video duration.
It should be noted that the object identifier is used for facilitating a user to select the target sub-object content corresponding to each checkpoint template segment through an intuitive display manner.
It should be noted that the ith checkpoint template identifier is one of the N checkpoint template identifiers, and the ith target sub-object is one of the M target sub-objects.
For example: as shown in fig. 7, when the second input is a click operation by the user, the order in which the user clicks the target sub-objects is the input order of the second input. If the first target sub-object clicked by the user is the first target sub-object 71, a first object identifier of the first target sub-object 71 is displayed in the target area associated with the 1st checkpoint template identifier (i.e., within the frame of the first checkpoint template identifier 72); the first object identifier indicates the first target sub-object 71 associated with the checkpoint template segment indicated by the first checkpoint template identifier. The user can select images and/or videos through the album's video page and image page, picking suitable ones as needed; that is, the user only needs to tick several image and/or video materials, and they are automatically matched to the checkpoint template segments indicated by the checkpoint template identifiers according to the ticking order, which is convenient and efficient.
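A minimal sketch of this order-based association (names are assumptions): the i-th ticked target sub-object fills the segment indicated by the i-th checkpoint template identifier:

```python
# Hypothetical sketch: associate target sub-objects with checkpoint
# template segments in the order the user ticked them (i <= N).
def assign_by_order(ticked_sub_objects: list[str],
                    segment_identifiers: list[str]) -> dict[str, str]:
    return dict(zip(segment_identifiers, ticked_sub_objects))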
Optionally, the checkpoint template identifier further includes a checkpoint template segment duration.
Specifically, as shown in fig. 7, the duration of the corresponding checkpoint template segment is displayed in each checkpoint template identifier, i.e., the display duration of the videos and/or images that can be placed in the segment of the target checkpoint template indicated by that identifier; the user can select a suitable video and/or image according to this duration.
Before displaying the ith object identifier of the ith target sub-object in the target area associated with the ith checkpoint template identifier based on the input sequence, the method further includes:
acquiring the playing time length of the ith target sub-object;
and under the condition that the playing duration of the ith target sub-object is greater than the duration of the ith checkpoint template segment indicated by the ith checkpoint template identifier, clipping the ith target sub-object into a sub-object whose playing duration is the same as the duration of the ith checkpoint template segment.
Specifically, in the process of selecting target sub-objects, the playing duration of each target sub-object can be acquired. If the playing duration of the ith target sub-object is greater than the duration of the ith checkpoint template segment indicated by the ith checkpoint template identifier, i.e., the video is longer than the segment requires, the user can clip the ith target sub-object by clicking on it or the like, so that the clipped sub-object's playing duration equals the segment duration; this avoids the video being cut off before it finishes playing. If the playing duration of the ith target sub-object is less than the segment duration, i.e., the video is shorter than the segment requires, the user can be prompted that the playing duration does not meet the requirement, avoiding a black screen after the video ends.
For example: if the playing duration of the second target sub-object is 10 s and the duration of the second checkpoint template segment indicated by the second checkpoint template identifier is 5 s, then the playing duration of the second target sub-object exceeds the duration of the second checkpoint template segment. At this point the user may enter an editing interface by clicking the second checkpoint template segment indicated by the second checkpoint template identifier; as shown in fig. 8, the video track 81 and the video selection box 82 of the second target sub-object associated with the second checkpoint template identifier are displayed. The user can frame different video frames by sliding the video selection box 82 along the video track 81, and the video frames framed by the video selection box 82 constitute the clipped second target sub-object.
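A hedged sketch of the duration bookkeeping behind this clipping step (names are assumptions; prompting the user about too-short videos is left out):

```python
# Sketch: compute the (start, end) window framed by the video selection
# box when a sub-object is longer than its checkpoint template segment.
def clip_to_segment(play_duration: float, segment_duration: float,
                    box_start: float) -> tuple[float, float]:
    if play_duration <= segment_duration:
        return 0.0, play_duration   # nothing to clip (may warn if shorter)
    start = max(0.0, min(box_start, play_duration - segment_duration))
    return start, start + segment_duration
```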
Optionally, after displaying the ith object identifier of the ith target sub-object in the target area associated with the ith checkpoint template identifier based on the input order, the method further includes:
receiving a fourth input of a user to a first checkpoint template identifier in the N checkpoint template identifiers;
in response to the fourth input, updating the dynamic effect parameter of the target sub-object associated with the first checkpoint template identifier.
Specifically, after the target sub-objects are selected, the user can edit the dynamic effect parameters of different target sub-objects as needed, changing the dynamic effect to what the user requires.
The fourth input is an input by the user on the first checkpoint template identifier, such as a click, press, or slide; the fourth input may also be a fourth operation, which is not specifically limited here.
For example: as shown in fig. 9, the fourth input is a click operation. If the user clicks the first checkpoint template identifier, several options (e.g., dynamic effect 91, replace 92, clip 93) are displayed above it. If the user clicks the dynamic effect 91 button, as shown in fig. 10, several dynamic effect parameter options 101 (e.g., dynamic effect a, dynamic effect b, dynamic effect c) are displayed above the first checkpoint template identifier; if the user clicks dynamic effect a, the dynamic effect of the target sub-object associated with the first checkpoint template identifier is updated to dynamic effect a. The user can thus re-edit the dynamic effect parameters of the target sub-object as needed.
Optionally, after displaying the ith object identifier of the ith target sub-object in the target area associated with the ith checkpoint template identifier based on the input order, the method further includes:
receiving fifth input of a user to a second checkpoint template identifier and a first target sub-object in the N checkpoint template identifiers;
in response to the fifth input, replacing the second checkpoint template identification associated target sub-object with the first target sub-object.
Specifically, after the target sub-objects are selected, the user may replace the target sub-object associated with the second checkpoint template identifier as needed with another target sub-object the user requires.
It should be noted that the fifth input is an input by the user on the second checkpoint template identifier, such as a click, press, or slide; the fifth input may also be a fifth operation, which is not specifically limited here.
For example: as shown in fig. 9, the fifth input is a click operation. If the user clicks the second checkpoint template identifier, several options (e.g., dynamic effect 91, replace 92, clip 93) are displayed above it. If the user clicks the replace 92 button, a plurality of target sub-objects are displayed, and the user may select one of them (e.g., the first target sub-object) to replace the target sub-object associated with the second checkpoint template identifier. The user can thus reselect the target sub-object associated with the second checkpoint template identifier as needed.
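A hedged sketch of these two editing actions over a per-segment assignment (a plain dict stands in for whatever structure the apparatus actually uses; all names are assumptions):

```python
# Hypothetical per-segment editing actions from figs. 9-10.
def update_effect(segment: dict, effect: str) -> None:
    segment["effect"] = effect        # fourth input, e.g. dynamic effect a

def replace_sub_object(segment: dict, new_object: str) -> None:
    segment["object"] = new_object    # fifth input: the first target sub-object
```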
Optionally, after displaying the ith object identifier of the ith target sub-object in the target area associated with the ith checkpoint template identifier based on the input order, the method further includes:
receiving a sixth input of the user to a third checkpoint template identifier in the N checkpoint template identifiers;
responding to the sixth input by displaying a video track and a video selection box of the target sub-object associated with the third checkpoint template identifier, wherein the video duration corresponding to the video selection box is equal to the checkpoint template segment duration of the checkpoint template segment indicated by the third checkpoint template identifier;
receiving a seventh input of the user on the video selection box;
in response to the seventh input, updating the video selection box to a target position of the video track and, based on the target position, clipping the target sub-object associated with the third checkpoint template identifier;
wherein the clipped target sub-object associated with the third checkpoint template identifier comprises all video frames framed by the video selection box at the target position.
Specifically, after the target sub-objects are selected, the user may clip the target sub-object associated with the third checkpoint template identifier as needed.
The sixth input is an input by the user on the third checkpoint template identifier, such as a click, press, or slide; it may also be a sixth operation, which is not specifically limited here. The seventh input is an input by the user on the video selection box, such as a click, press, or slide; it may also be a seventh operation, which is not specifically limited here.
Specifically, after the user performs the sixth input on the third checkpoint template identifier, an editing interface is entered. As shown in fig. 8, the video track 81 and the video selection box 82 of the target sub-object associated with the third checkpoint template identifier are displayed. The user can frame different video frames by sliding the video selection box 82 along the video track 81, and the video frames framed by the video selection box 82 constitute the clipped target sub-object associated with the third checkpoint template identifier. The video duration corresponding to the frames framed by the video selection box 82 is equal to the checkpoint template segment duration of the checkpoint template segment indicated by the third checkpoint template identifier.
Before or after replacing a target object, updating dynamic effect parameters, clipping, or similar operations, the user can view the target video by clicking the play button, so as to preview its effect.
Optionally, as shown in fig. 10, after the target video is edited, the user may click the export key at the upper right corner to export the finished target video, which can be saved to the album or shared. In addition, the user can save the target checkpoint template or its segments, and can then conveniently generate a target video from them with one key; the operation is simple and quick.
To sum up, in the embodiments of the present application, N checkpoint template identifiers are displayed in response to the user's first input on a target audio, each checkpoint template identifier indicating one checkpoint template segment of a target checkpoint template; in response to the user's second input on a target object, the target object is synthesized with the checkpoint template segment indicated by at least one checkpoint template identifier to generate a target video. The target checkpoint template can be generated from the target audio the user selects, and the target video can be generated from the generated target checkpoint template and the selected target object. The operation is simple and convenient, time is saved, and the efficiency of making both the checkpoint template and the video is improved.
It should be noted that the execution subject of the video generation method provided in the embodiments of the present application may be a video generation apparatus, or a control module in the video generation apparatus for executing the loaded video generation method. In the embodiments of the present application, the method is described taking a video generation apparatus executing the loaded video generation method as an example.
As shown in fig. 11, an embodiment of the present application further provides a video generating apparatus 110, including:
a first receiving module 111, configured to receive a first input of a target audio by a user;
a first response module 112, configured to display, in response to the first input, N checkpoint template identifications, each checkpoint template identification indicating one checkpoint template segment of a target checkpoint template, the target checkpoint template being generated based on the target audio, N being a positive integer;
a second receiving module 113, configured to receive a second input to the target object from the user;
a second response module 114, configured to, in response to the second input, synthesize the target object with the checkpoint template segment indicated by the at least one checkpoint template identifier, and generate a target video;
wherein the target object includes at least one of an image and a video.
Optionally, the apparatus further comprises:
the first display module is used for displaying an audio track of a first audio and a track box located on the audio track;
the third receiving module is used for receiving a third input of the user on the track box;
a third response module to update the track box to the first position of the audio track in response to the third input;
a first determination module to determine the audio segment of the region framed by the track box at the first position as the target audio.
Optionally, the apparatus further comprises:
and the first processing module is used for taking the first checkpoint template as the target checkpoint template under the condition that the first checkpoint template matched with the audio track of the target audio exists in a preset checkpoint template set.
Optionally, the apparatus further comprises:
the first acquisition module is used for acquiring the audio information of the target audio under the condition that no first checkpoint template matching the audio track of the target audio exists in a preset checkpoint template set;
and the first generation module is used for generating a target checkpoint template matched with the audio information of the target audio based on the audio information of the target audio.
Optionally, the audio information includes note melody and beat;
the first generation module includes:
a first obtaining unit configured to obtain a note melody and a tempo of the target audio;
and the first generation unit is used for generating a target checkpoint template matching the audio information of the target audio based on the note melody and beat of the target audio.
Optionally, the target object comprises a video; the target object comprises M target sub-objects, wherein M is a positive integer;
the second receiving module 113 includes:
the first receiving unit is used for receiving second input of the user to the M target sub-objects;
the device further comprises:
the second acquisition module is used for acquiring the input sequence of the second input;
a second display module, configured to display, based on the input order, an ith object identifier of an ith target sub-object in a target area associated with an ith checkpoint template identifier, where the ith object identifier is used to indicate a target sub-object associated with a checkpoint template segment indicated by the ith checkpoint template identifier;
wherein i is a positive integer, and i is not more than N.
Optionally, the checkpoint template identifier further includes a checkpoint template segment duration;
the device further comprises:
the third acquisition module is used for acquiring the playing time length of the ith target sub-object;
and the second processing module is used for clipping the ith target sub-object into a sub-object whose playing duration is the same as the duration of the ith checkpoint template segment, under the condition that the playing duration of the ith target sub-object is greater than the duration of the ith checkpoint template segment indicated by the ith checkpoint template identifier.
Optionally, the apparatus further comprises:
the fourth receiving module is used for receiving fourth input of a user to a first checkpoint template identifier in the N checkpoint template identifiers;
and the fourth response module is used for responding to the fourth input and updating the dynamic effect parameters of the target sub-object associated with the first checkpoint template identification.
Optionally, the apparatus further comprises:
a fifth receiving module, configured to receive a fifth input of the user to a second checkpoint template identifier and a first target child object in the N checkpoint template identifiers;
a fifth response module, configured to replace the target sub-object associated with the second checkpoint template identification with the first target sub-object in response to the fifth input.
Optionally, the apparatus further comprises:
a sixth receiving module, configured to receive a sixth input of a third checkpoint template identifier of the N checkpoint template identifiers from the user;
a sixth response module, configured to display, in response to the sixth input, a video track and a video selection box of the target sub-object associated with the third checkpoint template identifier, where the video duration corresponding to the video selection box is equal to the checkpoint template segment duration of the checkpoint template segment indicated by the third checkpoint template identifier;
a seventh receiving module, configured to receive a seventh input of the user on the video selection box;
a seventh response module, configured to update the video selection box to the target position of the video track in response to the seventh input, and clip the target sub-object associated with the third checkpoint template identifier based on the target position;
wherein the clipped target sub-object associated with the third checkpoint template identifier comprises all video frames framed by the video selection box at the target position.
The video generation device in the embodiments of the present application may be a device, or a component, integrated circuit, or chip in a terminal. The device may be a mobile or non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, tablet computer, notebook computer, palmtop computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook, or Personal Digital Assistant (PDA), and the non-mobile electronic device may be a server, Network Attached Storage (NAS), Personal Computer (PC), Television (TV), teller machine, or kiosk; the embodiments of the present application are not specifically limited.
The video generation apparatus in the embodiments of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The video generation device provided in the embodiment of the present application can implement each process implemented by the video generation device in the method embodiments of fig. 1 to fig. 10, and for avoiding repetition, details are not repeated here.
To sum up, in the embodiments of the present application, N checkpoint template identifiers are displayed in response to the user's first input on a target audio, each checkpoint template identifier indicating one checkpoint template segment of a target checkpoint template; in response to the user's second input on a target object, the target object is synthesized with the checkpoint template segment indicated by at least one checkpoint template identifier to generate a target video. The target checkpoint template can be generated from the target audio the user selects, and the target video can be generated from the generated target checkpoint template and the selected target object. The operation is simple and convenient, time is saved, and the efficiency of making both the checkpoint template and the video is improved.
Optionally, an embodiment of the present application further provides an electronic device, which includes a processor, a memory, and a program or an instruction stored in the memory and capable of running on the processor, where the program or the instruction, when executed by the processor, implements each process of the video generation method embodiment, and can achieve the same technical effect, and details are not repeated here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 12 is a schematic hardware structure diagram of an electronic device implementing an embodiment of the present application.
The electronic device 120 includes, but is not limited to: a radio frequency unit 121, a network module 122, an audio output unit 123, an input unit 124, a sensor 125, a display unit 126, a user input unit 127, an interface unit 128, a memory 129, and a processor 1210.
Those skilled in the art will appreciate that the electronic device 120 may further include a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1210 via a power management system, so as to implement functions of managing charging, discharging, and power consumption via the power management system. The electronic device structure shown in fig. 12 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is not repeated here.
The user input unit 127 is used for receiving a first input of a target audio by a user;
a display unit 126 for displaying N checkpoint template identifications in response to the first input, each checkpoint template identification indicating one checkpoint template segment of a target checkpoint template, the target checkpoint template being generated based on the target audio, N being a positive integer;
the user input unit 127 is further configured to receive a second input of the target object by the user;
the processor 1210 is configured to synthesize the target object with the checkpoint template segment indicated by the at least one checkpoint template identifier in response to the second input, and generate a target video;
wherein the target object includes at least one of an image and a video.
In the above embodiment of the present application, N checkpoint template identifications are displayed in response to the user's first input on a target audio, each checkpoint template identification indicating one checkpoint template segment of a target checkpoint template; in response to the user's second input on a target object, the target object is synthesized with the checkpoint template segment indicated by at least one checkpoint template identification to generate a target video.
Optionally, the display unit 126 is further configured to display an audio track of a first audio and a track box located on the audio track;
the user input unit 127 is further configured to receive a third input of the user on the track box;
the processor 1210 is further configured to:
in response to the third input, updating the track box to a first position of the audio track;
and determining the audio segment of the region framed by the track box at the first position as the target audio.
Optionally, the processor 1210 is further configured to, when a first checkpoint template matching the audio track of the target audio exists in a preset checkpoint template set, use the first checkpoint template as the target checkpoint template.
Optionally, the processor 1210 is further configured to:
under the condition that a first stuck point template matched with the audio track of the target audio does not exist in a preset stuck point template set, acquiring audio information of the target audio;
and generating a target checkpoint template matched with the audio information of the target audio based on the audio information of the target audio.
Optionally, the audio information includes note melody and beat;
the processor 1210 is specifically configured to: acquiring note melody and beat of the target audio frequency;
and generating a target card point template matched with the audio information of the target audio based on the note melody and the beat of the target audio.
Optionally, the target object comprises a video; the target object comprises M target sub-objects, wherein M is a positive integer;
the user input unit 127 is specifically configured to receive a second input of the M target sub-objects from the user;
acquiring an input sequence of the second input;
displaying an ith object identifier of an ith target sub-object in a target area associated with the ith checkpoint template identifier based on the input order, wherein the ith object identifier is used to indicate the target sub-object associated with the checkpoint template segment indicated by the ith checkpoint template identifier;
wherein i is a positive integer, and i is not more than N.
Optionally, the checkpoint template identifier further includes a checkpoint template segment duration;
the processor 1210 is further configured to:
acquiring the playing time length of the ith target sub-object;
and under the condition that the playing duration of the ith target sub-object is greater than the duration of the ith checkpoint template segment indicated by the ith checkpoint template identifier, clip the ith target sub-object into a sub-object whose playing duration is the same as the duration of the ith checkpoint template segment.
Optionally, the user input unit 127 is further configured to receive a fourth input of a first checkpoint template identifier of the N checkpoint template identifiers from the user;
the processor 1210 is further configured to: in response to the fourth input, updating the dynamic effect parameters of the first checkpoint template identification associated target sub-objects.
Optionally, the user input unit 127 is further configured to receive a fifth input of the user to a second checkpoint template identifier and a first target sub-object in the N checkpoint template identifiers;
the processor 1210 is further configured to: in response to the fifth input, replacing the second checkpoint template identification associated target sub-object with the first target sub-object.
Optionally, the user input unit 127 is further configured to receive a sixth input of a third checkpoint template identifier of the N checkpoint template identifiers from the user;
the processor 1210 is further configured to: display, in response to the sixth input, a video track and a video selection box of the target sub-object associated with the third checkpoint template identifier, where the video duration corresponding to the video selection box is equal to the checkpoint template segment duration of the checkpoint template segment indicated by the third checkpoint template identifier;
the user input unit 127 is further configured to receive a seventh input of the user on the video selection box;
the processor 1210 is further configured to: in response to the seventh input, update the video selection box to a target position of the video track and, based on the target position, clip the target sub-object associated with the third checkpoint template identifier;
wherein the clipped target sub-object associated with the third checkpoint template identifier comprises all video frames framed by the video selection box at the target position.
In the above embodiment of the present application, N checkpoint template identifications are displayed in response to the user's first input on a target audio, each checkpoint template identification indicating one checkpoint template segment of a target checkpoint template; in response to the user's second input on a target object, the target object is synthesized with the checkpoint template segment indicated by at least one checkpoint template identification to generate a target video.
An embodiment of the present application further provides a readable storage medium. A program or instructions are stored on the readable storage medium, and when executed by a processor, the program or instructions implement the processes of the above video generation method embodiment and achieve the same technical effect; to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the processes of the above video generation method embodiment and achieve the same technical effect; to avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a chip system, or a system-on-a-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatuses of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; the functions may be performed in a substantially simultaneous manner or in a reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general hardware platform, or by hardware, though in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), including several instructions for causing a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present application.
While the embodiments of the present application have been described with reference to the accompanying drawings, the invention is not limited to the precise embodiments described above, which are illustrative rather than restrictive; those skilled in the art may make various changes without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (13)

1. A method of video generation, comprising:
receiving a first input of a target audio by a user;
in response to the first input, displaying N checkpoint template identifiers, each checkpoint template identifier indicating a checkpoint template segment of a target checkpoint template, the target checkpoint template being generated based on the target audio, N being a positive integer;
receiving a second input of the target object by the user;
in response to the second input, synthesizing the target object with a checkpoint template segment indicated by at least one checkpoint template identifier to generate a target video;
wherein the target object includes at least one of an image and a video.
2. The method of claim 1, wherein before the receiving of the first input of the target audio by the user, the method further comprises:
displaying an audio track of a first audio and a track box located on the audio track;
receiving a third input of the user to the track box;
in response to the third input, moving the track box to a first position on the audio track;
and determining an audio segment framed by the track box at the first position as the target audio.
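A minimal sketch of the segment selection in claim 2, assuming PCM samples and a hypothetical helper name:

def audio_framed_by_track_box(samples, sample_rate: int,
                              box_start_s: float, box_width_s: float):
    # The target audio is the slice of the first audio framed by the
    # track box once it has been moved to the first position.
    lo = int(box_start_s * sample_rate)
    hi = int((box_start_s + box_width_s) * sample_rate)
    return samples[lo:hi]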
3. The method of claim 1, wherein before the displaying of the N checkpoint template identifiers, the method further comprises:
in a case that a first checkpoint template matching the audio track of the target audio exists in a preset checkpoint template set, taking the first checkpoint template as the target checkpoint template.
4. The method of claim 1, wherein before the displaying of the N checkpoint template identifiers, the method further comprises:
in a case that no first checkpoint template matching the audio track of the target audio exists in a preset checkpoint template set, acquiring audio information of the target audio;
and generating the target checkpoint template matching the audio information of the target audio based on the audio information of the target audio.
5. The method of claim 4, wherein the audio information comprises a note melody and a beat;
and the generating of the target checkpoint template matching the audio information of the target audio based on the audio information of the target audio comprises:
acquiring the note melody and the beat of the target audio;
and generating the target checkpoint template matching the audio information of the target audio based on the note melody and the beat of the target audio.
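One plausible realization of claims 4 and 5, sketched with librosa's beat tracker as a stand-in for the note-melody-and-beat analysis (the patent does not name a library):

import librosa

def segment_boundaries_from_beats(audio_path: str):
    # Estimate beat times; consecutive beats bound candidate checkpoint
    # template segments, and their gaps give the segment durations.
    y, sr = librosa.load(audio_path)
    _tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    durations = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return beat_times, durations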
6. The method of claim 1, wherein the target object comprises a video; the target object comprises M target sub-objects, wherein M is a positive integer;
the receiving of the second input of the target object by the user comprises:
receiving a second input of the user to the M target sub-objects;
and after the receiving of the second input of the target object by the user and before the synthesizing of the target object with the checkpoint template segment indicated by the at least one checkpoint template identifier to generate the target video, the method further comprises:
acquiring an input order of the second input;
displaying, based on the input order, an ith object identifier of an ith target sub-object in a target area associated with an ith checkpoint template identifier, wherein the ith object identifier indicates the target sub-object associated with the checkpoint template segment indicated by the ith checkpoint template identifier;
wherein i is a positive integer, and i is not greater than N.
7. The method of claim 6, wherein the checkpoint template identifier further comprises a checkpoint template segment duration;
and before the displaying, based on the input order, of the ith object identifier of the ith target sub-object in the target area associated with the ith checkpoint template identifier, the method further comprises:
acquiring a playing duration of the ith target sub-object;
and in a case that the playing duration of the ith target sub-object is greater than the duration of the ith checkpoint template segment indicated by the ith checkpoint template identifier, clipping the ith target sub-object to a sub-object whose playing duration equals the ith checkpoint template segment duration.
8. The method of claim 6, wherein after the displaying, based on the input order, of the ith object identifier of the ith target sub-object in the target area associated with the ith checkpoint template identifier, the method further comprises:
receiving a fourth input of the user to a first checkpoint template identifier of the N checkpoint template identifiers;
and in response to the fourth input, updating a dynamic effect parameter of the target sub-object associated with the first checkpoint template identifier.
9. The method of claim 6, wherein after the displaying, based on the input order, of the ith object identifier of the ith target sub-object in the target area associated with the ith checkpoint template identifier, the method further comprises:
receiving a fifth input of the user to a second checkpoint template identifier of the N checkpoint template identifiers and a first target sub-object;
and in response to the fifth input, replacing the target sub-object associated with the second checkpoint template identifier with the first target sub-object.
10. The method of claim 6, wherein after the displaying, based on the input order, of the ith object identifier of the ith target sub-object in the target area associated with the ith checkpoint template identifier, the method further comprises:
receiving a sixth input of the user to a third checkpoint template identifier of the N checkpoint template identifiers;
in response to the sixth input, displaying a video track of the target sub-object associated with the third checkpoint template identifier and a video marquee, wherein the video duration corresponding to the video marquee equals the checkpoint template segment duration of the checkpoint template segment indicated by the third checkpoint template identifier;
receiving a seventh input of the user to the video marquee;
and in response to the seventh input, moving the video marquee to a target position on the video track and, based on the target position, clipping the target sub-object associated with the third checkpoint template identifier;
wherein the clipped target sub-object associated with the third checkpoint template identifier comprises all video frames framed by the video marquee at the target position.
11. A video generation apparatus, comprising:
a first receiving module, configured to receive a first input of a user to a target audio;
a first response module, configured to display, in response to the first input, N checkpoint template identifiers, each checkpoint template identifier indicating a checkpoint template segment of a target checkpoint template, the target checkpoint template being generated based on the target audio, N being a positive integer;
a second receiving module, configured to receive a second input of the user to a target object;
and a second response module, configured to synthesize, in response to the second input, the target object with a checkpoint template segment indicated by at least one checkpoint template identifier to generate a target video;
wherein the target object includes at least one of an image and a video.
12. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the video generation method of any of claims 1 to 5.
13. A readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the video generation method according to any one of claims 1 to 5.
CN202010363670.5A 2020-04-30 2020-04-30 Video generation method and device and electronic equipment Active CN111526427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010363670.5A CN111526427B (en) 2020-04-30 2020-04-30 Video generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010363670.5A CN111526427B (en) 2020-04-30 2020-04-30 Video generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111526427A true CN111526427A (en) 2020-08-11
CN111526427B CN111526427B (en) 2022-05-17

Family

ID=71908492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010363670.5A Active CN111526427B (en) 2020-04-30 2020-04-30 Video generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111526427B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003216173A (en) * 2002-01-28 2003-07-30 Toshiba Corp Method, device and program of synchronous control of synthetic voice and video
US7512886B1 (en) * 2004-04-15 2009-03-31 Magix Ag System and method of automatically aligning video scenes with an audio track
CN101640057A (en) * 2009-05-31 2010-02-03 北京中星微电子有限公司 Audio and video matching method and device therefor
CN104217729A (en) * 2013-05-31 2014-12-17 杜比实验室特许公司 Audio processing method, audio processing device and training method
US20190244639A1 (en) * 2018-02-02 2019-08-08 Sony Interactive Entertainment America Llc Event Reel Generator for Video Content
CN109979488A (en) * 2019-03-14 2019-07-05 浙江大学 Voice based on stress analysis turns music notation system
CN110233976A (en) * 2019-06-21 2019-09-13 广州酷狗计算机科技有限公司 The method and device of Video Composition
CN110415723A (en) * 2019-07-30 2019-11-05 广州酷狗计算机科技有限公司 Method, apparatus, server and the computer readable storage medium of audio parsing
CN110519638A (en) * 2019-09-06 2019-11-29 Oppo广东移动通信有限公司 Processing method, processing unit, electronic device and storage medium
CN110545476A (en) * 2019-09-23 2019-12-06 广州酷狗计算机科技有限公司 Video synthesis method and device, computer equipment and storage medium
CN110688496A (en) * 2019-09-26 2020-01-14 联想(北京)有限公司 Method and device for processing multimedia file
CN110933487A (en) * 2019-12-18 2020-03-27 北京百度网讯科技有限公司 Method, device and equipment for generating click video and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨鑫芳 (YANG XINFANG): "Implementing 'mixed' generation of background music in three steps", 《电脑知识与技术(经验技巧)》 (COMPUTER KNOWLEDGE AND TECHNOLOGY (EXPERIENCE AND SKILLS)) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022152064A1 (en) * 2021-01-15 2022-07-21 北京字跳网络技术有限公司 Video generation method and apparatus, electronic device, and storage medium
US12033671B2 (en) 2021-01-15 2024-07-09 Beijing Zitiao Network Technology Co., Ltd. Video generation method and apparatus, electronic device, and storage medium
CN113473200A (en) * 2021-05-25 2021-10-01 北京达佳互联信息技术有限公司 Multimedia resource processing method and device, electronic equipment and storage medium
CN113473200B (en) * 2021-05-25 2023-09-26 北京达佳互联信息技术有限公司 Multimedia resource processing method and device, electronic equipment and storage medium
CN113613061A (en) * 2021-07-06 2021-11-05 北京达佳互联信息技术有限公司 Checkpoint template generation method, checkpoint template generation device, checkpoint template generation equipment and storage medium
CN114630180A (en) * 2022-03-18 2022-06-14 上海哔哩哔哩科技有限公司 Video generation method and device

Also Published As

Publication number Publication date
CN111526427B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN111526427B (en) Video generation method and device and electronic equipment
CN113115099B (en) Video recording method and device, electronic equipment and storage medium
CN111787395B (en) Video generation method and device, electronic equipment and storage medium
CN107920256A (en) Live data playback method, device and storage medium
CN111757175A (en) Video processing method and device
CN104581380A (en) Information processing method and mobile terminal
CN111935505A (en) Video cover generation method, device, equipment and storage medium
CN111460179A (en) Multimedia information display method and device, computer readable medium and terminal equipment
CN113099297B (en) Method and device for generating click video, electronic equipment and storage medium
CN113014476B (en) Group creation method and device
CN112287165A (en) File processing method and device
CN113596555B (en) Video playing method and device and electronic equipment
CN112269898A (en) Background music obtaining method and device, electronic equipment and readable storage medium
CN113918522A (en) File generation method and device and electronic equipment
CN113079273A (en) Watermark processing method, device, electronic equipment and medium
CN116017043A (en) Video generation method, device, electronic equipment and storage medium
CN114040248A (en) Video processing method and device and electronic equipment
CN114091422A (en) Display page generation method, device, equipment and medium for exhibition
CN112287141A (en) Photo album processing method and device, electronic equipment and storage medium
CN116016817A (en) Video editing method, device, electronic equipment and storage medium
CN112653919B (en) Subtitle adding method and device
CN112261483A (en) Video output method and device
CN114500833B (en) Shooting method and device and electronic equipment
CN116506694B (en) Video editing method, device, electronic equipment and storage medium
CN112911351B (en) Video tutorial display method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant