CN112866776A - Video generation method and device - Google Patents

Video generation method and device

Info

Publication number
CN112866776A
CN112866776A
Authority
CN
China
Prior art keywords
shot
sequence
animation
aligned
audio
Prior art date
Legal status
Granted
Application number
CN202011607951.7A
Other languages
Chinese (zh)
Other versions
CN112866776B (en)
Inventor
刘琨
Current Assignee
Beijing Jindi Technology Co Ltd
Original Assignee
Beijing Jindi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jindi Technology Co Ltd
Priority to CN202011607951.7A
Publication of CN112866776A
Application granted
Publication of CN112866776B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302: Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307: Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85: Assembly of content; Generation of multimedia applications
    • H04N 21/854: Content authoring
    • H04N 21/8547: Content authoring involving timestamps for synchronizing content

Abstract

Embodiments of the present disclosure disclose a video generation method, a video generation apparatus, an electronic device, a storage medium, and a computer program. The video generation method includes the following steps: acquiring the audio duration of the subtitle audio respectively matched with each shot in a sequence of shots to be aligned; for each shot in the shot sequence, determining the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determining the moment at which the subtitle audio matched with the shot has played for the duration indicated by that difference as the animation start time of the last animation included in the shot; and generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, and the animation start time of the last animation included in each shot. Embodiments of the present disclosure improve the synchronization of picture and audio during video playback.

Description

Video generation method and device
Technical Field
The present disclosure relates to video processing technologies, and in particular, to a video generation method, apparatus, electronic device, storage medium, and computer program.
Background
With the rapid development of the internet, obtaining information through video has become an increasingly important way for internet users to acquire information. For example, when introducing a company, information about the company in various aspects, such as basic information, business conditions, development conditions, and risk conditions, can be presented in video form, combining rich pictures and sound. Compared with the traditional text form, this improves the efficiency with which information is received and makes it more attractive to users.
In the prior art, videos can be generated in batches by code. However, videos generated this way often suffer from the picture and the audio falling out of sync during playback.
Disclosure of Invention
Embodiments of the present disclosure provide a video generation method, a video generation apparatus, an electronic device, a storage medium, and a computer program, which can ensure that the last animation of each shot in the generated video is played in sync with the subtitle audio matched with that shot, improving the synchronization of picture and audio during video playback.
According to an aspect of the embodiments of the present disclosure, there is provided a video generation method, including: acquiring the audio duration of the subtitle audio respectively matched with each shot in a sequence of shots to be aligned, wherein, during playback of a shot in the shot sequence, the subtitle text indicated by the subtitle audio matched with the shot is presented in the shot, each shot in the shot sequence includes an animation sequence, and each animation sequence includes at least one animation;
for each shot in the shot sequence, determining the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determining the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference as the animation start time of the last animation included in the shot;
and generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, and the animation start time of the last animation included in each shot in the sequence.
Optionally, in the method according to any embodiment of the present disclosure, the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned includes:
for each shot in the shot sequence, determining the starting time of the subtitle audio matched with the shot as the animation starting time of the first animation included in the shot;
and generating a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation and the animation starting time of the first animation included in each shot in the shot sequence to be aligned.
Optionally, in the method according to any embodiment of the present disclosure, the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, the animation start time of the last animation included in each shot in the sequence of shots to be aligned, and the animation start time of the first animation includes:
determining the interval duration between adjacent animations in the animation sequence included in the shot based on the number of animations in the animation sequence included in the shot;
and generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, the animation start time of the last animation included in each shot, the animation start time of the first animation, and the interval duration between adjacent animations in the animation sequence included in each shot.
Optionally, the determining, based on the number of animations in the animation sequence included in the shot, an interval duration between adjacent animations in the animation sequence included in the shot includes:
and determining the interval duration between adjacent animations in the animation sequence included in the shot based on the number of animations in that sequence, the total animation duration of the animations in that sequence, and the audio duration of the subtitle audio matched with the shot.
Optionally, the determining, based on the number of animations in the animation sequence included in the shot, an interval duration between adjacent animations in the animation sequence included in the shot includes:
identifying, in the subtitle audio matched with the shot, speech stop points and speech start points equal in number to the animations in the animation sequence included in the shot;
and determining the interval duration between adjacent animations in the animation sequence included in the shot based on the identified speech start points and speech stop points.
Optionally, in the method according to any embodiment of the present disclosure, the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned includes:
for each shot in the shot sequence, determining the shot start time of the shot based on the total audio duration of the subtitle audio matched with the preceding shot of the shot;
and generating a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included by each shot in the shot sequence to be aligned, and the shot starting time of each shot in the shot sequence to be aligned.
Optionally, in the method according to any embodiment of the present disclosure, the determining a shot start time of the shot based on an audio total duration of subtitle audio that matches a preceding shot of the shot includes:
and determining the end time of the total audio duration of the subtitle audio matched with the preceding shot of the shot as the shot start time of the shot.
Optionally, in the method according to any embodiment of the present disclosure, the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned includes:
for each shot in the shot sequence, determining the audio start time of the subtitle audio matched with the shot based on the total shot duration of the preceding shots of the shot;
and generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, the animation start time of the last animation included in each shot, and the audio start time of the subtitle audio matched with each shot in the sequence.
Optionally, in the method according to any embodiment of the present disclosure, the determining an audio start time of the subtitle audio that matches the shot based on a total shot duration of a preceding shot of the shot includes:
and determining the end time of the total shot duration of the preceding shots of the shot as the audio start time of the subtitle audio matched with the shot.
Optionally, in the method according to any embodiment of the present disclosure, the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned includes:
for each shot in the shot sequence, setting the playing mode of the shot such that a preset animation sequence in the shot is played in a loop when the last animation in the shot has finished playing but the subtitle audio corresponding to the shot has not;
and generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, the animation start time of the last animation included in each shot, and the playing mode of each shot in the sequence.
Optionally, in the method according to any embodiment of the present disclosure, adjacent animations in the preset animation sequence are played seamlessly.
Optionally, in the method according to any embodiment of the present disclosure, the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned includes:
acquiring the number of elements contained in the animation to be generated;
determining templates matched with the number of the elements from a predetermined template set, wherein each template in the template set is used for determining the position of each element in the animation to be generated;
respectively importing each element in the animation to be generated into the determined template to generate an animation in the animation sequence;
and generating a video based on the generated animations, the subtitle audio matched with the shots in the shot sequence to be aligned and the animation starting time of the last animation included in the shots in the shot sequence to be aligned.
Optionally, in the method of any embodiment of the present disclosure, the method further includes:
and in response to determining that no template in the template set matches the number of elements, presenting the elements one by one at preset positions in a preset order while the animation including those elements is played.
Optionally, in the method of any embodiment of the present disclosure, the method further includes:
the method includes determining a shot duration for a shot based on an audio duration of subtitle audio corresponding to the shot.
Optionally, in the method according to any embodiment of the present disclosure, each shot in the sequence of shots to be aligned is used to present one or more of basic company information, business conditions, development conditions, and risk conditions through text and/or images, and the subtitle audio matched with a shot is the audio of the text in the shot.
According to a second aspect of the embodiments of the present disclosure, there is provided a video generating apparatus, including: a first obtaining unit configured to obtain the audio duration of the subtitle audio respectively matched with each shot in a sequence of shots to be aligned, wherein, during playback of a shot in the shot sequence, the subtitle text indicated by the subtitle audio matched with the shot is presented in the shot, each shot in the shot sequence includes an animation sequence, and each animation sequence includes at least one animation;
a first determining unit configured to determine, for each shot in the shot sequence, the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and to determine the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference as the animation start time of the last animation included in the shot;
a generating unit configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, and an animation start time of a last animation included in each shot in the shot sequence to be aligned.
Optionally, in the apparatus according to any embodiment of the present disclosure, the generating unit includes:
a first determining subunit configured to determine, for each shot in the shot sequence, a start time of subtitle audio matching the shot as an animation start time of a first animation included in the shot;
a first generation subunit configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation start time of the last animation included in each shot in the shot sequence to be aligned, and the animation start time of the first animation.
Optionally, in the apparatus according to any embodiment of the present disclosure, the first generating subunit includes:
a first determination module configured to determine a duration of an interval between adjacent animations in the animation sequence included in the shot based on a number of animations in the animation sequence included in the shot;
the generating module is configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included in each shot in the shot sequence to be aligned, the animation starting time of the first animation and the interval duration between adjacent animations in the animation sequence included in each shot.
Optionally, in the apparatus according to any embodiment of the present disclosure, the generating unit includes:
a second determining subunit configured to determine, for each shot in the shot sequence, a shot start time of the shot based on an audio total duration of subtitle audio that matches a preceding shot of the shot;
a second generation subunit configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation start time of the last animation included in each shot in the shot sequence to be aligned, and the shot start time of each shot in the shot sequence to be aligned.
Optionally, in the apparatus of any embodiment of the present disclosure, the second determining subunit includes:
and a second determining module configured to determine the end time of the total audio duration of the subtitle audio matched with the preceding shot of the shot as the shot start time of the shot.
Optionally, in the apparatus according to any embodiment of the present disclosure, the generating unit includes:
a third determining subunit configured to determine, for each shot in the shot sequence, the audio start time of the subtitle audio matched with the shot based on the total shot duration of the preceding shots of the shot;
a third generating subunit configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation start time of the last animation included in each shot in the shot sequence to be aligned, and the audio start time of the subtitle audio matched with each shot in the shot sequence to be aligned.
Optionally, in the apparatus of any embodiment of the present disclosure, the third determining subunit includes:
and a third determining module configured to determine the end time of the total shot duration of the preceding shots of the shot as the audio start time of the subtitle audio matched with the shot.
Optionally, in the apparatus according to any embodiment of the present disclosure, the generating unit includes:
a playing subunit configured to set, for each shot in the shot sequence, the playing mode of the shot such that a preset animation sequence in the shot is played in a loop when the last animation in the shot has finished playing but the subtitle audio corresponding to the shot has not;
and the fourth generation subunit is configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included in each shot in the shot sequence to be aligned, and the playing mode of each shot in the shot sequence to be aligned.
Optionally, in the apparatus according to any embodiment of the present disclosure, adjacent animations in the preset animation sequence are played seamlessly.
Optionally, in the apparatus according to any embodiment of the present disclosure, the generating unit includes:
a fourth acquiring subunit configured to acquire the number of elements included in the animation to be generated;
a fourth determining subunit, configured to determine, from a predetermined template set, templates that match the number of elements, where each template in the template set is used to determine a position of each element in the animation to be generated;
a fifth generating subunit configured to import each element of the animation to be generated into the determined template, so as to generate an animation in the animation sequence;
a sixth generating subunit configured to generate a video based on the generated animations, the subtitle audio matched with the shots in the sequence of shots to be aligned, and an animation start time of a last animation included in each shot in the sequence of shots to be aligned.
Optionally, in the apparatus of any embodiment of the present disclosure, the apparatus further includes:
and a presentation unit configured to, in response to determining that no template in the template set matches the number of elements, present the elements one by one at preset positions in a preset order while the animation including those elements is played.
Optionally, in the apparatus of any embodiment of the present disclosure, the apparatus further includes:
a seventh determining unit configured to determine a shot duration for a shot based on an audio duration of subtitle audio corresponding to the shot.
Optionally, in the apparatus according to any embodiment of the present disclosure, each shot in the sequence of shots to be aligned is used to present one or more of basic company information, business conditions, development conditions, and risk conditions through text and/or images, and the subtitle audio matched with a shot is the audio of the text in the shot.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and the computer program, when executed, implements the method of any of the above embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable medium storing a computer program which, when executed by a processor, implements the video generation method of any of the embodiments of the first aspect described above.
Based on the video generation method, apparatus, electronic device, storage medium, and computer program provided by the above embodiments of the present disclosure, the audio duration of the subtitle audio respectively matched with each shot in the sequence of shots to be aligned may be obtained, wherein, during playback of a shot in the shot sequence, the subtitle text indicated by the subtitle audio matched with the shot is presented in the shot, each shot in the shot sequence includes an animation sequence, and each animation sequence includes at least one animation. Then, for each shot in the shot sequence, the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot is determined, and the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference is determined as the animation start time of the last animation included in the shot. Finally, a video is generated based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, and the animation start time of the last animation included in each shot. By determining this difference and taking the moment at which the subtitle audio has played for the indicated duration as the animation start time of the shot's last animation, the last animation of each shot in the generated video is guaranteed to finish playing in sync with the subtitle audio matched with the shot, improving the synchronization of picture and audio during video playback. This also avoids problems such as the video playing too fast, playing too slow, or stuttering as a result of stretching or compressing it to match the audio.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be understood more clearly from the following detailed description, taken with reference to the accompanying drawings,
wherein:
fig. 1 is a flow chart of a first embodiment of a video generation method of the present disclosure.
Fig. 2 is a flowchart of a second embodiment of the video generation method of the present disclosure.
Fig. 3 is a flowchart of a third embodiment of the video generation method of the present disclosure.
Fig. 4 is a flowchart of a fourth embodiment of the video generation method of the present disclosure.
Fig. 5 is a schematic structural diagram of an embodiment of a video generation apparatus according to the present disclosure.
Fig. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and do not imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure is only one kind of association relationship describing an associated object, and means that three kinds of relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that, for convenience of description, the portions shown in the drawings are not drawn to scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to at least one of a terminal device, a computer system, and a server, which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with at least one electronic device of a terminal device, computer system, and server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
At least one of the terminal device, the computer system, and the server may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Referring to fig. 1, a flow 100 of a first embodiment of a video generation method according to the present disclosure is shown. The video generation method comprises the following steps:
and 101, acquiring the audio time length of the caption audio respectively matched with each shot in the shot sequence to be aligned.
In this embodiment, an execution subject (e.g., a server, a terminal device, etc.) of the video generation method may obtain, from other electronic devices or locally, audio durations of subtitle audio respectively matching respective shots in a sequence of shots to be aligned in a wired or wireless connection manner. During the playing process of the shots in the shot sequence, the shots present caption texts indicated by caption audios matched with the shots, each shot in the shot sequence comprises an animation sequence, and each animation sequence comprises at least one animation.
Where each shot in the sequence of shots may be used to present information on one aspect of the company. For example, company basic information, business conditions, developments, and risks may be presented in different shots, respectively. The audio duration may be a play duration of the subtitle audio. In general, the execution body may automatically generate subtitle audio corresponding to the subtitle text based on the subtitle text. After the subtitle audio is generated, the audio duration of the subtitle audio may be determined.
In some optional implementations of the embodiment, each shot in the sequence of shots to be aligned is used to present one or more of basic company information, business conditions, development conditions, risk conditions by text and/or images, and the subtitle audio matching the shot is the audio of the text (e.g., subtitle text) in the shot.
Here, the acquired shot sequence may be a part of a video generated in batch using a code. In generating video in batches using code, it is often necessary to generate video based on shots and audio. Various animations may be included in the shots, such as fly-in, fly-out, spin, and so forth. During the playing of the shot, a plurality of pictures may be sequentially presented in the shot, and each picture may include a plurality of elements. The pictures can be switched by animation. The element may be a character or an image, and for example, the element may be the subtitle text, a name of a company, a logo, or the like. The individual elements may also be displayed or hidden by animation.
In addition, after acquiring the audio durations of the subtitle audio respectively matched with the shots in the sequence to be aligned, the execution subject may store the acquired audio durations locally.
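For illustration, here is a minimal sketch of how an execution subject might obtain the audio duration of generated subtitle audio, assuming the subtitle audio is stored as WAV files (the disclosure does not fix a format, and the file names here are hypothetical):

```python
import wave

def audio_duration_seconds(wav_path: str) -> float:
    """Playback duration of a subtitle-audio WAV file, in seconds."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())

# One audio duration per shot in the sequence of shots to be aligned.
audio_durations = [audio_duration_seconds(p)
                   for p in ("shot_01.wav", "shot_02.wav", "shot_03.wav")]
```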
Step 102: for each shot in the shot sequence, determining the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determining the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference as the animation start time of the last animation included in the shot.
In this embodiment, for each shot in the shot sequence, the execution subject may first determine the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot; this difference indicates a duration. The execution subject may then determine the moment at which the subtitle audio matched with the shot has played for that duration as the animation start time of the last animation included in the shot. For example, if the difference is 5 seconds, the execution subject may determine the moment at which the subtitle audio matched with the shot has played for 5 seconds as the animation start time of the last animation included in the shot. In this example, the last animation included in the shot starts playing at that moment (i.e., when the subtitle audio matched with the shot has played for 5 seconds) while the not-yet-played part of the subtitle audio continues playing, so that the subtitle audio matched with the shot and the last animation included in the shot finish playing at the same time.
The animation start time indicates the moment at which an animation starts to play.
It should be noted that the subtitle audio matched with the shot need not actually be playing while the animation start time of the shot's last animation is determined.
Here, the execution subject may execute steps 101 and 102 after deciding to play the video, or may execute them while generating the video before deciding to play it; the embodiments of the present disclosure do not limit this.
In addition, when generating or playing a video, the execution subject may render each shot by tiling elements (such as pictures and text) onto the base image frame by frame, at the size and position appropriate to each frame; the entrance and exit animations of the pictures and text are then generated according to the frame count. However, the same scene differs in content and duration from company to company, so the animation end times cannot simply be unified.
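Returning to the alignment rule of step 102, the following is a non-authoritative sketch; the function name and the clamping to zero when the animation is longer than the audio are assumptions, since the disclosure does not address that case:

```python
def last_animation_start(audio_duration: float, last_animation_duration: float) -> float:
    """Start the last animation once the shot's subtitle audio has played for
    (audio duration - animation duration), so that both finish together."""
    return max(audio_duration - last_animation_duration, 0.0)

# With 8 s of subtitle audio and a 3 s closing animation, the difference is
# 5 s: the animation starts at the moment the audio has played 5 seconds.
assert last_animation_start(8.0, 3.0) == 5.0
```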
Step 103: generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, and the animation start time of the last animation included in each shot in the sequence.
In this embodiment, the execution subject may generate the video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, and the animation start time of the last animation included in each shot. During playback of the generated video, the last animation of each shot and the subtitle audio matched with that shot can be played synchronously.
The video generation method provided by the foregoing embodiment of the present disclosure may acquire the audio durations of the subtitle audio respectively matched with the shots in a sequence of shots to be aligned, wherein, during playback of a shot in the sequence, the subtitle text indicated by the subtitle audio matched with the shot is presented in the shot and each shot in the sequence includes an animation sequence. Then, for each shot in the sequence, the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot is determined, and the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference is determined as the animation start time of the last animation included in the shot. By determining this difference and taking that moment as the animation start time of the shot's last animation, the last animation of each shot in the generated video is guaranteed to play in sync with the subtitle audio matched with the shot, which improves the synchronization of picture and audio during video playback and avoids problems such as the video playing too fast, playing too slow, or stuttering as a result of stretching or compressing.
In some optional implementations of this embodiment, the foregoing step 103 may include:
In the first step, for each shot in the shot sequence, the execution subject may further determine the start time of the subtitle audio matched with the shot as the animation start time of the first animation included in the shot.
Here, the opening animation may start playing at the start time of the subtitle audio matched with the shot and finish once the elements have entered; the moment at which entry completes may be preset.
In the second step, a video is generated based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, the animation start time of the last animation included in each shot, and the animation start time of the first animation.
Here, when playing the generated video, it is possible to play the last animation from the animation start time of the last animation and the first animation from the animation start time of the first animation.
It will be appreciated that, by determining the start time of the subtitle audio matched with a shot as the animation start time of the first animation included in that shot, the above implementation improves the synchronization between the subtitle audio being played and the elements presented to the user: as soon as the subtitle audio starts playing, content is presented to the user through the animation, which makes the presentation of elements in the video (such as the company name and logo) more timely.
In some optional implementations of this embodiment, the second step may include:
first, based on the number of animations in the animation sequence included in the shot, the interval duration between adjacent animations in the animation sequence included in the shot is determined.
As an example, the execution subject may determine the interval duration between adjacent animations in the animation sequence included in the shot based on the number of animations in the animation sequence included in the shot, the total duration of animations of the animations in the animation sequence included in the shot, and the audio duration of the subtitle audio matching the shot.
Specifically, the execution subject may compute the total animation duration of the animations in the animation sequence included in the shot and the audio duration of the subtitle audio matched with the shot. If the difference between the audio duration and the total animation duration is greater than or equal to a preset first duration threshold (e.g., 0 or 1 second), the number of animations minus one is computed, and the quotient of the difference divided by that result is taken as the interval duration between adjacent animations in the animation sequence.
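A minimal sketch of this quotient rule follows; the `threshold` parameter stands in for the preset first duration threshold, and returning zero when the slack falls below it is an assumption, since the disclosure does not state the fallback behavior:

```python
def animation_interval(num_animations: int,
                       total_animation_duration: float,
                       audio_duration: float,
                       threshold: float = 0.0) -> float:
    """Spread the slack between the subtitle audio and the animations evenly
    over the (num_animations - 1) gaps between adjacent animations."""
    slack = audio_duration - total_animation_duration
    if num_animations < 2 or slack < threshold:
        return 0.0
    return slack / (num_animations - 1)
```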
As yet another example, the execution subject may instead first identify, in the subtitle audio matched with the shot, speech stop points and speech start points equal in number to the animations in the animation sequence included in the shot, and then determine the interval duration between adjacent animations based on the identified speech start points and speech stop points.
Specifically, the execution subject may recognize speech stop points and speech start points in the subtitle audio, equal in number to the animations. A speech stop point marks a position in the subtitle audio where the speech pauses, and a speech start point marks a position where speech begins. It should be understood that a piece of subtitle audio may contain both speech stop points and speech start points. For example, basic company information may include items such as the company name and the number of employees, and a speech stop point and a speech start point may lie between the pieces of subtitle audio corresponding to the company name and to the number of employees. The number of recognized speech stop points equals the number of animations in the animation sequence included in the shot, as does the number of recognized speech start points.
Then, the interval duration between adjacent animations in the animation sequence included in the shot is determined based on the identified speech start points and speech stop points, and a video is generated based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, the animation start time of the last animation included in each shot, the animation start time of the first animation, and the interval duration between adjacent animations in the animation sequence included in each shot.
Here, when playing the generated video, it is possible to start playing the last animation at the animation start time of the last animation, start playing the first animation at the animation start time of the first animation, and play the remaining animations at the interval duration between adjacent animations in the animation sequence included in each shot.
It will be appreciated that in the alternative implementations described above, the presentation of each animation can be made more synchronous with the playback of the subtitle audio by determining the duration of the interval between adjacent animations in the sequence of animations included in the shot.
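The disclosure does not name a speech recognition algorithm for the speech-point variant above; assuming the speech start and stop points have already been recognized (one of each per animation, as times within the shot's subtitle audio, represented here as hypothetical lists), the gaps between adjacent animations could be derived as in this sketch:

```python
def intervals_from_speech_points(speech_starts: list[float],
                                 speech_stops: list[float]) -> list[float]:
    """Use the silence between each speech stop point and the following
    speech start point as the interval between the corresponding
    adjacent animations."""
    return [start - stop
            for stop, start in zip(speech_stops, speech_starts[1:])]

# Three animations, with speech spanning 0.0-2.5 s, 3.1-5.0 s and 5.6-8.0 s:
# the two gaps are about 0.6 s each.
print(intervals_from_speech_points([0.0, 3.1, 5.6], [2.5, 5.0, 8.0]))
```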
In some optional implementations of this embodiment, the foregoing step 103 may further include:
In the first step, for each shot in the shot sequence, the execution subject may further determine the audio start time of the subtitle audio matched with the shot based on the total shot duration of the preceding shots of the shot.
In some application scenarios of the foregoing implementation, the execution subject may determine the audio start time of the subtitle audio matched with the shot as follows: the end time of the total shot duration of the preceding shots of the shot is determined as the audio start time of the subtitle audio matched with the shot.
Optionally, the execution subject may instead determine the audio start time of the subtitle audio matched with the shot as follows: the end time of the duration indicated by the sum of the shot duration of the shot and the animation duration of the first animation in the shot is determined as the audio start time of the subtitle audio matched with the shot.
And secondly, generating a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included by each shot in the shot sequence to be aligned, and the audio starting time of the subtitle audio matched with each shot in the shot sequence to be aligned.
Here, when playing the generated video, it is possible to start playing the last animation at the animation start time of the last animation and start playing each subtitle audio at the audio start time of each subtitle audio.
It can be appreciated that the above alternative implementation further improves the synchronicity of the shots and the subtitle audio by determining the audio start time of the subtitle audio matching the shot.
In some optional implementations of the embodiment, the execution subject may further determine a duration of a shot of the shot based on an audio duration of the subtitle audio corresponding to the shot.
As an example, the execution body may determine an audio duration of subtitle audio corresponding to a shot as a shot duration of the shot.
As still another example, the execution subject may determine a sum of an animation time length of the first animation in the shot and an audio time length of the caption audio corresponding to the shot as a shot time length of the shot.
It will be appreciated that the alternative implementations described above may determine the duration of a shot based on the duration of the audio, thereby facilitating the playback of the subtitle audio to be more synchronized with the presentation of the subtitle text in the shot.
With further reference to fig. 2, fig. 2 is a flow chart of a second embodiment of the video generation method of the present disclosure. The process 200 of the video generation method includes:
and 201, acquiring the audio time length of the caption audio respectively matched with each shot in the shot sequence to be aligned.
In this embodiment, an execution subject (e.g., a server, a terminal device, etc.) of the video generation method may obtain, from other electronic devices or locally, audio durations of subtitle audio respectively matching respective shots in a sequence of shots to be aligned in a wired or wireless connection manner. During the playing process of the shots in the shot sequence, the shots present caption texts indicated by caption audios matched with the shots, and each shot in the shot sequence comprises an animation sequence.
In this embodiment, 201 is substantially the same as 101 in the corresponding embodiment of fig. 1, and is not described here again.
Step 202: for each shot in the shot sequence, determining the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determining the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference as the animation start time of the last animation included in the shot.
In this embodiment, for each shot in the shot sequence, the execution subject may determine the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determine the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference as the animation start time of the last animation included in the shot.
In this embodiment, 202 is substantially the same as 102 in the corresponding embodiment of fig. 1, and is not described herein again.
Step 203: for each shot in the shot sequence, determining the shot start time of the shot based on the total audio duration of the subtitle audio matched with the preceding shot of the shot.
In this embodiment, for each shot in the shot sequence, the execution subject may determine the shot start time of the shot based on the total audio duration of the subtitle audio matched with the preceding shot of the shot.
In some optional implementations of this embodiment, the execution subject may determine the shot start time of the shot as follows: the end time of the total audio duration of the subtitle audio matched with the preceding shot of the shot is determined as the shot start time of the shot.
It can be understood that, in the above implementation, the shot is presented to the user once the subtitle audio matched with its preceding shot has finished playing, which keeps the shot synchronized with the playback of the subtitle audio and presents the picture in the shot to the user more promptly.
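A minimal sketch of this optional implementation follows; the helper name is an assumption, and "the preceding shot" is read here as all shots before the current one, so each shot starts at the end of the total audio duration of all preceding subtitle audio:

```python
from itertools import accumulate

def shot_start_times(audio_durations: list[float]) -> list[float]:
    """The first shot starts at time zero; each later shot starts when the
    subtitle audio matched with all preceding shots has finished playing."""
    return [0.0, *accumulate(audio_durations[:-1])]

# Subtitle audio of 8 s, 6 s and 10 s gives shot start times 0 s, 8 s and 14 s.
assert shot_start_times([8.0, 6.0, 10.0]) == [0.0, 8.0, 14.0]
```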
Optionally, the execution subject may determine the shot start time of the shot as follows: a speech start point of the subtitle audio matched with the shot is detected, and the time indicated by that speech start point is taken as the shot start time of the shot.
It can be understood that the above alternative implementation is more suitable for scenes with pauses between subtitle audios, and can determine the shot start time more accurately.
Step 204: generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, the animation start time of the last animation included in each shot, and the shot start time of each shot in the sequence.
In this embodiment, the execution body may generate the video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, the animation start time of the last animation included in each shot in the sequence of shots to be aligned, and the shot start time of each shot in the sequence of shots to be aligned.
Here, when playing the generated video, it is possible to start playing the last animation at the animation start time of the last animation and start playing each shot at the shot start time of each shot.
It should be noted that, besides the above-mentioned contents, the embodiment of the present application may further include the same or similar features and effects as the embodiment corresponding to fig. 1, and details are not repeated herein.
As can be seen from fig. 2, the flow 200 of the video generation method in this embodiment determines the shot start time of a shot based on the total audio duration of the subtitle audio matched with the shot's preceding shot, so that the presentation of the shot's pictures is synchronized with the playback of the subtitle audio.
With continuing reference to fig. 3, fig. 3 is a flowchart of a third embodiment of the video generation method of the present disclosure. The process 300 of the video generation method includes:
Step 301: acquiring the audio duration of the subtitle audio respectively matched with each shot in the sequence of shots to be aligned.
In this embodiment, the execution subject of the video generation method (e.g., a server or a terminal device) may obtain, over a wired or wireless connection, the audio durations of the subtitle audio respectively matched with the shots in the sequence of shots to be aligned, either from other electronic devices or locally. While a shot in the sequence is playing, the shot presents the subtitle text indicated by the subtitle audio matched with it, and each shot in the sequence includes an animation sequence.
In this embodiment, 301 is substantially the same as 101 in the corresponding embodiment of fig. 1, and is not described here again.
Step 302: for each shot in the shot sequence, determining the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determining the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference as the animation start time of the last animation included in the shot.
In this embodiment, for each shot in the shot sequence, the execution subject may determine the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determine the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference as the animation start time of the last animation included in the shot.
In this embodiment, 302 is substantially the same as 102 in the corresponding embodiment of fig. 1, and is not described herein again.
Step 303: for each shot in the shot sequence, playing the preset animation sequence in the shot in a loop when the last animation in the shot has finished playing but the subtitle audio corresponding to the shot has not.
In this embodiment, for each shot in the shot sequence, the execution subject may play the preset animation sequence in the shot in a loop when the last animation in the shot has finished playing but the subtitle audio corresponding to the shot has not finished playing.
In some optional implementation manners of this embodiment, in the process of playing the preset animation sequence in a loop, adjacent animations in the preset animation sequence are played in a seamless manner.
The preset animation sequence may be a set of animations selected from the shot in advance. For example, it may consist of the animations that remain after the shot's entrance and exit animations are replaced or deleted, so that adjacent pictures connect naturally and smoothly no matter where the loop is cut.
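As a sketch of this loop-playback mode, one could compute how many repetitions of the preset animation sequence are needed to bridge the remaining subtitle audio; the ceiling-based repetition count is an assumption, since the disclosure only requires looping until the audio ends:

```python
import math

def loop_repetitions(audio_duration: float,
                     animations_end: float,
                     loop_duration: float) -> int:
    """Number of times the preset animation sequence must repeat to cover
    subtitle audio still playing after the shot's own animations finish."""
    remaining = audio_duration - animations_end
    if remaining <= 0 or loop_duration <= 0:
        return 0
    return math.ceil(remaining / loop_duration)

# 2.7 s of audio remain over a 1 s loop clip: play the loop three times.
assert loop_repetitions(10.0, 7.3, 1.0) == 3
```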
Step 304: generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, the animation start time of the last animation included in each shot, and the playing mode of each shot in the sequence.
In this embodiment, the execution subject may generate the video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, the animation start time of the last animation included in each shot, and the playing mode of each shot in the sequence.
Here, when playing the generated video, it is possible to start playing the last animation at the animation start time of the last animation, and play each shot in the playing manner of each shot.
It should be noted that, besides the above-mentioned contents, the embodiment of the present application may further include the same or similar features and effects as the embodiment corresponding to fig. 1, and details are not repeated herein.
As can be seen from fig. 3, in the flow 300 of the video generation method in this embodiment, playing the preset animation sequence in a shot in a loop extends the playing time of a shot whose duration is not fixed, bringing the shot duration closer to the audio duration of the subtitle audio matched with the shot and making the playback of the two more synchronous.
With continuing reference to fig. 4, fig. 4 is a flowchart of a fourth embodiment of the video generation method of the present disclosure. The flow 400 of the video generation method includes:
Step 401: obtain the audio durations of the subtitle audio respectively matched with each shot in the sequence of shots to be aligned.
In this embodiment, the executing entity of the video generation method (e.g., a server or a terminal device) may obtain, locally or from other electronic devices over a wired or wireless connection, the audio durations of the subtitle audio respectively matched with each shot in the sequence of shots to be aligned. During playing of a shot in the sequence, the shot presents the subtitle text indicated by the subtitle audio matched with it, and each shot in the sequence includes an animation sequence.
In this embodiment, step 401 is substantially the same as step 101 in the embodiment corresponding to fig. 1 and is not described here again.
Step 402: for each shot in the shot sequence, determine the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determine the moment at which the subtitle audio matched with the shot has played for the duration indicated by the difference as the animation start time of the last animation included in the shot.

In this embodiment, for each shot in the shot sequence, the executing entity may determine the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and take the moment at which that subtitle audio has played for the duration indicated by the difference as the animation start time of the last animation included in the shot.

In this embodiment, step 402 is substantially the same as step 102 in the embodiment corresponding to fig. 1 and is not described here again.
Step 403: acquire the number of elements contained in the animation to be generated.
In this embodiment, the executing entity may acquire the number of elements contained in the animation to be generated. While a shot is playing, several pictures may be presented in it in sequence, and each picture may include several elements. The pictures may be switched by animations. An element may be text or an image; for example, an element may be the subtitle text, a company name, a logo, and the like. Individual elements may also be shown or hidden by animations.
Step 404: determine, from a predetermined set of templates, a template matching the number of elements.
In this embodiment, the execution subject may determine a template matching the number of elements from a predetermined set of templates.
Each template in the template set is used to determine the position of each element in the animation to be generated, and each element count may be matched in advance to one or more templates in the set. For example, when there is only one element, the template indicates that the element is centered; when there are two elements, the template indicates that the two elements are arranged symmetrically; when there are five elements, the template may indicate that two elements are displayed in the upper part of the picture and three in the lower part, and so on.
An element may be any predetermined text or image. As an example, in an application scenario that introduces information about a company, the information of each shareholder of the company may be treated as one element. When the company has only one shareholder and the template indicates that the element is centered, the generated animation will show the shareholder information centered.
Step 405: import each element in the animation to be generated into the determined template, so as to generate one animation in the animation sequence.
In this embodiment, the executing entity may import each element in the animation to be generated into the determined template so that each element is displayed at its corresponding position, thereby generating one animation in the animation sequence.
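The following sketch illustrates steps 403 to 405 under assumed data structures: each template is modelled as a list of normalized (x, y) anchor positions keyed by element count. The TEMPLATES table and its coordinates are inventions for the example, not values from the patent.

    # Templates keyed by element count; coordinates are fractions of the picture.
    TEMPLATES = {
        1: [(0.5, 0.5)],                              # one element, centered
        2: [(0.3, 0.5), (0.7, 0.5)],                  # two elements, symmetric
        5: [(0.35, 0.3), (0.65, 0.3),                 # five elements: two in the
            (0.2, 0.7), (0.5, 0.7), (0.8, 0.7)],      # upper row, three below
    }

    def build_animation(elements):
        """Steps 403-405: pick the template for len(elements), place elements."""
        template = TEMPLATES.get(len(elements))       # step 404
        if template is None:
            raise LookupError(f"no template for {len(elements)} elements")
        # Step 405: importing an element into the template is modelled here as
        # pairing it with its anchor position.
        return [{"element": e, "position": p} for e, p in zip(elements, template)]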
Step 406: generate a video based on the generated animations, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned.

In this embodiment, the executing entity may generate a video based on the generated animations, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned.
Here, when the generated video is played, the last animation can start playing at its animation start time, and the elements in each animation are presented at the positions indicated by the determined templates.
It should be noted that, in addition to the contents described above, this embodiment may further include the same or similar features and effects as the embodiment corresponding to fig. 1, which are not repeated here.
As can be seen from fig. 4, in the flow 400 of the video generation method of this embodiment, a template set is predetermined for different element counts, so that even when the automatically generated video contains shots with varying numbers of elements, each element can be presented to the user at the position indicated by the matched template.
In some optional implementations of this embodiment, when no template in the predetermined template set matches the number of elements, the executing entity may instead present the elements one by one at a preset position, in a preset order, while playing the animation that contains those elements.

It can be understood that this fallback presents each element at the preset position of the shot in the set order whenever no matching template exists, so the optional implementation is applicable to any number of elements. A sketch of the fallback follows.
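This continues the previous example; the preset position and the idea of revealing elements by an order index are assumptions made for illustration.

    FALLBACK_POSITION = (0.5, 0.5)   # assumed preset position (picture center)

    def build_animation_any_count(elements):
        """Use a matching template if one exists, else the sequential fallback."""
        template = TEMPLATES.get(len(elements))
        if template is not None:
            return build_animation(elements)
        # No matching template: every element shares the preset position and is
        # presented one by one according to its order index.
        return [{"element": e, "position": FALLBACK_POSITION, "order": i}
                for i, e in enumerate(elements)]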
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a video generation apparatus. The apparatus embodiment corresponds to the method embodiment shown in fig. 1 and, in addition to the features described below, may include the same or corresponding features as that method embodiment and produce the same or corresponding effects. The apparatus may be applied to various electronic devices.
As shown in fig. 5, the video generation apparatus 500 of this embodiment includes a first acquisition unit 501, a first determination unit 502 and a generation unit 503. The first acquisition unit 501 is configured to acquire the audio durations of the subtitle audio respectively matched with each shot in a sequence of shots to be aligned, where, during playing of a shot in the sequence, the subtitle text indicated by the subtitle audio matched with the shot is presented in the shot, and each shot in the sequence includes an animation sequence. The first determination unit 502 is configured to determine, for each shot in the shot sequence, the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and to determine the moment at which that subtitle audio has played for the duration indicated by the difference as the animation start time of the last animation included in the shot. The generation unit 503 is configured to generate a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, and the animation start time of the last animation included in each shot in the sequence.
In this embodiment, the first acquisition unit 501 of the video generation apparatus 500 may acquire the audio durations of the caption audios respectively matching respective shots in a sequence of shots to be aligned, where during playback of a shot in the sequence of shots, a caption text indicated by the caption audio matching the shot is presented in the shot, and each shot in the sequence of shots includes an animation sequence.
In this embodiment, the first determining unit 502 may determine, for each shot in the shot sequence, a difference between an audio time length of the subtitle audio matching the shot and an animation time length of the last animation included in the shot, and determine a time at which the subtitle audio matching the shot plays for a time length indicated by the difference as an animation start time of the last animation included in the shot.
In this embodiment, the generation unit 503 may generate a video based on the sequence of shots to be aligned, the subtitle audio that matches each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned.
In some optional implementations of this embodiment, the generating unit 503 includes:
a first determining subunit (not shown in the figure) configured to determine, for each shot in the shot sequence, a start time of subtitle audio matching the shot as an animation start time of a first animation included in the shot;
a first generation subunit (not shown in the figure) configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation start time of the last animation included in each shot in the shot sequence to be aligned, and the animation start time of the first animation.
In some optional implementations of this embodiment, the first generating subunit includes:
a first determining module (not shown in the figure) configured to determine the interval duration between adjacent animations in the animation sequence included in the shot based on the number of animations in that animation sequence (one plausible spacing rule is sketched after this list);
and the generating module (not shown in the figure) is configured to generate the video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included in each shot in the shot sequence to be aligned, the animation starting time of the first animation and the interval duration between adjacent animations in the animation sequence included in each shot.
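The patent does not give the formula this first determining module uses, so the sketch below is one plausible reading only: the n animation start times are spread evenly between the start of the shot's subtitle audio and the already-fixed start time of the last animation (see the sketch under step 302 for the Shot and Animation classes it reuses).

    def interval_between_animations(shot):
        """Assumed even-spacing rule for the gap between adjacent animations."""
        n = len(shot.animations)
        if n < 2:
            return 0.0
        # The last animation starts at audio_duration - its own duration; spacing
        # n start times evenly over [0, last_start] gives equal gaps of:
        last_start = shot.audio_duration - shot.animations[-1].duration
        return max(last_start, 0.0) / (n - 1)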
In some optional implementations of this embodiment, the generating unit 503 includes:
a second determining subunit (not shown in the figure) configured to determine, for each shot in the shot sequence, a shot start time of the shot based on an audio total duration of subtitle audio that matches a preceding shot of the shot;
a second generation subunit (not shown in the figure) configured to generate a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, the animation start time of the last animation included in each shot in the sequence of shots to be aligned, and the shot start time of each shot in the sequence of shots to be aligned.
In some optional implementations of this embodiment, the second determining subunit includes:
and a second determining module (not shown in the figure) configured to determine an end time of the total audio duration of the subtitle audio matching a preceding shot of the shot as the shot start time of the shot (a running-sum sketch of this rule follows).
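One natural reading is that each shot starts where the accumulated subtitle audio of the shots before it ends. A short sketch over the assumed Shot class from the earlier examples:

    from itertools import accumulate

    def shot_start_times(shots):
        """Start of shot i = total subtitle-audio duration of shots 0..i-1."""
        durations = [shot.audio_duration for shot in shots]
        return [0.0] + list(accumulate(durations))[:-1]

For matched audio durations of 3, 4 and 5 seconds, this yields shot start times of 0, 3 and 7 seconds.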
In some optional implementations of this embodiment, the generating unit 503 includes:
a third determining subunit (not shown in the figure) configured to determine, for each shot in the shot sequence, an audio start time of subtitle audio that matches a preceding shot of the shot based on a total shot duration of the shot;
a third generation subunit (not shown in the figure) configured to generate a video based on the sequence of shots to be aligned, the subtitle audio matching each shot in the sequence of shots to be aligned, the animation start time of the last animation included in each shot in the sequence of shots to be aligned, and the audio start time of the subtitle audio matching each shot in the sequence of shots to be aligned.
In some optional implementations of this embodiment, the third determining subunit includes:
and a third determining module (not shown in the figure) configured to determine the end time of the total shot duration of the preceding shot of the shot as the audio start time of the subtitle audio matching the shot.
In some optional implementations of this embodiment, the generating unit 503 includes:
a playing subunit (not shown in the figure) configured to set, for each shot in the shot sequence, the playing mode of the shot to loop-playing the preset animation sequence in the shot when the last animation in the shot has finished playing and the subtitle audio corresponding to the shot has not finished playing;
and a fourth generating subunit (not shown in the figure) configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation start time of the last animation included in each shot in the shot sequence to be aligned, and the playing mode of each shot in the shot sequence to be aligned.
In some optional implementations of this embodiment, adjacent animations in the preset animation sequence are played seamlessly.
In some optional implementations of this embodiment, the generating unit 503 includes:
a fourth acquiring subunit (not shown in the figure) configured to acquire the number of elements included in the animation to be generated;
a fourth determining subunit (not shown in the figure), configured to determine templates that match the number of elements from a predetermined set of templates, wherein each template in the set of templates is used to determine the position of each element in the animation to be generated;
a fifth generating subunit (not shown in the figure) configured to import each element in the animation to be generated into the determined template, respectively, to generate the animation;
a sixth generating subunit (not shown in the figure) configured to generate a video based on the generated individual animations, the subtitle audio matching the individual shots in the sequence of shots to be aligned, and the animation start time of the last animation included in the individual shots in the sequence of shots to be aligned.
In some optional implementations of this embodiment, the apparatus 500 further includes:
a seventh determining unit (not shown in the figure) configured to determine a shot duration for a shot based on an audio duration of subtitle audio corresponding to the shot.
In some optional implementations of this embodiment, each shot in the sequence of shots to be aligned is used to present, by text and/or images, one or more of a company's basic information, business conditions, development conditions and risk conditions, and the subtitle audio matched with a shot is the audio of the text in that shot.
In the video generation apparatus provided by the above embodiment of the present disclosure, the first acquisition unit 501 may acquire the audio durations of the subtitle audio respectively matched with each shot in the sequence of shots to be aligned, where during playing of a shot in the sequence the shot presents the subtitle text indicated by the subtitle audio matched with it and each shot includes an animation sequence; the first determination unit 502 may then, for each shot in the sequence, determine the difference between the audio duration of the subtitle audio matched with the shot and the animation duration of the last animation included in the shot, and determine the moment at which that subtitle audio has played for the duration indicated by the difference as the animation start time of the last animation included in the shot; finally, the generation unit 503 may generate a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence, and the animation start time of the last animation included in each shot. Determining the start time of each shot's last animation in this way ensures that, in the generated video, the last animation of a shot and the subtitle audio matched with the shot finish playing synchronously, which improves the synchronization of picture and audio during playback and avoids the playing-too-fast, playing-too-slow and stalling problems that stretching or compressing the video would cause.
Next, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 6. The electronic device may be the first device, the second device, or both, or a stand-alone device separate from them that can communicate with the first device and the second device to receive the acquired input signals from them.
FIG. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 6, the electronic device 6 includes one or more processors 601 and memory 602.
The processor 601 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
Memory 602 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 601 to implement the video generation methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device may further include: an input device 603 and an output device 604, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is a first device or a second device, the input device 603 may be the microphone or the microphone array described above for capturing the input signal of the sound source. When the electronic device is a stand-alone device, the input means 603 may be a communication network connector for receiving the acquired input signals from the first device and the second device.
The input device 603 may also include, for example, a keyboard, a mouse, and the like. The output device 604 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 604 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 6, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device may include any other suitable components, depending on the particular application.
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the video generation method according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a video generation method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (16)

1. A method of video generation, the method comprising:
acquiring audio time lengths of caption audio respectively matched with each shot in a shot sequence to be aligned, wherein in the playing process of the shots in the shot sequence, caption texts indicated by the caption audio matched with the shots are presented in the shots, each shot in the shot sequence comprises an animation sequence, and each animation sequence comprises at least one animation;
for each shot in the shot sequence, determining a difference value between the audio time length of the caption audio matched with the shot and the animation time length of the last animation included in the shot, and determining the time when the caption audio matched with the shot plays the time length indicated by the difference value as the animation starting time of the last animation included in the shot;
and generating a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, and the animation starting time of the last animation included in each shot in the shot sequence to be aligned.
2. The method according to claim 1, wherein the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned comprises:
for each shot in the shot sequence, determining the starting time of the subtitle audio matched with the shot as the animation starting time of the first animation included in the shot;
and generating a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation and the animation starting time of the first animation included in each shot in the shot sequence to be aligned.
3. The method according to claim 2, wherein the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, the animation start time of the last animation included in each shot in the sequence of shots to be aligned, and the animation start time of the first animation comprises:
determining the interval duration between adjacent animations in the animation sequence included in the shot based on the number of animations in the animation sequence included in the shot;
and generating a video based on the shot sequence to be aligned, the caption audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included by each shot in the shot sequence to be aligned, the animation starting time of the first animation and the interval duration between adjacent animations in the animation sequence included by each shot.
4. The method according to any one of claims 1 to 3, wherein the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned comprises:
for each shot in the shot sequence, determining the start time of the shot based on the total audio duration of the caption audio matched with the previous shot of the shot;
and generating a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included by each shot in the shot sequence to be aligned, and the shot starting time of each shot in the shot sequence to be aligned.
5. The method of claim 4, wherein determining the shot start time of the shot based on the total audio duration of the subtitle audio that matches a preceding shot of the shot comprises:
and determining the end time of the total audio duration of the caption audio matched with the previous shot of the shot as the start time of the shot.
6. The method according to any one of claims 1 to 3, wherein the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned comprises:
for each shot in the shot sequence, determining the audio starting time of the caption audio matched with the shot based on the total shot duration of the preceding shot of the shot;
and generating a video based on the shot sequence to be aligned, the caption audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included by each shot in the shot sequence to be aligned, and the audio starting time of the caption audio matched with each shot in the shot sequence to be aligned.
7. The method of claim 6, wherein determining the audio start time of the caption audio matching the shot based on the total shot duration of the preceding shot of the shot comprises:
and determining the end time of the total shot duration of the preceding shot of the shot as the audio starting time of the caption audio matched with the shot.
8. The method according to any one of claims 1 to 7, wherein the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned comprises:
for each shot in the shot sequence, setting the playing mode of the shot to circularly playing the preset animation sequence in the shot under the condition that the last animation in the shot has been played completely and the caption audio corresponding to the shot has not been played completely;
and generating a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, the animation starting time of the last animation included by each shot in the shot sequence to be aligned, and the playing mode of each shot in the shot sequence to be aligned.
9. The method of claim 8, wherein adjacent animations in the preset animation sequence are played seamlessly.
10. The method according to any one of claims 1 to 9, wherein the generating a video based on the sequence of shots to be aligned, the subtitle audio matched with each shot in the sequence of shots to be aligned, and the animation start time of the last animation included in each shot in the sequence of shots to be aligned comprises:
acquiring the number of elements contained in the animation to be generated;
determining templates matched with the number of the elements from a predetermined template set, wherein each template in the template set is used for determining the position of each element in the animation to be generated;
respectively importing each element in the animation to be generated into the determined template to generate an animation in the animation sequence;
and generating a video based on the generated animations, the subtitle audio matched with the shots in the shot sequence to be aligned and the animation starting time of the last animation included in the shots in the shot sequence to be aligned.
11. The method according to one of claims 1 to 10, further comprising:
the method includes determining a shot duration for a shot based on an audio duration of subtitle audio corresponding to the shot.
12. The method according to one of claims 1 to 11, wherein each shot in the sequence of shots to be aligned is used for presenting one or more of basic company information, business situation, development situation, risk situation by text and/or image, and the subtitle audio matched with a shot is the audio of the text in the shot.
13. A video generation apparatus, characterized in that the apparatus comprises:
a first obtaining unit configured to obtain audio durations of caption audios respectively matched with respective shots in a shot sequence to be aligned, wherein during playing of the shots in the shot sequence, caption texts indicated by the caption audios matched with the shots are presented in the shots, each shot in the shot sequence comprises an animation sequence, and each animation sequence comprises at least one animation;
a first determining unit configured to determine, for each shot in the shot sequence, a difference between an audio time length of the subtitle audio matching the shot and an animation time length of a last animation included in the shot, and determine a time at which the subtitle audio matching the shot plays a time length indicated by the difference as an animation start time of the last animation included in the shot;
a generating unit configured to generate a video based on the shot sequence to be aligned, the subtitle audio matched with each shot in the shot sequence to be aligned, and an animation start time of a last animation included in each shot in the shot sequence to be aligned.
14. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing a computer program stored in the memory, and when executed, implementing the method of any of the preceding claims 1-12.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of the preceding claims 1 to 12.
16. A computer program comprising computer readable code for, when run on a device, executing instructions for implementing the steps of the method according to any one of claims 1 to 12 by a processor in the device.
CN202011607951.7A 2020-12-29 2020-12-29 Video generation method and device Active CN112866776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011607951.7A CN112866776B (en) 2020-12-29 2020-12-29 Video generation method and device

Publications (2)

Publication Number Publication Date
CN112866776A true CN112866776A (en) 2021-05-28
CN112866776B CN112866776B (en) 2022-09-20

Family

ID=75998486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011607951.7A Active CN112866776B (en) 2020-12-29 2020-12-29 Video generation method and device

Country Status (1)

Country Link
CN (1) CN112866776B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130100347A1 (en) * 2011-10-21 2013-04-25 Nina Vladimirovna Zinovieva Aligning video clips to closed caption files
CN105307028A (en) * 2015-10-26 2016-02-03 新奥特(北京)视频技术有限公司 Video editing method and device specific to video materials of plurality of lenses
CN105791938A (en) * 2016-03-14 2016-07-20 腾讯科技(深圳)有限公司 Multimedia file splicing method and apparatus
CN108449651A (en) * 2018-05-24 2018-08-24 腾讯科技(深圳)有限公司 Subtitle adding method and device
CN111800671A (en) * 2019-04-08 2020-10-20 百度时代网络技术(北京)有限公司 Method and apparatus for aligning paragraphs and video
CN110677711A (en) * 2019-10-17 2020-01-10 北京字节跳动网络技术有限公司 Video dubbing method and device, electronic equipment and computer readable medium
CN111667557A (en) * 2020-05-20 2020-09-15 完美世界(北京)软件科技发展有限公司 Animation production method and device, storage medium and terminal
CN111988663A (en) * 2020-08-28 2020-11-24 北京百度网讯科技有限公司 Method, device and equipment for positioning video playing node and storage medium
CN111813998A (en) * 2020-09-10 2020-10-23 北京易真学思教育科技有限公司 Video data processing method, device, equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979764A (en) * 2022-04-25 2022-08-30 中国平安人寿保险股份有限公司 Video generation method and device, computer equipment and storage medium
CN114979764B (en) * 2022-04-25 2024-02-06 中国平安人寿保险股份有限公司 Video generation method, device, computer equipment and storage medium
CN115334367A (en) * 2022-07-11 2022-11-11 北京达佳互联信息技术有限公司 Video summary information generation method, device, server and storage medium
CN115334367B (en) * 2022-07-11 2023-10-17 北京达佳互联信息技术有限公司 Method, device, server and storage medium for generating abstract information of video
CN117440116A (en) * 2023-12-11 2024-01-23 深圳麦风科技有限公司 Video generation method, device, terminal equipment and readable storage medium
CN117440116B (en) * 2023-12-11 2024-03-22 深圳麦风科技有限公司 Video generation method, device, terminal equipment and readable storage medium

Also Published As

Publication number Publication date
CN112866776B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN112866776B (en) Video generation method and device
US9961403B2 (en) Visual summarization of video for quick understanding by determining emotion objects for semantic segments of video
CN111970577B (en) Subtitle editing method and device and electronic equipment
CN109754783B (en) Method and apparatus for determining boundaries of audio sentences
CN110324718B (en) Audio and video generation method and device, electronic equipment and readable medium
US9934820B2 (en) Mobile device video personalization
US11653072B2 (en) Method and system for generating interactive media content
CN113261058B (en) Automatic video editing using beat match detection
WO2018095252A1 (en) Video recording method and device
EP4300431A1 (en) Action processing method and apparatus for virtual object, and storage medium
US20220021942A1 (en) Systems and methods for displaying subjects of a video portion of content
CN112287168A (en) Method and apparatus for generating video
CN109271929B (en) Detection method and device
CN111556329B (en) Method and device for inserting media content in live broadcast
CN114880062B (en) Chat expression display method, device, electronic device and storage medium
CN107370977B (en) Method, equipment and storage medium for adding commentary in detection video
US10133408B2 (en) Method, system and computer program product
US20240033626A1 (en) Game plot interaction method, apparatus, and system
CN112954453A (en) Video dubbing method and apparatus, storage medium, and electronic device
US20230052033A1 (en) Systems and methods for recommending content using progress bars
US11055346B2 (en) Tagging an image with audio-related metadata
CN112533058A (en) Video processing method, device, equipment and computer readable storage medium
US20180204600A1 (en) Mobile device video personalization
US20210407166A1 (en) Meme package generation method, electronic device, and medium
CN114979764B (en) Video generation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant