CN114979764B - Video generation method, device, computer equipment and storage medium - Google Patents
Video generation method, device, computer equipment and storage medium
- Publication number
- CN114979764B CN114979764B CN202210441414.2A CN202210441414A CN114979764B CN 114979764 B CN114979764 B CN 114979764B CN 202210441414 A CN202210441414 A CN 202210441414A CN 114979764 B CN114979764 B CN 114979764B
- Authority
- CN
- China
- Prior art keywords
- sub
- information
- video
- audio
- animation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44012—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Processing Or Creating Images (AREA)
Abstract
The application relates to the field of video generation, and provides a video generation method, apparatus, computer device and computer storage medium. The method comprises the following steps: acquiring audio and video information recorded according to preset text information, wherein the text information comprises at least one text sub-information; acquiring a first animation sub-mirror corresponding to each text sub-information; segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information; adjusting each first animation sub-mirror according to the audio and video sub-information to obtain a second animation sub-mirror; and synthesizing the audio and video sub-information with the second animation sub-mirrors to generate a target video. In this way, a high-quality video with animation effects can be generated quickly from the audio and video information recorded by the user, and the video production cycle is shortened. The application also relates to artificial intelligence, and the video generation method can be applied to cloud servers providing big data and artificial-intelligence-platform cloud computing services.
Description
Technical Field
The present disclosure relates to the field of video generation, and in particular, to a video generation method, apparatus, computer device, and storage medium.
Background
With the popularity of short videos, more and more teams and individuals choose to advertise by shooting short videos. However, in the prior art, a short video shot by an individual tends to have a single scene, simple content and poor video quality, while a short video produced by a team requires post-production special effects such as animation, and the animation must also be matched to the video content, duration and frame rate, so the production cost is high and the production cycle is long. A method for producing high-quality videos with animation effects at low cost is therefore needed.
Disclosure of Invention
The main purpose of the present application is to provide a video generation method, apparatus, computer device and computer storage medium, aiming to quickly generate high-quality videos with animation effects and to shorten the video production cycle.
In a first aspect, the present application provides a video generating method, including the steps of:
acquiring audio and video information recorded according to preset text information, wherein the text information comprises at least one text sub-information;
acquiring a first animation sub-mirror corresponding to each text sub-information;
segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information;
according to the audio and video sub-information, adjusting each first animation sub-mirror to obtain a second animation sub-mirror;
and synthesizing the audio and video sub-information and the second animation sub-mirrors to generate a target video.
In a second aspect, the present application further provides a video generating apparatus, including:
the first acquisition module is used for acquiring audio and video information recorded according to preset text information, wherein the text information comprises at least one text sub-information;
the second acquisition module is used for acquiring the first animation sub-mirrors corresponding to the text sub-information;
the audio and video segmentation module is used for segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information;
the animation adjustment module is used for adjusting each first animation sub-mirror according to each audio and video sub-information to obtain a second animation sub-mirror;
and the video synthesis module is used for synthesizing each piece of audio and video sub-information with each piece of second animation sub-mirror to generate a target video.
In a third aspect, the present application also provides a computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements a video generation method as described above.
In a fourth aspect, the present application further provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a video generation method as described above.
The application provides a video generation method, a device, equipment and a computer storage medium, wherein the application acquires audio and video information recorded according to preset text information, wherein the text information comprises at least one text sub-information; acquiring a first animation sub-mirror corresponding to each text sub-information; segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information; according to the audio and video sub-information, adjusting each first animation sub-mirror to obtain a second animation sub-mirror; and synthesizing the audio and video sub-information and the second animation sub-mirrors to generate a target video. Because the audio and video information recorded by the user is obtained and the pre-generated animation sub-mirrors are adjusted according to the audio and video information to generate the target video, the user can automatically generate the high-quality video with the animation by only recording the audio and video information, and the manufacturing period of the video is shortened.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a video generating method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a usage scenario of the video generating method according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a video generating apparatus according to an embodiment of the present application;
fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
The embodiment of the application provides a video generation method, a video generation device, computer equipment and a computer readable storage medium.
Referring to fig. 1, fig. 1 is a flowchart of a video generating method according to an embodiment of the present application. The video generation method can be used in a terminal or a server to generate a target video from the audio and video information input by a user. The terminal can be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant or a wearable device; the server may be an independent server, a server cluster, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
Referring to fig. 2, fig. 2 is a usage scenario diagram provided in an embodiment of the present application. As shown in fig. 2, preset text information is stored in the server, and the user records audio and video information according to the text information, and the server can match the audio and video information with a preset animation mirror to synthesize a target video.
As shown in fig. 1, the video generating method includes steps S101 to S105.
Step S101, acquiring audio and video information recorded according to preset text information, wherein the text information comprises at least one text sub-information.
Illustratively, the text information is preset based on the actual requirements of information delivery, for example the text to appear in a target video is preset, and the text information is displayed to a user so that the user records audio and video information according to the text information.
The audio and video information is a recording of the user reading the text information; it may be a video with an audio track or may be audio only, which is not limited herein.
For example, the text information may be divided into at least one text sub-information according to actual requirements, for example, according to punctuation marks, line feed symbols, and the like in the text information, which is not limited herein.
For example, when the audio and video information is recorded, each text sub-information may be displayed to the user one by one, for example sentence by sentence or line by line, so that a pause is left between the pieces of text sub-information in the recorded audio and video information, which allows the audio and video information to be segmented into audio and video sub-information more accurately.
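For illustration only, a minimal Python sketch of such a splitting rule is given below; the embodiment does not fix the rule, and the punctuation set and function name used here are assumptions.

```python
import re

def split_text_info(text_info: str) -> list:
    """Split the preset text information into text sub-information.

    This sketch splits on line breaks first and then on common
    sentence-ending punctuation; any non-empty remainder becomes one
    text sub-information.
    """
    sub_infos = []
    for line in text_info.splitlines():
        for part in re.split(r"[。！？.!?]", line):
            part = part.strip()
            if part:
                sub_infos.append(part)
    return sub_infos

# Two lines of script text become three text sub-informations.
print(split_text_info("Welcome to our product.\nIt saves time. It saves money!"))
```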
Step S102, obtaining a first animation sub-mirror corresponding to each piece of text sub-information.
The first animation sub-mirrors (i.e., animation storyboard shots) are animation information preset according to the text sub-information. For example, material elements are obtained from a preset material library according to the text sub-information, a plurality of animation video frames are generated based on these material elements, each first animation sub-mirror is synthesized from the animation video frames, and the template ID of the corresponding text information and the text code of the respective text sub-information are stored for the first animation sub-mirror.
In some embodiments, step S102, obtaining a first animation sub-mirror corresponding to each text sub-information, includes: acquiring the first animation sub-mirror corresponding to each text sub-information according to a preset correspondence between the first animation sub-mirrors and the text sub-information.
Illustratively, each first animation sub-mirror is stored under the template ID of the corresponding text information together with a text code corresponding to the text sub-information; the text code may be, for example, the line number of the text sub-information. The first animation sub-mirror corresponding to each text sub-information can therefore be obtained through the correspondence between the template ID plus the text code and the first animation sub-mirror, although this is not limiting.
Illustratively, by means of the template ID of the text information and the text code corresponding to each text sub-information, each text sub-information is associated with its first animation sub-mirror, so that the first animation sub-mirror matched with a text sub-information can be obtained conveniently.
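A minimal sketch of such a lookup is shown below, assuming the correspondence is kept as a table keyed by template ID and text code (here taken to be the 1-based line number); the table contents and file paths are purely illustrative.

```python
# Hypothetical index: (template ID, text code) -> pre-rendered first animation sub-mirror.
FIRST_SUBMIRROR_INDEX = {
    ("tmpl_001", 1): "storyboards/tmpl_001_line1.mp4",
    ("tmpl_001", 2): "storyboards/tmpl_001_line2.mp4",
}

def get_first_submirrors(template_id: str, text_sub_infos: list) -> list:
    """Return the first animation sub-mirror for each text sub-information,
    using the line number of the text sub-information as its text code."""
    return [FIRST_SUBMIRROR_INDEX[(template_id, line_no)]
            for line_no, _ in enumerate(text_sub_infos, start=1)]
```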
Step S103, segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information.
The audio and video information is divided into audio and video sub-information corresponding to the text sub-information, so that each of the audio and video sub-information corresponds to each of the first animation sub-mirrors.
In some embodiments, step S103, segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information, includes: extracting the audio and video information based on a preset voice endpoint detection algorithm to obtain effective audio and video information; matching the effective audio and video information with the text sub-information based on a preset voice text matching algorithm; and obtaining the audio and video sub-information according to the matching result of the effective audio and video information and the text sub-information.
The voice endpoint detection algorithm may be, for example, a voice activity detection algorithm (Voice Activity Detection, VAD for short), which is capable of detecting whether there is voice in the audio and removing non-voice fragments, simplifying voice processing. Of course, the voice endpoint detection may be performed by other algorithms, which are not limited herein.
Illustratively, the pauses made by the user while reading the text information are recognized and removed from the audio and video information through the voice endpoint detection algorithm, so as to obtain the effective audio and video information in the audio and video information.
Illustratively, the effective audio and video information is matched with each text sub-information based on the voice text matching algorithm, the start point and end point of each piece of audio and video sub-information within the audio and video information are determined, and the effective audio and video information is segmented according to these start and end points to obtain the audio and video sub-information corresponding to each text sub-information, for example the audio and video segment in which the user reads that text sub-information.
By means of voice endpoint detection, pauses when a user reads text information can be removed, and interference of human factors on generated videos is reduced; and at least one section of audio and video sub-information corresponding to the text sub-information is obtained through a voice text matching algorithm, so that the animation sub-mirror can be conveniently adjusted subsequently, and the visibility of the generated target video is improved.
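As a rough illustration of this segmentation step, the sketch below uses a simple frame-energy threshold as a stand-in for voice endpoint detection and pairs the detected speech regions, in order, with the text sub-information; a real implementation would use a proper VAD and a speech-text (forced alignment) algorithm, so the thresholds and function names here are assumptions.

```python
import numpy as np

def segment_recording(samples: np.ndarray, sample_rate: int, n_sub_infos: int,
                      frame_ms: int = 30, energy_thresh: float = 1e-3) -> list:
    """Return (start_sample, end_sample) pairs of speech regions.

    Frames whose mean energy falls below `energy_thresh` are treated as
    pauses between sentences; contiguous voiced regions are matched, in
    order, with the text sub-information.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    voiced = [float(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2)) > energy_thresh
              for i in range(n_frames)]
    regions, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i                                  # speech region begins
        elif not v and start is not None:
            regions.append((start * frame_len, i * frame_len))
            start = None                               # speech region ends at a pause
    if start is not None:
        regions.append((start * frame_len, n_frames * frame_len))
    return regions[:n_sub_infos]
```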
Step S104, according to the audio and video sub-information, adjusting the first animation sub-mirrors to obtain second animation sub-mirrors.
For example, when a user records the audio and video information, it is difficult to ensure that the time taken to read each text sub-information is consistent with the duration of the corresponding first animation sub-mirror; the audio and video information therefore needs to be divided into the audio and video sub-information, and the corresponding first animation sub-mirror is then adjusted according to each audio and video sub-information to obtain a second animation sub-mirror.
In some embodiments, step S104 adjusts each of the first animation sub-mirrors according to each of the audio/video sub-information to obtain a second animation sub-mirror, including: if the length of the audio and video sub-information is larger than the length of the corresponding first animation sub-mirror, inserting frames into the first animation sub-mirror to obtain a second animation sub-mirror; and if the length of the audio and video sub-information is smaller than the length of the corresponding first animation sub-mirror, performing frame extraction on the first animation sub-mirror to obtain a second animation sub-mirror.
If the length of the audio/video sub-information is greater than the corresponding length of the first animation sub-mirror, the length of the first animation sub-mirror is extended by inserting frames into the first animation sub-mirror, so as to generate a second animation sub-mirror corresponding to the duration of the audio/video sub-information.
Specifically, in order to match the length of the animation sub-mirror with the corresponding audio and video sub-information, the first animation sub-mirror may, for example, be subjected to frame interpolation by a real-time intermediate flow estimation algorithm (Real-time Intermediate Flow Estimation, abbreviated as RIFE) to obtain the second animation sub-mirror; the frame interpolation may also be performed by other algorithms, which is not limited herein.
For example, if the length of the audio/video sub-information is smaller than the length of the corresponding first animation sub-mirror, the length of the first animation sub-mirror is shortened by performing frame extraction on the first animation sub-mirror, and a second animation sub-mirror corresponding to the duration of the audio/video sub-information is generated.
Specifically, in order to match the length of the animation sub-mirror with the corresponding audio and video sub-information, frames may, for example, be extracted from the first animation sub-mirror at equal intervals, i.e., a number of frames are dropped at fixed intervals, although this is not limiting.
It can be understood that if the audio and video sub-information is the same as the corresponding first animation sub-mirror time length, the frame rate of the animation sub-mirror does not need to be adjusted.
By inserting or extracting frames from the first animation sub-mirror, the second animation sub-mirror and the corresponding audio/video sub-information have the same duration, so that the smoothness and the visibility of generating the target video are improved.
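A simplified sketch of this duration adjustment is given below. The embodiment mentions RIFE for frame interpolation and equal-interval frame extraction for shortening; here both are approximated by resampling frame indices at equal intervals (duplicating the nearest frame instead of synthesizing an intermediate one), which is only an assumption for illustration.

```python
import numpy as np

def retime_submirror(frames: list, anim_fps: float, audio_duration_s: float) -> list:
    """Stretch or shrink a first animation sub-mirror so that its playback
    time at `anim_fps` equals the duration of the matching audio and video
    sub-information."""
    target_n = max(1, int(round(audio_duration_s * anim_fps)))
    indices = np.linspace(0, len(frames) - 1, num=target_n)
    return [frames[int(round(i))] for i in indices]

# A 3 s sub-mirror at 25 fps (75 frames) stretched to a 4 s reading -> 100 frames.
print(len(retime_submirror(list(range(75)), 25, 4.0)))
```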
Step S105, synthesizing each of the audio and video sub-information and each of the second animation sub-mirrors to generate a target video.
In some embodiments, step S105 synthesizes each of the audio and video sub-information with each of the second animation sub-mirrors to generate a target video, including: synthesizing the audio and video sub-information and each second animation sub-mirror to generate a target sub-video; and splicing the target sub-videos to obtain the target video.
The audio and video sub-information and the corresponding second animation sub-mirrors are combined to obtain at least one section of target sub-video, and the target sub-videos are spliced according to the sequence to obtain the target video.
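A minimal sketch of this synthesis and splicing, assuming the moviepy 1.x library is used (the embodiment does not name a synthesis tool, and all file paths here are illustrative):

```python
from moviepy.editor import VideoFileClip, AudioFileClip, concatenate_videoclips

def synthesize_target_video(submirror_paths, narration_paths, out_path="target_video.mp4"):
    """Pair each second animation sub-mirror with its audio and video
    sub-information to form target sub-videos, then splice them in order."""
    target_sub_videos = []
    for anim_path, audio_path in zip(submirror_paths, narration_paths):
        anim = VideoFileClip(anim_path)          # second animation sub-mirror
        narration = AudioFileClip(audio_path)    # audio of one audio and video sub-information
        target_sub_videos.append(anim.set_audio(narration))
    concatenate_videoclips(target_sub_videos).write_videofile(out_path)
```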
In some embodiments, step S105, synthesizing each of the audio and video sub-information with each of the second animation sub-mirrors to generate a target video, further includes: generating the target video according to a preset mirror-out mode (i.e., the way the recording user appears on camera), wherein the mirror-out mode includes a whole-course mirror-out mode, a partial mirror-out mode and a mirror-out-free mode.
Illustratively, the mirror-out mode may be determined according to the audio and video information recorded by the user; for example, if the audio and video information is a video with an audio track, the mirror-out mode may be the whole-course mirror-out mode or the partial mirror-out mode, and if the audio and video information is audio only, the mirror-out mode is the mirror-out-free mode.
Illustratively, the mirror-out mode may also be determined based on a selection operation of the user; for example, if the audio and video information is a video with an audio track, the mirror-out mode is determined to be the whole-course mirror-out mode, the partial mirror-out mode or the mirror-out-free mode according to the user's selection of the mirror-out mode.
Specifically, if the mirror-out mode is the whole-course mirror-out mode, the video picture in the audio and video information may, for example, be placed below each of the second animation sub-mirrors, although this is not limiting.
Specifically, if the mirror-out mode is the partial mirror-out mode, a first-sentence mirror-out mode may, for example, be used, in which the first target sub-video uses the audio and video sub-information recorded by the user, while the subsequent target sub-videos use only the audio in the corresponding audio and video sub-information together with the corresponding second animation sub-mirrors, although this is not limiting.
By setting a plurality of mirror-out modes, the flexibility of generating target videos is improved, users can select to record audio or videos with audio tracks according to actual requirements, and user experience is improved.
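For the whole-course mirror-out mode, one possible composition is sketched below with moviepy 1.x: the recorded video (with its audio track) is shrunk and overlaid on the second animation sub-mirror as a small on-camera window. The layout, scale factor and library choice are assumptions, not requirements of the embodiment.

```python
from moviepy.editor import VideoFileClip, CompositeVideoClip

def compose_whole_course_mirror_out(anim_path, recording_path, out_path="sub_video.mp4"):
    anim = VideoFileClip(anim_path)                       # second animation sub-mirror
    cam = (VideoFileClip(recording_path)                  # user's recorded audio and video sub-information
           .set_duration(anim.duration)
           .resize(0.3)                                   # shrink the on-camera window
           .set_position(("right", "bottom")))            # place it in a corner of the animation
    CompositeVideoClip([anim, cam]).set_audio(cam.audio).write_videofile(out_path)
```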
In some embodiments, the video generation method further comprises: based on a preset voice recognition algorithm, adjusting each text sub-information according to each audio and video sub-information; generating subtitles corresponding to the audio and video sub-information according to the adjusted text sub-information; step S105 further includes: and synthesizing each piece of audio and video sub-information, each piece of second animation sub-mirror and the caption.
The text sub-information is modified according to the audio-video sub-information based on the voice recognition algorithm, so that the modified text sub-information corresponds to the content of the audio-video sub-information, and when the target video is synthesized, the subtitle generated according to the modified text sub-information is synthesized with the audio-video sub-information and the second animation sub-mirrors to obtain the target video with the subtitle.
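As an illustration of how the adjusted text sub-information could be turned into subtitles, the sketch below writes one SRT cue per audio and video sub-information, using the segment durations as cue timings; the SRT output format and helper names are assumptions of this sketch.

```python
def write_srt(subtitle_texts, durations_s, path="subtitles.srt"):
    """Write one SRT cue per audio and video sub-information, with each cue
    spanning the duration of the matching segment."""
    def ts(seconds):
        h, rem = divmod(int(seconds * 1000), 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    cues, start = [], 0.0
    for i, (text, dur) in enumerate(zip(subtitle_texts, durations_s), start=1):
        cues.append(f"{i}\n{ts(start)} --> {ts(start + dur)}\n{text}\n")
        start += dur
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(cues))
```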
In some embodiments, step S105 further comprises: and synthesizing the audio and video sub-information, the second animation sub-mirrors and preset sound effect information to generate a target video.
Illustratively, according to the requirements of the synthesized video, the sound effect information is obtained from a preset material library and synthesized into the target video, so that the target video is more vivid and intuitive, improving the user's viewing experience.
According to the video generation method provided by the embodiment, the audio and video information recorded according to the preset text information is obtained, wherein the text information comprises at least one text sub-information; acquiring a first animation sub-mirror corresponding to each text sub-information; segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information; according to the audio and video sub-information, adjusting each first animation sub-mirror to obtain a second animation sub-mirror; and synthesizing the audio and video sub-information and the second animation sub-mirrors to generate a target video. According to the audio and video information recorded by the user, high-quality video with animation effect can be quickly generated, and the video production period is shortened.
Referring to fig. 3, fig. 3 is a schematic diagram of a video generating apparatus according to an embodiment of the present application, where the video generating apparatus may be configured in a server or a terminal, for executing the video generating method described above.
As shown in fig. 3, the video generating apparatus includes:
the first obtaining module 110 is configured to obtain audio and video information recorded according to preset text information, where the text information includes at least one text sub-information.
The second obtaining module 120 is configured to obtain a first animation sub-mirror corresponding to each text sub-information.
And the audio and video segmentation module 130 is configured to segment the audio and video information according to each text sub-information to obtain at least one piece of audio and video sub-information.
And the animation adjustment module 140 is configured to adjust each of the first animation sub-mirrors according to each of the audio/video sub-information, so as to obtain a second animation sub-mirror.
And the video synthesis module 150 is configured to synthesize each of the audio and video sub-information and each of the second animation sub-mirrors to generate a target video.
Illustratively, the animation adjustment module 140 includes an animation frame inserting module and an animation frame extraction module.
The animation frame inserting module is used for inserting frames to the first animation sub-mirrors to obtain second animation sub-mirrors if the length of the audio and video sub-information is larger than the length of the corresponding first animation sub-mirrors;
and the animation frame extraction module is used for extracting frames from the first animation sub-mirrors to obtain second animation sub-mirrors if the length of the audio/video sub-information is smaller than the length of the corresponding first animation sub-mirrors.
Illustratively, the audio and video segmentation module 130 includes an effective information extraction module, a voice text matching module and a matching result processing module.
The effective information extraction module is used for extracting the audio and video information based on a preset voice endpoint detection algorithm to obtain effective audio and video information;
the voice text matching module is used for matching the effective audio and video information with the text sub-information based on a preset voice text matching algorithm;
and the matching result processing module is used for obtaining the audio and video sub-information according to the matching result of the effective audio and video information and the text sub-information.
Illustratively, the video generating apparatus further includes a text adjustment module and a subtitle generating module.
The text adjustment module is used for adjusting each text sub-information according to each audio/video sub-information based on a preset voice recognition algorithm;
the subtitle generating module is used for generating subtitles corresponding to the audio and video sub-information according to the adjusted text sub-information;
illustratively, the video composition module 150 includes a subtitle composition module.
And the subtitle synthesis module is used for synthesizing the audio and video sub-information, the second animation sub-mirrors and the subtitles.
Illustratively, the second acquisition module 120 includes a second acquisition sub-module.
The second acquisition sub-module is used for acquiring the first animation sub-mirrors corresponding to the text sub-information according to the preset corresponding relation between the first animation sub-mirrors and the text sub-information.
Illustratively, the video synthesis module 150 further includes a sub-video synthesis module and a sub-video splicing module.
The sub-video synthesis module is used for synthesizing the audio and video sub-information and each second animation sub-mirror to generate a target sub-video;
and the sub-video splicing module is used for splicing the target sub-videos to obtain the target videos.
Illustratively, the video synthesis module 150 further includes a mirror-out mode determining module.
And the mirror-out mode determining module is used for generating the target video according to a preset mirror-out mode, wherein the mirror-out mode comprises a whole-course mirror-out mode, a partial mirror-out mode and a mirror-out-free mode.
It should be noted that, for convenience and brevity of description, specific working processes of the above-described apparatus and each module, unit may refer to corresponding processes in the foregoing method embodiments, which are not repeated herein.
The methods and apparatus of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The above-described methods and apparatus may be implemented, for example, in the form of a computer program executable on a computer device as shown in FIG. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server or a terminal.
As shown in fig. 4, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a storage medium and an internal memory.
The storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause a processor to perform any of a number of video generation methods.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for the execution of a computer program in a storage medium that, when executed by a processor, causes the processor to perform any one of the video generation methods.
The network interface is used for network communication such as transmitting assigned tasks and the like. Those skilled in the art will appreciate that the structures shown in FIG. 4 are block diagrams only and do not constitute a limitation of the computer device on which the present aspects apply, and that a particular computer device may include more or less components than those shown, or may combine some of the components, or have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
acquiring audio and video information recorded according to preset text information, wherein the text information comprises at least one text sub-information;
acquiring a first animation sub-mirror corresponding to each text sub-information;
segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information;
according to the audio and video sub-information, adjusting each first animation sub-mirror to obtain a second animation sub-mirror;
and synthesizing the audio and video sub-information and the second animation sub-mirrors to generate a target video.
In one embodiment, the processor is configured to, when implementing adjustment of each of the first animation sub-mirrors according to each of the audio/video sub-information to obtain a second animation sub-mirror, implement:
if the length of the audio and video sub-information is larger than the length of the corresponding first animation sub-mirror, inserting frames into the first animation sub-mirror to obtain a second animation sub-mirror;
and if the length of the audio and video sub-information is smaller than the length of the corresponding first animation sub-mirror, performing frame extraction on the first animation sub-mirror to obtain a second animation sub-mirror.
In one embodiment, the processor is configured to, when implementing splitting the audio and video information according to each text sub-information to obtain at least one piece of audio and video sub-information, implement:
extracting the audio and video information based on a preset voice endpoint detection algorithm to obtain effective audio and video information;
based on a preset voice text matching algorithm, matching the effective audio and video information with the text sub-information;
and obtaining the audio and video sub-information according to the matching result of the effective audio and video information and the text sub-information.
In one embodiment, the processor is configured to, after implementing the segmentation of the audio and video information according to each text sub-information, obtain at least one piece of audio and video sub-information, implement:
based on a preset voice recognition algorithm, adjusting each text sub-information according to each audio and video sub-information;
and generating subtitles corresponding to the audio and video sub-information according to the adjusted text sub-information.
In one embodiment, the processor is configured to, when implementing the synthesizing of each of the audio and video sub-information and each of the second animation sub-mirrors to generate the target video, implement:
and synthesizing each piece of audio and video sub-information, each piece of second animation sub-mirror and the caption.
In one embodiment, the processor is configured to, when implementing obtaining a first animation sub-mirror corresponding to each text sub-information, implement:
and acquiring the first animation sub-mirrors corresponding to the text sub-information according to the preset corresponding relation between the first animation sub-mirrors and the text sub-information.
In one embodiment, the processor is configured to, when implementing the synthesizing of each of the audio and video sub-information and each of the second animation sub-mirrors to generate the target video, implement:
synthesizing the audio and video sub-information and each second animation sub-mirror to generate a target sub-video;
and splicing the target sub-videos to obtain the target video.
In one embodiment, the processor is configured to, when implementing the synthesizing of each of the audio and video sub-information and each of the second animation sub-mirrors to generate the target video, implement:
and generating the target video according to a preset mirror-out mode, wherein the mirror-out mode comprises a whole-course mirror-out mode, a partial mirror-out mode and a mirror-out-free mode.
It should be noted that, for convenience and brevity of description, the specific working process of the video generation described above may refer to the corresponding process in the foregoing embodiments of the video generation method, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium, where a computer program is stored, where the computer program includes program instructions, and a method implemented when the program instructions are executed may refer to various embodiments of the video generating method of the present application.
The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments. While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (9)
1. A method of video generation, the method comprising:
acquiring audio and video information recorded according to preset text information, wherein the text information comprises at least one text sub-information;
acquiring a first animation sub-mirror corresponding to each text sub-information;
segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information;
according to the audio and video sub-information, adjusting each first animation sub-mirror to obtain a second animation sub-mirror;
synthesizing each piece of audio and video sub-information with each piece of second animation sub-mirror to generate a target video;
the method for adjusting the first animation sub-mirrors to obtain second animation sub-mirrors according to the audio and video sub-information comprises the following steps:
if the length of the audio and video sub-information is larger than the length of the corresponding first animation sub-mirror, inserting frames into the first animation sub-mirror to obtain a second animation sub-mirror;
and if the length of the audio and video sub-information is smaller than the length of the corresponding first animation sub-mirror, performing frame extraction on the first animation sub-mirror to obtain a second animation sub-mirror.
2. The method of claim 1, wherein the splitting the audio and video information according to each text sub-information to obtain at least one piece of audio and video sub-information includes:
extracting the audio and video information based on a preset voice endpoint detection algorithm to obtain effective audio and video information;
based on a preset voice text matching algorithm, matching the effective audio and video information with the text sub-information;
and obtaining the audio and video sub-information according to the matching result of the effective audio and video information and the text sub-information.
3. The method of claim 1, further comprising, after the segmenting the audio-video information according to each text sub-information to obtain at least one piece of audio-video sub-information:
based on a preset voice recognition algorithm, adjusting each text sub-information according to each audio and video sub-information;
generating subtitles corresponding to the audio and video sub-information according to the adjusted text sub-information;
the synthesizing the audio and video sub-information and the second animation sub-mirrors to generate a target video includes:
and synthesizing each piece of audio and video sub-information, each piece of second animation sub-mirror and the caption.
4. A video generating method according to any one of claims 1 to 3, wherein said obtaining a first animation sub-mirror corresponding to each text sub-information comprises:
and acquiring the first animation sub-mirrors corresponding to the text sub-information according to the preset corresponding relation between the first animation sub-mirrors and the text sub-information.
5. A video generation method according to any one of claims 1 to 3, wherein the synthesizing each of the audio-video sub-information and each of the second animation sub-mirrors to generate a target video includes:
synthesizing the audio and video sub-information and each second animation sub-mirror to generate a target sub-video;
and splicing the target sub-videos to obtain the target video.
6. A video generation method according to any one of claims 1 to 3, wherein the synthesizing each of the audio-video sub-information and each of the second animation sub-mirrors to generate a target video includes:
and generating the target video according to a preset mirror-out mode, wherein the mirror-out mode comprises a whole-course mirror-out mode, a partial mirror-out mode and a mirror-out-free mode.
7. A video generating apparatus, characterized in that the video generating apparatus comprises:
the first acquisition module is used for acquiring audio and video information recorded according to preset text information, wherein the text information comprises at least one text sub-information;
the second acquisition module is used for acquiring the first animation sub-mirrors corresponding to the text sub-information;
the audio and video segmentation module is used for segmenting the audio and video information according to each text sub-information to obtain at least one section of audio and video sub-information;
the animation adjustment module is used for adjusting each first animation sub-mirror according to each audio and video sub-information to obtain a second animation sub-mirror;
the video synthesis module is used for synthesizing each piece of audio and video sub-information with each piece of second animation sub-mirror to generate a target video;
wherein the audio and video information is a video with an audio track, and the animation adjustment module includes an animation frame inserting module and an animation frame extraction module;
the animation frame inserting module is used for inserting frames to the first animation sub-mirrors to obtain second animation sub-mirrors if the length of the audio and video sub-information is larger than the length of the corresponding first animation sub-mirrors;
and the animation frame extraction module is used for extracting frames from the first animation sub-mirrors to obtain second animation sub-mirrors if the length of the audio/video sub-information is smaller than the length of the corresponding first animation sub-mirrors.
8. A computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the video generation method of any of claims 1 to 6.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the video generation method according to any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210441414.2A CN114979764B (en) | 2022-04-25 | 2022-04-25 | Video generation method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210441414.2A CN114979764B (en) | 2022-04-25 | 2022-04-25 | Video generation method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114979764A CN114979764A (en) | 2022-08-30 |
CN114979764B true CN114979764B (en) | 2024-02-06 |
Family
ID=82979671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210441414.2A Active CN114979764B (en) | 2022-04-25 | 2022-04-25 | Video generation method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114979764B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020202416A (en) * | 2019-06-06 | 2020-12-17 | 株式会社電通グループ | Advertisement animation creation system |
CN112866776A (en) * | 2020-12-29 | 2021-05-28 | 北京金堤科技有限公司 | Video generation method and device |
CN113408332A (en) * | 2021-03-05 | 2021-09-17 | 腾讯科技(深圳)有限公司 | Video mirror splitting method, device, equipment and computer readable storage medium |
CN113691854A (en) * | 2021-07-20 | 2021-11-23 | 阿里巴巴达摩院(杭州)科技有限公司 | Video creation method and device, electronic equipment and computer program product |
CN113870395A (en) * | 2021-09-29 | 2021-12-31 | 平安科技(深圳)有限公司 | Animation video generation method, device, equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090174717A1 (en) * | 2008-01-07 | 2009-07-09 | Sony Corporation | Method and apparatus for generating a storyboard theme for background image and video presentation |
-
2022
- 2022-04-25 CN CN202210441414.2A patent/CN114979764B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020202416A (en) * | 2019-06-06 | 2020-12-17 | 株式会社電通グループ | Advertisement animation creation system |
CN112866776A (en) * | 2020-12-29 | 2021-05-28 | 北京金堤科技有限公司 | Video generation method and device |
CN113408332A (en) * | 2021-03-05 | 2021-09-17 | 腾讯科技(深圳)有限公司 | Video mirror splitting method, device, equipment and computer readable storage medium |
CN113691854A (en) * | 2021-07-20 | 2021-11-23 | 阿里巴巴达摩院(杭州)科技有限公司 | Video creation method and device, electronic equipment and computer program product |
CN113870395A (en) * | 2021-09-29 | 2021-12-31 | 平安科技(深圳)有限公司 | Animation video generation method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114979764A (en) | 2022-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11632576B2 (en) | Live video broadcast method, live broadcast device and storage medium | |
CN109729420B (en) | Picture processing method and device, mobile terminal and computer readable storage medium | |
CN112637670B (en) | Video generation method and device | |
JP2022523606A (en) | Gating model for video analysis | |
US20100085363A1 (en) | Photo Realistic Talking Head Creation, Content Creation, and Distribution System and Method | |
CN112822542A (en) | Video synthesis method and device, computer equipment and storage medium | |
CN111988658A (en) | Video generation method and device | |
WO2019227429A1 (en) | Method, device, apparatus, terminal, server for generating multimedia content | |
CN109379633B (en) | Video editing method and device, computer equipment and readable storage medium | |
US10897658B1 (en) | Techniques for annotating media content | |
CN112866776B (en) | Video generation method and device | |
CN113132780A (en) | Video synthesis method and device, electronic equipment and readable storage medium | |
JP2024513640A (en) | Virtual object action processing method, device, and computer program | |
CN112533058A (en) | Video processing method, device, equipment and computer readable storage medium | |
KR101915792B1 (en) | System and Method for Inserting an Advertisement Using Face Recognition | |
CN114339391A (en) | Video data processing method, video data processing device, computer equipment and storage medium | |
EP4345814A1 (en) | Video-generation system | |
CN114979764B (en) | Video generation method, device, computer equipment and storage medium | |
CN107995538B (en) | Video annotation method and system | |
CN116528015A (en) | Digital human video generation method and device, electronic equipment and storage medium | |
US20220070501A1 (en) | Social video platform for generating and experiencing content | |
CN112188116B (en) | Video synthesis method, client and system based on object | |
CN114500879A (en) | Video data processing method, device, equipment and storage medium | |
CN113905177A (en) | Video generation method, device, equipment and storage medium | |
EP2263212A1 (en) | Photo realistic talking head creation, content creation, and distribution system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |