CN113438428A - Method, apparatus, device and computer-readable storage medium for automated video generation - Google Patents


Info

Publication number
CN113438428A
Authority
CN
China
Prior art keywords: video, content information, template, information, elements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110699963.5A
Other languages
Chinese (zh)
Other versions
CN113438428B (en)
Inventor
卞东海
郑烨翰
彭卫华
徐伟建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110699963.5A priority Critical patent/CN113438428B/en
Publication of CN113438428A publication Critical patent/CN113438428A/en
Application granted granted Critical
Publication of CN113438428B publication Critical patent/CN113438428B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates

Abstract

The present disclosure provides a method, an apparatus, a device, and a computer-readable storage medium for automated video generation, and relates to the field of computer technology, in particular to the field of knowledge graphs. The specific implementation scheme is as follows: acquiring feature information of a first template, wherein the first template comprises a plurality of video elements; identifying a plurality of padding bits of the first template based on the feature information, wherein, in each of the plurality of video elements, the first content information corresponding to each of the plurality of padding bits is replaceable; replacing the first content information of at least one of the plurality of padding bits with second content information to obtain a second template; and generating a video using the second template. By this method, manual involvement during video generation is reduced and the efficiency of automated video generation is improved.

Description

Method, apparatus, device and computer-readable storage medium for automated video generation
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for automated video generation in the field of information flow.
Background
With the development of internet technology and multimedia technology, video, currently the most popular creation format, is becoming an important source of information for people. In the face of ever-increasing demand for video, higher requirements are therefore placed on the speed of video generation.
At present, a growing amount of research and development focuses on using machines to generate videos automatically, replacing the manual operations involved in producing videos and thereby improving video generation efficiency. However, the technology for automated video generation by machines is still at a preliminary stage, and many problems remain to be solved.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and computer-readable storage medium for automated generation of video.
According to a first aspect of the present disclosure, a method for automated video generation is provided. The method comprises the following steps: acquiring characteristic information of a first template, wherein the first template comprises a plurality of video elements; identifying a plurality of padding bits of the first template based on the feature information, wherein in each of the plurality of video elements, the first content information corresponding to each of the plurality of padding bits is replaceable; replacing the first content information of at least one of the plurality of padding bits with the second content information to obtain a second template; and generating a video using the second template.
According to a second aspect of the present disclosure, an apparatus for automated video generation is provided. The device includes: a feature information acquisition module configured to acquire feature information of a first template, wherein the first template comprises a plurality of video elements; a padding bit identification module configured to identify a plurality of padding bits of the first template based on the feature information, wherein in each of the plurality of video elements, the first content information corresponding to each of the plurality of padding bits is replaceable; a content information replacement module configured to replace first content information of at least one of the plurality of padding bits with second content information to obtain a second template; and a video generation module configured to generate a video using the second template.
According to a third aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to the first aspect of the disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to the first aspect of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 illustrates a schematic diagram of an environment 100 in which embodiments of the present disclosure can be implemented;
fig. 2 illustrates a flow diagram of a method 200 for automated generation of video, according to some embodiments of the present disclosure;
fig. 3 illustrates a flow diagram of a method 300 for automated generation of video, in accordance with some embodiments of the present disclosure;
fig. 4 illustrates a block diagram of an apparatus 400 for automated video generation according to some embodiments of the present disclosure; and
fig. 5 illustrates a block diagram of an electronic device 500 capable of implementing multiple embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In describing embodiments of the present disclosure, the term "include" and its derivatives should be interpreted as being inclusive, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like may refer to different objects or to the same object. Other explicit and implicit definitions may also be included below.
At present, research and development on automated video generation are still at a preliminary stage. Common approaches mostly rely on developers building a video generation template, and then using the developed template to fill in content information and/or design special effects for the video. These methods all require substantial manual involvement in the automated generation of video, which makes automated video generation inefficient.
In order to address at least the above problems, an improved scheme for automated video generation is proposed according to an embodiment of the present disclosure. In this scheme, feature information of a first template is obtained, for example, in a video design platform. A plurality of padding bits of the first template are then identified based on the feature information. Next, the first content information corresponding to at least one of the plurality of padding bits is replaced with second content information to obtain a second template. In the video design platform, a video may then be generated using the second template. By this method, manual involvement in the video generation cycle, particularly in template creation, is greatly reduced, and the efficiency of automated video generation is improved.
Fig. 1 illustrates a schematic diagram of an environment 100 in which various embodiments of the present disclosure can be implemented. The example environment 100 includes a video design platform 101.
The video design platform 101 (which may also be referred to as a computing device) may adjust templates based on material from different sources (material 1021 through material 102n, where n is any positive integer) and automatically generate a video 103 using the adjusted templates. Example video design platforms 101 include, but are not limited to, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, Personal Digital Assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system, which overcomes the defects of high management difficulty and weak service expansibility found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
The content information 102 (e.g., material 1021 through material 102n) describes at least the content required for generating the video 103, such as pictures, text, animation, sound, and the like. The above examples are intended only to illustrate the present disclosure and not to limit it; those skilled in the art may select materials as needed.
The video design platform 101 adjusts the template based on the received material (1021 to 102n), and automatically generates the video 103 using the adjusted template.
By this method, manual involvement in the video generation cycle is greatly reduced, so that the efficiency of automated video generation is improved and the cost of video generation is lowered.
Fig. 1 above illustrates a schematic diagram of an environment 100 in which various embodiments of the present disclosure can be implemented. A flow diagram of a method 200 for automated generation of video according to some embodiments of the present disclosure is described below in conjunction with fig. 2. Method 200 in fig. 2 is performed by video design platform 101 in fig. 1 or any suitable video design platform.
At block 202, feature information of a first template is obtained, the first template including a plurality of video elements (e.g., video element 1 through video element n, where n is any positive integer). In one example, the feature information of the first template includes machine-recognizable structural and content features corresponding to each of the plurality of video elements and dependency features between each of the plurality of video elements. For example, structural features may be identified using a tree structure, and content features may include start times, durations, sizes, locations, etc. of the various materials contained by each of the plurality of video elements, and associated special effects. In the present disclosure, a video element may be, for example, a virtual unit corresponding to one video frame or a plurality of video frames in a video stream. The template may be combined with content information and processed in units of video elements.
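The tree-structured feature information described above can be sketched as a small data structure. The field names and the traversal helper below are illustrative assumptions, not the patent's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class VideoElement:
    """One node in the template's tree of video elements (illustrative sketch)."""
    name: str
    modality: str                        # e.g. "text", "picture", "sound", "video"
    start_time: float                    # seconds
    duration: float                      # seconds
    size: Optional[Tuple[int, int]] = None  # (width, height) for visual elements
    replaceable: bool = False            # True marks this element as a padding bit
    children: List["VideoElement"] = field(default_factory=list)

def collect_padding_bits(root: VideoElement) -> List[VideoElement]:
    """Walk the tree and return every element whose content can be replaced."""
    found = [root] if root.replaceable else []
    for child in root.children:
        found.extend(collect_padding_bits(child))
    return found

# A tiny two-level template: a title card whose text is replaceable.
template = VideoElement(
    name="root", modality="video", start_time=0.0, duration=15.0,
    children=[
        VideoElement("title", "text", 0.0, 3.0, replaceable=True),
        VideoElement("background", "picture", 0.0, 15.0),
    ],
)
print([e.name for e in collect_padding_bits(template)])  # ['title']
```

A real template would carry many more content features (special effects, frame rates, and so on), but the same tree walk suffices to enumerate padding bits.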
In some embodiments, the video design platform 101 obtains the first template. The first template may be, for example, a template designed with AE (Adobe After Effects) software. Next, the video design platform 101 parses the obtained first template to obtain machine-recognizable structural features and content features corresponding to each of the plurality of video elements. The video design platform 101 also constructs the dependencies between the plurality of video elements.
In some embodiments, the video design platform 101 may construct the dependency between each of the plurality of video elements by one or more of: a name of a first dependent video element of the plurality of video elements on which the start time of each video element depends; the start time of each video element depends on the determination of the start time or end time of the first dependent video element; a name of a second dependent video element of the plurality of video elements on which the duration of each video element depends; a determination of whether each video element can be replaced; and the modality of each video element (e.g., text, picture, sound, video, etc.).
For example, video design platform 101 may define that video element 2 appears when video element 1 and video element 3 are present at the same time. The video design platform 101 may also define that video element 2 occurs at the end time of video element 1. The video design platform 101 may also define other video elements (e.g., video element 4 and video element 5) on which the duration of video element 2 depends. The video design platform 101 may also define that the video element 6 can be replaced. The video design platform 101 may also define the modality of the video element 5 as a picture. The above examples are intended to be illustrative of the present disclosure, and are not intended to be limiting of the present disclosure.
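The dependency definitions in the example above can be sketched as plain records. The keys and the resolution helper are illustrative assumptions; the patent does not prescribe a concrete schema:

```python
# Each record encodes one of the dependency rules from the example above.
dependencies = {
    "element2": {
        "start_depends_on": ["element1", "element3"],  # appears when both are present
        "start_anchor": ("element1", "end"),           # starts at element1's end time
        "duration_depends_on": ["element4", "element5"],
    },
    "element6": {"replaceable": True},
    "element5": {"modality": "picture"},
}

def resolve_start(deps: dict, timings: dict, name: str) -> float:
    """Resolve an element's start time from its anchor element's timing.

    `timings` maps element names to (start_time, duration) pairs in seconds.
    """
    anchor, edge = deps[name]["start_anchor"]
    start, duration = timings[anchor]
    return start + duration if edge == "end" else start

timings = {"element1": (2.0, 5.0)}  # element1 starts at 2 s and lasts 5 s
print(resolve_start(dependencies, timings, "element2"))  # 7.0
```

Here element2's start resolves to the end time of element1 (2 s + 5 s = 7 s), mirroring the "occurs at the end time of video element 1" rule.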
At block 204, a plurality of padding bits (e.g., padding bits 1 through n, where n is any positive integer) of the first template are identified based on the feature information, wherein in each of the plurality of video elements, the first content information corresponding to each of the plurality of padding bits is capable of being replaced. In one example, the video design platform 101 identifies video elements 5 through n as padding bits from video element 1 through n, where the first content information (e.g., text-type content information, picture-type content information, sound-type content information, and video-type content information) contained by the video elements 5 through n can be replaced. The above examples are intended to be illustrative of the present disclosure, and are not intended to be limiting of the present disclosure.
In some embodiments, the text-type content information includes content information such as the start time and duration of the text. The picture-type content information includes content information such as the size, start time, and duration of the picture. The sound-type content information includes content information such as the start time and duration of the sound. The video-type content information includes content information such as the start time, duration, size, and frame rate of the video.
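The per-modality attributes listed above can be tabulated and used to validate incoming material. The attribute names are an illustrative, non-exhaustive sketch:

```python
# Attributes tracked for each modality of content information, as described above.
MODALITY_ATTRIBUTES = {
    "text":    {"start_time", "duration"},
    "picture": {"size", "start_time", "duration"},
    "sound":   {"start_time", "duration"},
    "video":   {"start_time", "duration", "size", "frame_rate"},
}

def missing_attributes(modality: str, content: dict) -> set:
    """Return the attributes required for this modality but absent from content."""
    return MODALITY_ATTRIBUTES[modality] - content.keys()

clip = {"start_time": 0.0, "duration": 4.0, "size": (1280, 720)}
print(missing_attributes("video", clip))  # {'frame_rate'}
```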
The first template may be stored after its feature information has been obtained and its padding bits identified, so that the video design platform 101 can subsequently reuse it to generate videos automatically an unlimited number of times, as described further below.
At block 206, the first content information corresponding to at least one of the plurality of padding bits is replaced with the second content information to obtain a second template. In one example, the video design platform 101 receives content information 102 (e.g., material 1021 through material 102n from different sources) and replaces the first content information corresponding to padding bit 1 (e.g., in video element 1) with the material 1021, or replaces the first content information corresponding to padding bit 3 (e.g., in video element 3) with the material 1023. Other material similarly replaces the first content information contained in the video elements corresponding to the padding bits in the first template, which is not illustrated here one by one. After completing the replacement, the video design platform 101 adaptively adjusts the first template to generate an updated template (the second template).
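The replacement step of block 206 can be sketched as a pure function that takes the first template and returns the second. The mapping-based representation and the bit names are illustrative assumptions:

```python
def replace_content(template: dict, replacements: dict) -> dict:
    """Return a second template with padding-bit content swapped for new material.

    `template` maps padding-bit names to their current (first) content
    information; the first template is left unmodified.
    """
    second = dict(template)  # copy, so the first template can be reused
    for bit, material in replacements.items():
        if bit not in second:
            raise KeyError(f"unknown padding bit: {bit}")
        second[bit] = material
    return second

first_template = {"fill1": "placeholder text", "fill3": "placeholder.png"}
second_template = replace_content(first_template, {"fill1": "material_1021"})
print(second_template["fill1"])  # material_1021
```

Keeping the first template immutable reflects the reuse property described above: the same parsed template can produce any number of second templates from new content information.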
At block 208, a video is generated using the second template. For example, the video design platform 101 further adjusts the composition elements used to generate the video (e.g., Comp in AE software) and executes rendering commands (e.g., using the AE software as the underlying rendering engine) according to the second template to generate the video 103.
According to the method of the embodiments of the present disclosure, once the operations of acquiring all the feature information and identifying the padding bits have been completed for the first template, the first template can be adjusted an unlimited number of times to generate new templates from newly input content information, and new videos can be generated automatically using those templates. This greatly reduces manual involvement during video generation, improves the efficiency of automated video generation, and reduces the cost of video generation.
In some embodiments, the steps of block 206 above may be implemented by the following embodiments, which are described in detail in conjunction with fig. 3. Fig. 3 illustrates a flow diagram of a method 300 for automated generation of video, according to some embodiments of the present disclosure. It should be noted that, for the sake of clarity, the description of fig. 3 focuses on the parts that differ from the automated video generation method described with reference to fig. 2; the same or similar parts are omitted.
At block 302, the second content information is mapped to at least one padding bit, according to the modality of the second content information, to replace the first content information corresponding to the at least one padding bit. In one example, for each video element in the first template, the video design platform 101 may first parse the structural features and the content features of the first template one by one in order to select, from all the padding bits, the padding bits corresponding to the second content information. For example, suppose the second content information received by the video design platform 101 consists of material 1021, material 1022, and material 1023, where the modality of material 1021 is text-type content information, the modality of material 1022 is picture-type content information, and the modality of material 1023 is sound-type content information. By parsing the first template, the video design platform 101 selects, from all the padding bits (e.g., padding bits 1 through n), a text-type padding bit 1, a picture-type padding bit 2, and a sound-type padding bit 3 corresponding to material 1021, material 1022, and material 1023 respectively. Next, the video design platform 101 may map the second content information to the selected padding bits according to its modality. For example, material 1021, material 1022, and material 1023 are placed in padding bit 1, padding bit 2, and padding bit 3, respectively.
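The modality-based selection in block 302 reduces to matching each material against the padding bits of the same modality. A minimal sketch, with illustrative names:

```python
def select_padding_bit(padding_bits: dict, material_modality: str):
    """Pick the first unfilled padding bit whose modality matches the material's.

    Returns the bit's name, or None if no compatible bit is free.
    """
    for name, info in padding_bits.items():
        if info["modality"] == material_modality and not info.get("filled"):
            return name
    return None

bits = {
    "fill1": {"modality": "text"},
    "fill2": {"modality": "picture"},
    "fill3": {"modality": "sound"},
}
print(select_padding_bit(bits, "picture"))  # fill2
```

Picture-type material lands in the picture-type bit; material with no compatible bit (e.g. video-type here) simply finds none and would be rejected or skipped.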
At block 304, mapping attribute information is constructed based on the mapping of the second content information to the at least one padding bit. For example, after material 1021, material 1022, and material 1023 have been placed in padding bit 1, padding bit 2, and padding bit 3 respectively, mapping attribute information such as the position, size, and time required by material 1021, material 1022, and material 1023 in the first template is determined.
At block 306, the first template is adjusted according to the mapping attribute information to obtain a second template.
In some embodiments, the first template may be adjusted to obtain the second template as follows. First, at least one video element of the plurality of video elements to be adjusted is determined for the first template according to the mapping attribute information. Next, the first template is adjusted according to the second content information for the at least one video element to be adjusted. For example, if material 1021 requires a duration of 10 s in the first template while the duration of video element 1 corresponding to padding bit 1 is 5 s, then the duration of video element 1 in the first template, and the times and positions of the other video elements (e.g., video element 2 through video element n), are adjusted accordingly. The video design platform 101 may also adjust the dependencies between the various video elements in the first template.
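The duration adjustment in the example above (stretching video element 1 from 5 s to 10 s and shifting later elements) can be sketched with a simplified, linear timeline; the patent's actual adjustment is dependency-aware, so this is only an illustration:

```python
def adjust_durations(elements: list, name: str, new_duration: float) -> list:
    """Stretch one element and shift every later element by the difference.

    `elements` is a list of (name, start, duration) tuples in timeline order.
    """
    adjusted, delta = [], 0.0
    for n, start, dur in elements:
        if n == name:
            delta = new_duration - dur
            adjusted.append((n, start, new_duration))
        else:
            adjusted.append((n, start + delta, dur))
    return adjusted

timeline = [("element1", 0.0, 5.0), ("element2", 5.0, 3.0)]
print(adjust_durations(timeline, "element1", 10.0))
# [('element1', 0.0, 10.0), ('element2', 10.0, 3.0)]
```

Element 2, which originally followed element 1 at 5 s, is pushed to 10 s so the two elements still abut after the stretch.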
By this method, the template can be adaptively adjusted according to different input data (or different content information), so that the automatically generated video has high quality.
In some embodiments, the video automatic generation method may further include: checking the mapping of the second content information to the at least one padding bit before adjusting the first template. For example, before adjusting the first template, the video design platform 101 verifies that the text-type content information of material 1021 belongs to the content information required by padding bit 1; more specifically, the video design platform 101 ensures that the text of material 1021 is the exact text expected at that padding bit rather than other text content. This embodiment further improves the quality of the automatically generated video.
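The pre-adjustment check can be sketched as a validation step that rejects a mapping whose modality or content does not match what the padding bit requires. The field names and the notion of an expected text are illustrative assumptions:

```python
def check_mapping(padding_bit: dict, material: dict) -> None:
    """Verify a material matches its padding bit before the template is adjusted.

    Raises ValueError on a modality or content mismatch; returns None on success.
    """
    if material["modality"] != padding_bit["modality"]:
        raise ValueError("modality mismatch")
    expected = padding_bit.get("expected_text")
    if expected is not None and material["content"] != expected:
        raise ValueError(f"expected {expected!r}, got {material['content']!r}")

bit = {"modality": "text", "expected_text": "ABCDE"}
check_mapping(bit, {"modality": "text", "content": "ABCDE"})  # passes silently
```

Running the check before the (comparatively expensive) template adjustment means a bad mapping is caught early instead of producing a defective video.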
Fig. 4 shows a schematic block diagram of an apparatus 400 for automated video generation according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus 400 includes a feature information obtaining module 402 configured to obtain feature information of a first template, the first template including a plurality of video elements. The apparatus 400 further includes a fill bit identification module 404 configured to identify a plurality of fill bits of the first template based on the characteristic information, wherein in each of the plurality of video elements, the first content information corresponding to each of the plurality of fill bits is replaceable. The apparatus 400 further comprises a content information replacement module 406 configured to replace the first content information corresponding to at least one of the plurality of padding bits with the second content information to obtain a second template. The apparatus 400 further includes a video generation module 408 configured to generate a video using the second template.
In some embodiments, the content information replacement module 406 includes: a content information mapping module configured to map the second content information to the at least one padding bit to replace the first content information corresponding to the at least one padding bit according to a modality of the second content information; a mapping attribute construction module configured to construct mapping attribute information based on a mapping of the second content information to the at least one padding bit; and a template adjusting module configured to adjust the first template according to the mapping attribute information.
In some embodiments, the template adjustment module comprises: a determination module configured to determine at least one of the plurality of video elements to be adjusted for the first template according to the mapping attribute information; and an adjustment module configured to adjust the first template according to the second content information for at least one of the plurality of video elements to be adjusted.
In some embodiments, the feature information includes machine-recognizable structural and content features corresponding to each of the plurality of video elements and dependency features between each of the plurality of video elements. The feature information acquisition module 402 includes: a feature parsing module configured to parse the first template to obtain machine-recognizable structural and content features corresponding to each of the plurality of video elements; and a feature construction module configured to construct a dependency relationship between each of the plurality of video elements.
In some embodiments, the feature construction module constructs the dependency between each of the plurality of video elements according to one or more of: a name of a first dependent video element of the plurality of video elements on which the start time of each video element depends; the start time of each video element depends on the determination of the start time or end time of the first dependent video element; a name of a second dependent video element of the plurality of video elements on which the duration of each video element depends; a determination of whether each video element can be replaced; and the modality of each video element.
In some embodiments, the first content information includes text-type content information, picture-type content information, sound-type content information, and video-type content information.
In some embodiments, the apparatus 400 further comprises a checking module configured to check the mapping of the second content information to the at least one padding bit before adjusting the first template.
By adopting the video automatic generation device in any embodiment, the manual participation in the video generation period can be greatly reduced, and the efficiency of video automatic generation is improved.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. The example electronic device 500 may be used to implement the video design platform 101 in fig. 1. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 501 performs the various methods and processes described above, such as the method 200 and the method 300. For example, in some embodiments, the methods 200 and 300 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the methods 200 and 300 described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method 200 and the method 300 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method for automated video generation, comprising:
acquiring feature information of a first template, wherein the first template comprises a plurality of video elements;
identifying a plurality of padding bits of the first template based on the feature information, wherein each of the plurality of padding bits corresponds, in the plurality of video elements, to first content information that can be replaced;
replacing first content information corresponding to at least one of the plurality of padding bits with second content information to obtain a second template; and
generating a video using the second template.
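The flow of claim 1 can be illustrated with a minimal, hypothetical sketch. All names here (`VideoElement`, `Template`, `identify_padding_bits`, `fill_template`, `generate_video`) are illustrative and do not appear in the patent; rendering is stood in for by string concatenation:

```python
from dataclasses import dataclass, replace as dc_replace

@dataclass(frozen=True)
class VideoElement:
    name: str          # element identifier
    modality: str      # "text", "picture", "sound", or "video"
    content: str       # first content information (a URI or literal)
    replaceable: bool  # whether this element is a padding bit

@dataclass
class Template:
    elements: list

def identify_padding_bits(template):
    """Identify the padding bits: elements whose content can be replaced."""
    return [e.name for e in template.elements if e.replaceable]

def fill_template(template, second_content):
    """Replace first content information at padding bits with second
    content information, yielding a second template."""
    padding = set(identify_padding_bits(template))
    new_elements = [
        dc_replace(e, content=second_content[e.name])
        if e.name in padding and e.name in second_content else e
        for e in template.elements
    ]
    return Template(new_elements)

def generate_video(template):
    """Stand-in for rendering: join element contents in order."""
    return "|".join(e.content for e in template.elements)
```

Under this sketch, filling a template leaves the first template intact and yields a second template whose padding bits carry the new content.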
2. The method of claim 1, wherein the replacing the first content information corresponding to at least one of the plurality of padding bits with the second content information to obtain the second template comprises:
mapping the second content information to the at least one padding bit according to a modality of the second content information to replace the first content information corresponding to the at least one padding bit;
constructing mapping attribute information based on a mapping of the second content information to the at least one padding bit; and
adjusting the first template according to the mapping attribute information.
3. The method of claim 2, wherein said adjusting the first template according to the mapping attribute information comprises:
determining at least one of the plurality of video elements to be adjusted for the first template according to the mapping attribute information; and
adjusting the first template according to the second content information for the at least one video element.
4. The method of claim 1, wherein the feature information includes machine-recognizable structural features and content features corresponding to each of the plurality of video elements, and dependency features between the plurality of video elements; and
wherein the obtaining the feature information of the first template comprises:
parsing the first template to obtain the structural features and the content features that are machine-recognizable corresponding to each of the plurality of video elements; and
constructing a dependency relationship between each of the plurality of video elements.
5. The method of claim 4, wherein the dependencies between each of the plurality of video elements are constructed according to one or more of:
a name of a first dependent video element of the plurality of video elements on which a start time of each video element depends;
an indication of whether the start time of each video element depends on the start time or the end time of the first dependent video element;
a name of a second dependent video element of the plurality of video elements on which the duration of each video element depends;
an indication of whether each of the video elements can be replaced; and
a modality of each of the video elements.
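One possible encoding of the dependency features listed above, as a sketch (the field layout, the element names, and the function `resolve_start` are assumptions for illustration, not from the patent):

```python
# Each entry records: the element it depends on, whether it anchors to that
# element's start or end, its duration, whether it is replaceable, and its
# modality -- mirroring the five dependency features of claim 5.
deps = {
    # element:  (depends_on, anchor,  duration, replaceable, modality)
    "bgm":      (None,       None,    10.0,     False,       "sound"),
    "headline": ("bgm",      "start",  3.0,     True,        "text"),
    "credits":  ("bgm",      "end",    2.0,     False,       "text"),
}

def resolve_start(name, deps, memo=None):
    """Compute an element's absolute start time by following its
    dependency to the referenced element's start or end time."""
    if memo is None:
        memo = {}
    if name in memo:
        return memo[name]
    depends_on, anchor, _, _, _ = deps[name]
    if depends_on is None:
        start = 0.0
    else:
        parent_start = resolve_start(depends_on, deps, memo)
        parent_duration = deps[depends_on][2]
        start = parent_start if anchor == "start" else parent_start + parent_duration
    memo[name] = start
    return start
```

Here the "headline" starts together with the background music, while the "credits" start only when the music ends.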
6. The method of claim 1, wherein the first content information comprises text-type content information, picture-type content information, sound-type content information, and video-type content information.
7. The method of claim 2, further comprising:
checking a mapping of the second content information to the at least one padding bit before adjusting the first template.
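The check of claim 7 could, for instance, verify that each piece of second content information matches the modality of the padding bit it maps to. A minimal sketch, assuming modality is inferred from a file extension (all names and the inference rule are illustrative, not from the patent):

```python
# Hypothetical modality expected at each padding bit.
PADDING_MODALITIES = {"headline": "text", "cover": "picture"}

def modality_of(content):
    """Crude modality guess from a file extension (illustrative only)."""
    if content.endswith((".png", ".jpg")):
        return "picture"
    if content.endswith((".mp4",)):
        return "video"
    if content.endswith((".mp3", ".wav")):
        return "sound"
    return "text"

def check_mapping(mapping):
    """Return the padding bits whose mapped content has the wrong modality."""
    return [
        bit for bit, content in mapping.items()
        if modality_of(content) != PADDING_MODALITIES.get(bit)
    ]
```

A mapping that assigns a sound file to a picture slot would be flagged before the first template is adjusted.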
8. An apparatus for automated video generation, comprising:
a feature information acquisition module configured to acquire feature information of a first template, wherein the first template comprises a plurality of video elements;
a padding bit identification module configured to identify a plurality of padding bits of the first template based on the feature information, wherein each of the plurality of padding bits corresponds, in the plurality of video elements, to first content information that can be replaced;
a content information replacement module configured to replace first content information corresponding to at least one of the plurality of padding bits with second content information to obtain a second template; and
a video generation module configured to generate a video using the second template.
9. The apparatus of claim 8, wherein the content information replacement module comprises:
a content information mapping module configured to map the second content information to the at least one padding bit to replace the first content information corresponding to the at least one padding bit according to a modality of the second content information;
a mapping attribute construction module configured to construct mapping attribute information based on a mapping of the second content information to the at least one padding bit; and
a template adjustment module configured to adjust the first template according to the mapping attribute information.
10. The apparatus of claim 9, wherein the template adjustment module comprises:
a determination module configured to determine at least one of the plurality of video elements to adjust for the first template according to the mapping attribute information; and
an adjustment module configured to adjust the first template according to the second content information for the at least one video element.
11. The apparatus of claim 8, wherein the feature information comprises machine-recognizable structural features and content features corresponding to each of the plurality of video elements, and dependency features between the plurality of video elements; and
wherein the feature information acquisition module includes:
a feature parsing module configured to parse the first template to obtain the structural features and the content features that are machine-identifiable corresponding to each of the plurality of video elements; and
a feature construction module configured to construct a dependency relationship between each of the plurality of video elements.
12. The apparatus of claim 11, wherein the feature construction module constructs the dependencies between each of the plurality of video elements according to one or more of:
a name of a first dependent video element of the plurality of video elements on which a start time of each video element depends;
an indication of whether the start time of each video element depends on the start time or the end time of the first dependent video element;
a name of a second dependent video element of the plurality of video elements on which the duration of each video element depends;
an indication of whether each of the video elements can be replaced; and
a modality of each of the video elements.
13. The apparatus of claim 8, wherein the first content information comprises text-type content information, picture-type content information, sound-type content information, and video-type content information.
14. The apparatus of claim 9, further comprising:
a check module configured to check a mapping of the second content information to the at least one padding bit prior to adjusting the first template.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202110699963.5A 2021-06-23 2021-06-23 Method, apparatus, device and computer-readable storage medium for automated video generation Active CN113438428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110699963.5A CN113438428B (en) 2021-06-23 2021-06-23 Method, apparatus, device and computer-readable storage medium for automated video generation


Publications (2)

Publication Number Publication Date
CN113438428A true CN113438428A (en) 2021-09-24
CN113438428B CN113438428B (en) 2022-11-25

Family

ID=77755149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110699963.5A Active CN113438428B (en) 2021-06-23 2021-06-23 Method, apparatus, device and computer-readable storage medium for automated video generation

Country Status (1)

Country Link
CN (1) CN113438428B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130272679A1 (en) * 2012-04-12 2013-10-17 Mario Luis Gomes Cavalcanti Video Generator System
CN110494833A (en) * 2018-05-28 2019-11-22 深圳市大疆创新科技有限公司 A kind of multimedia editing method and intelligent terminal
CN110708596A (en) * 2019-09-29 2020-01-17 北京达佳互联信息技术有限公司 Method and device for generating video, electronic equipment and readable storage medium
CN111918094A (en) * 2020-06-29 2020-11-10 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and storage medium
CN112218154A (en) * 2019-07-12 2021-01-12 腾讯科技(深圳)有限公司 Video acquisition method and device, storage medium and electronic device
CN112291484A (en) * 2019-07-23 2021-01-29 腾讯科技(深圳)有限公司 Video synthesis method and device, electronic equipment and storage medium
CN112584061A (en) * 2020-12-24 2021-03-30 咪咕文化科技有限公司 Multimedia universal template generation method, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN113438428B (en) 2022-11-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant