CN112118397B - Video synthesis method, related device, equipment and storage medium - Google Patents


Info

Publication number
CN112118397B
CN112118397B (application CN202011008062.9A)
Authority
CN
China
Prior art keywords
target
stage
layer
video
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011008062.9A
Other languages
Chinese (zh)
Other versions
CN112118397A (en)
Inventor
袁佳平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011008062.9A
Publication of CN112118397A
Application granted
Publication of CN112118397B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a video synthesis method, a related device, equipment and a storage medium, which are used in the technical field of multimedia processing. The method comprises the following steps: acquiring an element selection instruction aiming at a target stage type; in response to the element selection instruction, determining a target stage element from selectable stage elements corresponding to the target stage type; acquiring a video sequence frame corresponding to a live-action shooting video; and synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video. On the one hand, the method can enrich the performance forms of actors, support personalized stage collocation and improve the flexibility of application development; on the other hand, it reduces the art and development effort required in both the production and maintenance stages, saving the time cost of video production and later maintenance.

Description

Video synthesis method, related device, equipment and storage medium
Technical Field
The present application relates to the field of multimedia processing technologies, and in particular, to a method, a related apparatus, a device, and a storage medium for video composition.
Background
Content produced using Hypertext Markup Language 5 (HTML5) can be referred to as an H5 scene. H5 scenes offer richer and more engaging presentation and stronger interactivity, supporting functions such as swipe-based page turning, animated special effects, touch-triggered animation and music playback, so the content is less monotonous and spreads more easily.
Currently, magic applications implemented on the basis of H5 have been developed, in which a player can interact with an actor captured in live-action footage. Referring to fig. 1 (A) and (B): as shown in fig. 1 (A), the actor asks the user how to play the cards; after the user selects a card, the interface shown in fig. 1 (B) is entered, and the actor again asks the user to choose how to play the cards.
However, the actor's body movements when asking the user how to deal the cards are identical; only the displayed card suits differ. Developers are nevertheless required to produce two independent video files for the same set of body movements, which increases art production cost and reduces application development efficiency.
Disclosure of Invention
The embodiment of the application provides a video synthesis method, a related device, equipment and a storage medium, which on the one hand can enrich the performance forms of actors, support personalized stage collocation and improve the flexibility of application development, and on the other hand can reduce the art and development effort required in both the production and maintenance stages, saving the time cost of video production and later maintenance.
In view of the above, an aspect of the present application provides a method for video composition, including:
acquiring an element selection instruction aiming at a target stage type, wherein the element selection instruction carries an element identifier;
responding to an element selection instruction, and determining a target stage element from selectable stage elements corresponding to the target stage type, wherein the target stage element is shown in an associated layer;
acquiring a video sequence frame corresponding to a live-action shooting video, wherein the video sequence frame comprises M sequence frame images, each sequence frame image comprises a foreground area and a background area, the foreground area is used for displaying a target object, the background area is transparent, the video sequence frame is displayed on a target layer, and M is an integer greater than 1;
and synthesizing the video sequence frames and the target stage elements according to the layer relation between the associated layers and the target layer to obtain a target video, wherein the target video comprises M synthesized video frames.
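For illustration only, the following TypeScript sketch shows one possible shape of the data involved in the claimed method; the type and field names are assumptions introduced here, not terms from the application.

```typescript
// Minimal illustrative types (names are assumptions, not from the patent):
// a stage element tied to its associated layer, and the transparent-background
// video sequence frames shown on the target layer.
type LayerName = "text" | "closeUp" | "target" | "farView" | "floor" | "background";

interface StageElement {
  elementId: string;        // identifier carried by the element selection instruction
  layer: LayerName;         // associated layer the element is shown on
  image: HTMLImageElement;  // pre-loaded element artwork
}

interface VideoSequenceFrames {
  frames: HTMLImageElement[]; // M sequence frame images with transparent backgrounds
  frameRate: number;          // e.g. 24 frames per second
}

// The element selection instruction resolves to a target stage element by identifier.
function selectStageElement(
  selectable: StageElement[],
  elementId: string
): StageElement | undefined {
  return selectable.find(e => e.elementId === elementId);
}
```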
Another aspect of the present application provides a video compositing apparatus, comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring an element selection instruction aiming at a target stage type, and the element selection instruction carries an element identifier;
the determining module is used for responding to the element selection instruction and determining a target stage element from selectable stage elements corresponding to the target stage type, wherein the target stage element is shown in the associated layer;
the acquisition module is further used for acquiring a video sequence frame corresponding to the live-action shooting video, wherein the video sequence frame comprises M sequence frame images, each sequence frame image comprises a foreground area and a background area, the foreground area is used for displaying a target object, the background area is transparent, the video sequence frame is displayed on a target layer, and M is an integer greater than 1;
and the processing module is used for synthesizing the video sequence frames and the target stage elements according to the layer relation between the associated layers and the target layer to obtain a target video, wherein the target video comprises M synthesized video frames.
In one possible design, in one implementation of another aspect of an embodiment of the present application,
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is specifically used for acquiring a live-action shooting video, the live-action shooting video comprises M video frames, and each video frame comprises a target object;
performing transparency processing on a background area in each video frame aiming at each video frame in the live-action shooting video to obtain a sequence frame image;
and generating video sequence frames according to the sequence frame image corresponding to each video frame.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the target stage type includes a stage close-up type;
the acquisition module is specifically used for providing a stage editing interface;
acquiring an element selection instruction aiming at a stage close shot type through a stage editing interface;
the determining module is specifically configured to determine a target close-range element from selectable stage elements corresponding to the stage close-range type according to an element selection instruction for the stage close-range type, where the target close-range element is displayed on a first layer, and the first layer belongs to an associated layer;
and the processing module is specifically used for covering the first layer on the target layer, and synthesizing the video sequence frame and the target close-range elements to obtain the target video.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the target stage type includes a stage perspective type;
the acquisition module is specifically used for providing a stage editing interface;
acquiring an element selection instruction aiming at a stage distant view type through a stage editing interface;
the determining module is specifically configured to determine, according to an element selection instruction for a stage perspective type, a target perspective element from selectable stage elements corresponding to the stage perspective type, where the target perspective element is displayed in a second layer, and the second layer belongs to an associated layer;
and the processing module is specifically used for covering the target layer on the second layer, and performing synthesis processing on the video sequence frame and the target distant view element to obtain a target video.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the target stage type includes a stage floor type;
the acquisition module is specifically used for providing a stage editing interface;
acquiring an element selection instruction aiming at the type of the stage floor through a stage editing interface;
the determining module is specifically configured to determine a target floor element from selectable stage elements corresponding to stage floor types according to an element selection instruction for the stage floor types, where the target floor element is shown in a third layer, and the third layer belongs to an associated layer;
and the processing module is specifically used for covering the target layer on the third layer, and performing synthesis processing on the video sequence frame and the target floor element to obtain a target video.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the target stage type includes a stage background type;
the acquisition module is specifically used for providing a stage editing interface;
acquiring an element selection instruction aiming at a stage background type through a stage editing interface;
the determining module is specifically configured to determine, according to an element selection instruction for a stage background type, a target background element from selectable stage elements corresponding to the stage background type, where the target background element is shown in a fourth layer, and the fourth layer belongs to an associated layer;
and the processing module is specifically used for covering the target layer on the fourth layer, and performing synthesis processing on the video sequence frame and the target background element to obtain a target video.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the target stage type includes a stage text type;
the acquisition module is specifically used for providing a stage editing interface;
acquiring an element selection instruction aiming at a stage text type through a stage editing interface;
the determining module is specifically used for determining a target text font from selectable stage elements corresponding to the stage text type according to the element selection instruction for the stage text type;
acquiring a target text element corresponding to a target text font, wherein the target text element is displayed on a fifth layer, and the fifth layer belongs to an associated layer;
and the processing module is specifically used for covering the fifth layer on the target layer, and performing synthesis processing on the video sequence frame and the target text element to obtain a target video.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the target stage type includes a stage close-up type, a stage distant view type, a stage floor type, a stage background type, and a stage text type;
the acquisition module is specifically used for providing a stage editing interface;
acquiring an element selection instruction aiming at a stage close shot type through a stage editing interface;
acquiring an element selection instruction aiming at a stage distant view type through a stage editing interface;
acquiring an element selection instruction aiming at the type of the stage floor through a stage editing interface;
acquiring an element selection instruction aiming at a stage background type through a stage editing interface;
acquiring an element selection instruction aiming at a stage text type through a stage editing interface;
the determining module is specifically used for determining a target close-range element from selectable stage elements corresponding to the stage close-range type according to the element selection instruction for the stage close-range type, wherein the target close-range element is displayed on the first layer;
according to the element selection instruction aiming at the stage distant view type, determining a target distant view element from selectable stage elements corresponding to the stage distant view type, wherein the target distant view element is displayed on the second layer;
according to the element selection instruction aiming at the stage floor type, determining a target floor element from selectable stage elements corresponding to the stage floor type, wherein the target floor element is displayed in a third layer;
determining a target background element from selectable stage elements corresponding to the stage background type according to the element selection instruction for the stage background type, wherein the target background element is shown in the fourth layer;
determining a target text font from selectable stage elements corresponding to the stage text type according to the element selection instruction for the stage text type;
acquiring a target text element corresponding to a target text font, wherein the target text element is displayed on a fifth layer, and the first layer, the second layer, the third layer, the fourth layer and the fifth layer all belong to associated layers;
and the processing module is specifically configured to sequentially superimpose the fifth layer, the first layer, the target layer, the second layer, the third layer and the fourth layer according to a top-to-bottom order, and perform synthesis processing on a target text element, a target close-range element, a video sequence frame, a target far-range element, a target floor element and a target background element to obtain a target video.
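As an illustration of the superposition order described above, the sketch below assumes the layers are drawn onto an HTML5 canvas; since a canvas paints bottom-up, the draw order is the reverse of the top-to-bottom stacking order. The layer names are assumptions made for illustration.

```typescript
// Illustrative sketch (assumed names): the described top-to-bottom order is
// text (5th) > close-up (1st) > target (actor frames) > far view (2nd) >
// floor (3rd) > background (4th). On a canvas, later draws cover earlier ones,
// so drawing proceeds from the bottom layer up.
const stackTopToBottom = ["text", "closeUp", "target", "farView", "floor", "background"] as const;
const drawOrderBottomUp = [...stackTopToBottom].reverse();

function drawComposite(
  ctx: CanvasRenderingContext2D,
  images: Record<(typeof stackTopToBottom)[number], CanvasImageSource>
): void {
  for (const layer of drawOrderBottomUp) {
    ctx.drawImage(images[layer], 0, 0); // later draws cover earlier ones
  }
}
```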
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the processing module is specifically used for rendering the sequence frame images and the target stage elements onto canvas according to the layer relation between the associated layers and the target layers aiming at each sequence frame image in the video sequence frames;
for each sequence frame image in the video sequence frames, displaying a synthesized video frame corresponding to the sequence frame image through a canvas;
a target video is generated from the M synthesized video frames.
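A minimal sketch of this canvas-based compositing, assuming the stage elements are split into those below and those above the target layer; the function and parameter names are illustrative, not taken from the application.

```typescript
// Sketch of per-frame canvas compositing (assumed names): for each of the M
// sequence frame images, the stage elements below the target layer are drawn
// first, then the sequence frame image, then the elements above it.
function composeFrames(
  canvas: HTMLCanvasElement,
  sequenceFrames: HTMLImageElement[],  // M frames with transparent backgrounds
  below: CanvasImageSource[],          // e.g. background, floor, far-view elements
  above: CanvasImageSource[]           // e.g. close-up, text elements
): ImageData[] {
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D context unavailable");
  const composedFrames: ImageData[] = [];
  for (const frame of sequenceFrames) {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    below.forEach(img => ctx.drawImage(img, 0, 0, canvas.width, canvas.height));
    ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);   // target layer
    above.forEach(img => ctx.drawImage(img, 0, 0, canvas.width, canvas.height));
    composedFrames.push(ctx.getImageData(0, 0, canvas.width, canvas.height));
  }
  return composedFrames; // M synthesized video frames
}
```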
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the acquiring module is further used for acquiring animation sequence frames corresponding to the animation video, wherein the animation sequence frames comprise M animation sequence frame images, each animation sequence frame image comprises a foreground area and a background area, the foreground area is used for displaying an animation object, the background area is transparent, and the animation sequence frames are displayed on the target layer;
and the processing module is specifically used for performing synthesis processing on the video sequence frame, the animation sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the acquisition module is also used for acquiring the audio to be played;
the processing module is specifically used for performing synthesis processing on the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain a video to be processed;
and synthesizing the audio to be played and the video to be processed to obtain the target video.
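One possible way to combine the audio to be played with the composited canvas output in a browser is sketched below, using canvas capture and MediaRecorder; this is an assumption about a concrete implementation, not the application's prescribed method.

```typescript
// Illustrative sketch (one possible browser approach): the composited frames
// are drawn onto a canvas, the canvas stream is combined with the audio to be
// played, and MediaRecorder produces the final target video file.
async function recordTargetVideo(
  canvas: HTMLCanvasElement,
  audioUrl: string,
  durationMs: number
): Promise<Blob> {
  const audio = new Audio(audioUrl);
  const audioCtx = new AudioContext();
  const dest = audioCtx.createMediaStreamDestination();
  audioCtx.createMediaElementSource(audio).connect(dest);

  const stream = new MediaStream([
    ...canvas.captureStream(24).getVideoTracks(), // video to be processed
    ...dest.stream.getAudioTracks(),              // audio to be played
  ]);
  const recorder = new MediaRecorder(stream, { mimeType: "video/webm" });
  const chunks: Blob[] = [];
  recorder.ondataavailable = e => chunks.push(e.data);

  const done = new Promise<Blob>(resolve => {
    recorder.onstop = () => resolve(new Blob(chunks, { type: "video/webm" }));
  });
  recorder.start();
  await audio.play();
  setTimeout(() => recorder.stop(), durationMs);
  return done;
}
```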
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the video composition apparatus further includes a playing module,
the playing module is used for synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain a target video, and playing the target video when the playing operation aiming at the target video is detected;
the playing module is further used for pausing the playing of the target video if the pause operation aiming at the target video is detected when the target video is being played;
the playing module is further used for continuing playing the target video if the playing operation aiming at the target video is detected when the target video is paused to be played;
the playing module is further configured to, when the target video is being played, end playing the target video if a skip operation for the target video is detected.
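A minimal playback-control sketch under the assumption that the target video is held as a list of composited frames; the class and method names are illustrative only.

```typescript
// Minimal playback controller sketch (assumed structure): play, pause and skip
// map to starting, stopping and ending the frame loop over the composited frames.
class TargetVideoPlayer {
  private frameIndex = 0;
  private timer: number | null = null;

  constructor(
    private ctx: CanvasRenderingContext2D,
    private frames: ImageData[],
    private frameRate = 24
  ) {}

  play(): void {
    if (this.timer !== null) return; // already playing
    this.timer = window.setInterval(() => {
      if (this.frameIndex >= this.frames.length) return this.skip();
      this.ctx.putImageData(this.frames[this.frameIndex++], 0, 0);
    }, 1000 / this.frameRate);
  }

  pause(): void {
    if (this.timer !== null) { window.clearInterval(this.timer); this.timer = null; }
  }

  skip(): void { // end playback
    this.pause();
    this.frameIndex = this.frames.length;
  }
}
```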
Another aspect of the present application provides a terminal device, including: a memory, a transceiver, a processor, and a bus system;
wherein, the memory is used for storing programs;
a processor for executing the program in the memory, the processor for performing the method provided in accordance with the above aspects of the instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
Another aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-described aspects.
In another aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the above aspects.
According to the technical scheme, the embodiment of the application has the following advantages:
in the embodiment of the application, a video synthesis method is provided. First, the terminal device obtains an element selection instruction for a target stage type, so that the target stage element can be determined from the selectable stage elements corresponding to the target stage type. In addition, the terminal device acquires a video sequence frame corresponding to the live-action shooting video, and finally performs synthesis processing on the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video. In this way, only the actor footage is shot against a real scene; all other stage elements are dynamically composited in real time by the program. Because the selectable stage elements are all configured in advance, developers only need to provide one set of actor live-action video for different scene requirements. On the one hand, this enriches the performance forms of actors, supports personalized stage collocation and improves the flexibility of application development; on the other hand, it reduces the art and development effort required in both the production and maintenance stages, saving the time cost of video production and later maintenance.
Drawings
FIG. 1 is a schematic diagram of an interface of a magic application in the prior art;
FIG. 2 is a block diagram of a video compositing system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of a method for video composition according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an embodiment of selecting a target stage type in an embodiment of the present application;
FIG. 5 is a schematic diagram of an embodiment of a stage element showing an object in an embodiment of the present application;
FIG. 6 is a schematic diagram of an embodiment of a video sequence frame in an embodiment of the present application;
FIG. 7 is a schematic diagram of an embodiment of a synthesized video frame in an embodiment of the present application;
FIG. 8 is a schematic diagram of an embodiment of sequential frame images in an embodiment of the present application;
FIG. 9 is a schematic diagram of an embodiment of a first layer in an embodiment of the present application;
FIG. 10 is a schematic diagram of an embodiment of a visual relationship between layers in an embodiment of the present application;
FIG. 11 is a schematic diagram of an embodiment of generating a synthesized video frame in an embodiment of the present application;
FIG. 12 is a schematic diagram of an embodiment of a second layer in an embodiment of the present application;
FIG. 13 is a schematic diagram of another embodiment of a visual relationship between layers in an embodiment of the present application;
FIG. 14 is a schematic diagram of another embodiment of generating a synthesized video frame in an embodiment of the present application;
FIG. 15 is a schematic diagram of another embodiment of a visual relationship between layers in an embodiment of the present application;
FIG. 16 is a schematic diagram of another embodiment of generating a synthesized video frame in the embodiment of the present application;
FIG. 17 is a schematic diagram of another embodiment of a visual relationship between layers in an embodiment of the present application;
FIG. 18 is a schematic diagram of another embodiment of generating a synthesized video frame in the embodiment of the present application;
FIG. 19 is a schematic diagram of an embodiment of a fifth layer in an embodiment of the present application;
FIG. 20 is a schematic diagram of another embodiment of a visual relationship between layers in an embodiment of the present application;
FIG. 21 is a schematic diagram of another embodiment of generating a synthesized video frame in an embodiment of the present application;
FIG. 22 is a diagram illustrating an embodiment of a visual relationship between an associated layer and a target layer in an embodiment of the present application;
FIG. 23 is a schematic diagram of another embodiment of generating a synthesized video frame in an embodiment of the present application;
FIG. 24 is a schematic flow chart of a video compositing method according to an embodiment of the present application;
FIG. 25 is a schematic diagram of another embodiment of generating a synthesized video frame in an embodiment of the present application;
FIG. 26 is a schematic diagram of an embodiment of browsing a target video in the embodiment of the present application;
fig. 27 is a schematic diagram of an embodiment of a video compositing apparatus in an embodiment of the present application;
fig. 28 is a schematic structural diagram of a terminal device in the embodiment of the present application.
Detailed Description
The embodiment of the application provides a video synthesis method, a related device, equipment and a storage medium, which on the one hand can enrich the performance forms of actors, support personalized stage collocation and improve the flexibility of application development, and on the other hand can reduce the art and development effort required in both the production and maintenance stages, saving the time cost of video production and later maintenance.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Content produced using HTML5 may be referred to as an H5 scene. H5 scenes offer richer and more engaging presentation and stronger interactivity, supporting functions such as swipe-based page turning, animated special effects, touch-triggered animation and music playback, so the content is less monotonous and spreads more easily. Video content produced with a game engine also offers good interactivity, and players can learn a game's background story through the videos played in it. Video content produced with film and television production software is very rich, and audiences are entertained through movies, television series, animations and other videos. However, in some cases, multiple versions of video content may need to be designed for the same set of body movements of the same actor.
In order to save the time cost of producing and maintaining video in such scenarios, the present application proposes a video composition method, which is applied to the video composition system shown in fig. 2. As shown in the figure, the video composition system includes a server and a terminal device. Taking terminal device 1 as the device performing video composition and terminal device 2 as the device used for watching the video as an example: terminal device 1 obtains an element selection instruction and can then determine a target stage element from the selectable stage elements corresponding to the target stage type according to the element selection instruction, where the target stage element is displayed on an associated layer. Terminal device 1 may further obtain video sequence frames corresponding to the live-action shooting video, including M sequence frame images, and display the video sequence frames on the target layer. Further, terminal device 1 performs synthesis processing on the video sequence frames and the target stage element according to the layer relation between the associated layer and the target layer to obtain a target video.
in the testing process, a developer plays a target video at the terminal device 1 side, and if the target video is played normally, the developer can determine that the video synthesis work is completed. Then, the target video can be sent to the server through the terminal device 1, and the server issues the target video to the terminal device 2, so that the user can play the target video through the terminal device 2. Because the selectable stage elements are all configured in advance, developers only need to provide a set of actor live-action videos according to different scene requirements, on one hand, the performance forms of actors can be enriched, personalized collocation of the stage is met, flexibility of application and development is improved, on the other hand, time investment of the artists and development can be saved no matter in the manufacturing stage or in the maintenance stage, and time cost of video manufacturing and later maintenance is saved.
It can be understood that the server related to the present application may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a palm computer, a personal computer, a smart television, a smart watch, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein. The number of servers and terminal devices is not limited.
With reference to the above description, the method for video composition in the present application will be described below, please refer to fig. 3, where fig. 3 is a schematic diagram of an embodiment of a method for video composition in an embodiment of the present application, and as shown in the drawing, an embodiment of the method for video composition in the embodiment of the present application includes:
101. the method comprises the steps that a terminal device obtains an element selection instruction aiming at a target stage type, wherein the element selection instruction carries an element identifier;
in this embodiment, the developer may select any one stage type from the plurality of stage types, where the selected stage type is the target stage type, and based on this, the terminal device may obtain an element selection instruction triggered by the developer for the target stage type. Wherein the element selection instruction carries an element identifier.
Specifically, the stage types fall into two modes. The first is a piecemeal (mix-and-match) mode, that is, the plurality of stage types include a stage close shot type, a stage distant view type, a stage floor type, a stage background type, a stage text type, and the like, and the target stage type is any one of these stage types. In this mode, the element identifier indicates a particular stage element within the target stage type.
The second is a combined mode, that is, the stage types are pre-configured stage types, and the target stage type is any one of them. In this mode, the element identifier indicates the identifier corresponding to the target stage type; for example, if the element identifier corresponding to stage type A is "1" and the element identifier corresponding to stage type B is "2", then a carried element identifier of "1" means the target stage type is stage type A. For ease of understanding, please refer to fig. 4, which is a schematic diagram of an embodiment of selecting a target stage type in an embodiment of the present application. As shown in the figure, A1, A2 and A3 indicate different stage types, and A4 indicates a confirmation control. Assuming that the confirmation control is operated after the stage type "bouquet middle" is selected, an element selection instruction for the stage type "bouquet middle" is generated.
It should be understood that the first selection mode is more flexible: developers can separately select and combine stage elements of the various stage types according to actual requirements. The second selection mode is simpler, which is equivalent to pre-configuring templates corresponding to multiple stage types; in actual use, one template can be selected directly. The present application is described in detail for the first selection mode, and it is understood that in practical applications the second selection mode may also be adopted.
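A small sketch of the two selection modes follows; the identifiers and file names are purely illustrative and only show how an element identifier could resolve to either a single stage element (piecemeal mode) or a pre-configured template (combined mode).

```typescript
// Hedged sketch of the two selection modes (identifiers and asset names are assumptions).
const piecemealCatalog: Record<string, Record<string, string>> = {
  closeUp: { a1: "closeUpElement1.png", a2: "closeUpElement2.png" },
  farView: { b1: "farViewElement1.png", b2: "farViewElement2.png" },
};

const templateCatalog: Record<string, string[]> = {
  "1": ["closeUpElement1.png", "farViewElement2.png", "floor1.png"], // stage type A
  "2": ["closeUpElement2.png", "farViewElement1.png", "floor2.png"], // stage type B
};

function resolveSelection(
  mode: "piecemeal" | "combined",
  stageType: string,
  elementId: string
): string[] {
  return mode === "piecemeal"
    ? [piecemealCatalog[stageType][elementId]] // one element within the target stage type
    : templateCatalog[elementId];              // a whole pre-configured template
}
```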
102. The terminal equipment responds to the element selection instruction, and determines a target stage element from selectable stage elements corresponding to the target stage type, wherein the target stage element is displayed on an associated layer;
in this embodiment, the terminal device may select a target stage element from the plurality of selectable stage elements according to the element identifier carried by the element selection instruction, where the target stage element is displayed on the associated layer.
Specifically, for ease of understanding, the description takes the case where selectable stage element A and selectable stage element B are included as an example. Please refer to fig. 5, which is a diagram illustrating an embodiment of a target stage element in an embodiment of the present application: as shown in fig. 5, diagram (A) indicates selectable stage element A and diagram (B) indicates selectable stage element B. Assuming that selectable stage element B is determined as the target stage element, selectable stage element B is displayed on the associated layer as shown in diagram (C) of fig. 5.
103. The method comprises the steps that terminal equipment obtains a video sequence frame corresponding to a live-action shooting video, wherein the video sequence frame comprises M sequence frame images, each sequence frame image comprises a foreground area and a background area, the foreground area is used for displaying a target object, the background area is transparent, the video sequence frame is displayed on a target layer, and M is an integer greater than 1;
in this embodiment, the terminal device obtains a live-action shot video, and further obtains a video sequence frame, where the video sequence frame includes M sequence frame images, and each sequence frame image is displayed on the target layer.
It should be noted that the video sequence frame may correspond to the complete live-action video or to one of its segments, which is not limited herein. For example, the live-action video may be divided into two video segments, video segment A and video segment B, and the video sequence frame may then correspond to the whole live-action video, to video segment A, or to video segment B. The target object may be a real person, a real animal, a three-dimensional (3D) modeled character, or another object, which is also not limited herein. The background area in each sequence frame image is transparent, that is, the background area needs to undergo transparency processing so that the background of every sequence frame image is transparent. The sequence frame images in the video sequence frame are extracted at a certain frame rate, for example 24 frames per second.
It should be noted that, because the background region in the sequence frame images needs to be made transparent and this relies on an alpha (transparency) channel, the image formats of the sequence frame images include, but are not limited to, formats with an alpha channel, such as Portable Network Graphics (PNG), Tagged Image File Format (TIFF) or Graphics Interchange Format (GIF), so that when images are stacked, the background of an upper-layer image does not block the lower-layer images.
For the convenience of understanding, a target object is taken as an example for describing a 3D modeled character, please refer to fig. 6, fig. 6 is a schematic diagram of an embodiment of a video sequence frame in the embodiment of the present application, as shown, B1 to B12 are respectively used for indicating sequence frame images in the video sequence frame, B11 and B12 are respectively used for indicating different target objects, and B13 is used for indicating a background area. For the sequence frame image B1 indicated by B1, the foreground region displays the target object, and the background region B13 is transparent (white in the figure, actually transparent), similarly, the foreground region in the sequence frame image B2 to the sequence frame image B12 displays the target object, and the background region is transparent, and the sequence frame image is shown in the target layer.
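A short sketch of pre-loading the M sequence frame images as PNG files with an alpha channel is given below; the file-naming scheme is an assumption made for illustration.

```typescript
// Sketch of pre-loading the M sequence frame images (file naming is an assumption):
// each frame is a PNG with an alpha channel so the background stays transparent
// when layers are stacked.
function loadSequenceFrames(baseUrl: string, frameCount: number): Promise<HTMLImageElement[]> {
  const loadOne = (index: number) =>
    new Promise<HTMLImageElement>((resolve, reject) => {
      const img = new Image();
      img.onload = () => resolve(img);
      img.onerror = reject;
      img.src = `${baseUrl}/frame_${String(index).padStart(4, "0")}.png`; // assumed naming
    });
  return Promise.all(Array.from({ length: frameCount }, (_, i) => loadOne(i + 1)));
}
```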
It is understood that, in the embodiment of the present application, there is no sequential limitation between step 101 and step 103, that is, step 101 and step 103 may be performed simultaneously, or step 101 and step 103 may be performed in a sequential order, which is not limited herein.
104. And the terminal equipment synthesizes the video sequence frames and the target stage elements according to the layer relation between the associated layers and the target layer to obtain a target video, wherein the target video comprises M synthesized video frames.
In this embodiment, the terminal device performs synthesis processing on the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer, so as to obtain a target video including M synthesized video frames. Specifically, the terminal device needs to perform a synthesis process on each sequence frame image with the target stage element, so as to obtain a synthesized video frame. For easy understanding, please refer to fig. 7, in which fig. 7 is a schematic diagram of an embodiment of a synthesized video frame in an embodiment of the present application, and as shown in the drawing, C1 and C2 are respectively used to indicate different target stage elements, and C3 and C4 are respectively used to indicate different target objects. The synthesized video frame includes a target stage element and a target object. It should be understood that the foregoing embodiments are only used for understanding the present solution, and the target stage elements and the target objects included in the specific target video need to be flexibly determined according to actual situations, and are not limited herein.
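A minimal real-time compositing loop is sketched below, under the assumptions that the stage elements are split into those below and above the target layer and that frames advance at 24 frames per second; it illustrates one possible implementation rather than the application's exact procedure.

```typescript
// Minimal real-time compositing loop (assumed names): each tick draws the
// target stage elements and the current sequence frame image in layer order,
// producing one synthesized video frame per sequence frame image.
function playComposite(
  ctx: CanvasRenderingContext2D,
  sequenceFrames: HTMLImageElement[], // video sequence frames (target layer)
  stageBelow: HTMLImageElement[],     // stage elements below the target layer
  stageAbove: HTMLImageElement[],     // stage elements above the target layer
  frameRate = 24
): void {
  const frameDuration = 1000 / frameRate;
  const start = performance.now();

  const tick = (now: number) => {
    const index = Math.floor((now - start) / frameDuration);
    if (index >= sequenceFrames.length) return; // all M frames shown
    const { width, height } = ctx.canvas;
    ctx.clearRect(0, 0, width, height);
    [...stageBelow, sequenceFrames[index], ...stageAbove]
      .forEach(img => ctx.drawImage(img, 0, 0, width, height));
    requestAnimationFrame(tick);
  };
  requestAnimationFrame(tick);
}
```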
In the embodiment of the application, a video synthesis method is provided. In the above manner, only the actor footage is shot against a real scene; all other stage elements are dynamically composited in real time by the program. Because the selectable stage elements are all configured in advance, developers only need to provide one set of actor live-action video for different scene requirements. On the one hand, this enriches the performance forms of actors, supports personalized stage collocation and improves the flexibility of application development; on the other hand, it reduces the art and development effort required in both the production and maintenance stages, saving the time cost of video production and later maintenance.
Optionally, on the basis of the embodiment corresponding to fig. 3, in an optional embodiment provided in the embodiment of the present application, the obtaining, by the terminal device, a video sequence frame corresponding to the live-action video specifically includes the following steps:
the method comprises the steps that terminal equipment obtains a live-action shooting video, wherein the live-action shooting video comprises M video frames, and each video frame comprises a target object;
the method comprises the steps that terminal equipment conducts transparentization processing on a background area in each video frame aiming at each video frame in a live-action shooting video to obtain a sequence frame image;
and the terminal equipment generates video sequence frames according to the sequence frame image corresponding to each video frame.
In this embodiment, a method for acquiring a video sequence frame is described. The method comprises the steps that terminal equipment obtains a live-action shot video, each video frame in the live-action shot video comprises a target object, on the basis, a background area in each video frame needs to be subjected to transparentization processing, sequence frame images are obtained, and then video sequence frames are obtained on the basis of M sequence frame images subjected to transparentization processing. The live-action video is a video obtained by directly shooting a live action through a camera, and the live-action video can be directly read from a memory of the terminal device or received from other devices, and the mode of obtaining the live-action video is not limited herein.
Specifically, the transparency processing may use image processing software, such as Adobe Photoshop (PS), to adjust the transparency of the background region and make the background region of the video frame transparent. Alternatively, matting may be performed on the target object: the non-target (background) region is selected, and once the complete background region is selected, the target object is copied onto a new layer; the background layer is then hidden so that only the layer containing the target object is displayed and the background region is transparent, completing the transparency processing. Alternatively, Artificial Intelligence (AI) recognition may be performed on the target object: the target object is identified and extracted through AI recognition, and the other regions (background regions) outside the target object are set to be transparent.
For easy understanding, please refer to fig. 8, fig. 8 is a schematic diagram of an embodiment of a sequence frame image in an embodiment of the present application, as shown in fig. 8 (a) is used to indicate a video frame in a live-action captured video, fig. 8 (a) includes an indicated target object and a background area, fig. 8 (B) is obtained by performing a transparency process on the background area in fig. 8 (a), and fig. 8 (B) is used to indicate a sequence frame image.
In the embodiment of the application, a method for acquiring video sequence frames is provided, and by the above manner, the background area in each video frame in the live-action shooting video is subjected to transparentization processing, so that only a target object is displayed in the foreground area in the video sequence frames, the influence of the background area on the video sequence frames is removed, and the subsequent video processing is facilitated.
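As a purely illustrative programmatic alternative to the tools mentioned above (and assuming the footage was shot against a roughly uniform backdrop), background pixels close to a key colour can simply be given zero alpha; this is a sketch under those assumptions, not the application's stated method.

```typescript
// Illustrative chroma-key style transparency processing (an assumption, not the
// patent's method): pixels within a tolerance of the key colour get zero alpha,
// so only the target object remains visible in the foreground.
function keyOutBackground(
  frame: HTMLCanvasElement,
  key: { r: number; g: number; b: number },
  tolerance = 60
): void {
  const ctx = frame.getContext("2d");
  if (!ctx) return;
  const imageData = ctx.getImageData(0, 0, frame.width, frame.height);
  const px = imageData.data; // RGBA bytes
  for (let i = 0; i < px.length; i += 4) {
    const dist = Math.hypot(px[i] - key.r, px[i + 1] - key.g, px[i + 2] - key.b);
    if (dist < tolerance) px[i + 3] = 0; // zero alpha = transparent background
  }
  ctx.putImageData(imageData, 0, 0);
}
```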
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the target stage type includes a stage close-up type;
the method comprises the following steps that the terminal equipment obtains an element selection instruction aiming at a target stage type:
the terminal equipment provides a stage editing interface;
the method comprises the steps that a terminal device obtains an element selection instruction aiming at a stage close shot type through a stage editing interface;
the terminal equipment responds to the element selection instruction, determines the target stage element from the selectable stage elements corresponding to the target stage type, and specifically comprises the following steps:
the terminal equipment determines a target close-range element from selectable stage elements corresponding to the stage close-range type according to an element selection instruction for the stage close-range type, wherein the target close-range element is displayed on a first layer, and the first layer belongs to an associated layer;
the method comprises the following steps that the terminal equipment carries out synthesis processing on a video sequence frame and a target stage element according to a layer relation between an associated layer and a target layer to obtain a target video, and specifically comprises the following steps:
and the terminal equipment covers the first layer on the target layer, and performs synthesis processing on the video sequence frame and the target close-range element to obtain a target video.
In this embodiment, a manner of adding a stage element in a stage close-up type is described. The target stage type may include a stage close shot type, and based on this, a target close shot element in the stage close shot type may be selected on the stage editing interface, that is, an element selection instruction carrying an element identifier is generated, where the element identifier is used to indicate the target close shot element, and therefore, the terminal device determines the target close shot element from the selectable stage elements corresponding to the stage close shot type, where the target close shot element is displayed on the first layer, then covers the first layer on the target layer, and then performs synthesis processing on the video sequence frame and the target close shot element, thereby obtaining the target video.
Specifically, for ease of understanding, the description takes selectable stage elements that include near-view element 1 and near-view element 2 as an example. Please refer to fig. 9, which is a schematic diagram of an embodiment of the first layer in an embodiment of the present application: diagram (A) shows the first layer corresponding to near-view element 1 in the stage editing interface, and diagram (B) shows the first layer corresponding to near-view element 2. Assuming that the element identifier corresponding to near-view element 1 is "a1" and the element identifier corresponding to near-view element 2 is "a2", if near-view element 2 is selected, the terminal device acquires an element selection instruction carrying the element identifier "a2", thereby determining that near-view element 2 is the target near-view element.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating an embodiment of a visual relationship between layers in an embodiment of the present application, as shown in the figure, since a first layer is overlaid on a target layer, a target close-up element is shown in front of a target object for a human eye to see.
To further understand the present solution, a developer selects a close-range element 2 as a target close-range element for an example to explain, please refer to fig. 11, fig. 11 is an exemplary illustration of generating a synthesized video frame in an embodiment of the present application, as shown in the drawing, fig. 11 (a) is a target layer showing a target object, fig. 11 (B) is a first layer showing a target close-range element, the first layer is covered on the target layer, so as to obtain a synthesized video frame as shown in fig. 11 (C), and each sequence frame image in a video sequence frame is processed similarly, so as to obtain M synthesized video frames, thereby generating a target video.
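A small sketch of this draw order follows, assuming canvas compositing: because the first layer covers the target layer, the close-up element is drawn after the sequence frame image; in the distant-view, floor and background cases described below, the order is reversed.

```typescript
// Sketch of the close-up case (names assumed): the close-up element is drawn
// last, so it appears in front of the target object shown by the sequence frame.
function drawCloseUpFrame(
  ctx: CanvasRenderingContext2D,
  sequenceFrame: HTMLImageElement,  // target layer
  closeUpElement: HTMLImageElement  // first layer
): void {
  const { width, height } = ctx.canvas;
  ctx.clearRect(0, 0, width, height);
  ctx.drawImage(sequenceFrame, 0, 0, width, height);  // drawn first, appears behind
  ctx.drawImage(closeUpElement, 0, 0, width, height); // drawn last, appears in front
}
```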
In the embodiment of the application, a method for adding the stage elements under the stage close shot type is provided, and by the method, the first layer for displaying the target close shot elements is covered on the target layer for displaying the target object, so that the target video can also display the selected target close shot elements, personalized collocation of different requirements is met, and the flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the target stage type includes a stage perspective type;
the method comprises the following steps that the terminal equipment obtains an element selection instruction aiming at a target stage type:
the terminal equipment provides a stage editing interface;
the method comprises the steps that a terminal device obtains an element selection instruction aiming at a stage distant view type through a stage editing interface;
the terminal equipment responds to the element selection instruction, determines the target stage element from the selectable stage elements corresponding to the target stage type, and specifically comprises the following steps:
the terminal equipment determines a target long-range element from selectable stage elements corresponding to the stage long-range type according to an element selection instruction for the stage long-range type, wherein the target long-range element is displayed on a second layer, and the second layer belongs to an associated layer;
the method comprises the following steps that the terminal equipment carries out synthesis processing on a video sequence frame and a target stage element according to a layer relation between an associated layer and a target layer to obtain a target video, and specifically comprises the following steps:
and the terminal equipment covers the target layer on the second layer, and performs synthesis processing on the video sequence frame and the target distant view element to obtain a target video.
In this embodiment, a manner of adding a stage element in a stage perspective type is described. The target stage type may include a stage distant view type, and based on this, a target distant view element in the stage distant view type may be selected on the stage editing interface, that is, an element selection instruction carrying an element identifier is generated, where the element identifier is used to indicate the target distant view element, and therefore, the terminal device determines the target distant view element from the selectable stage elements corresponding to the stage distant view type, where the target distant view element is displayed on the second layer, then covers the second layer with the target layer, and then performs synthesis processing on the video sequence frame and the target distant view element, thereby obtaining the target video.
Specifically, for ease of understanding, the description takes selectable stage elements that include perspective element 1 and perspective element 2 as an example. Please refer to fig. 12, which is a schematic diagram of an embodiment of the second layer in an embodiment of the present application: diagram (A) shows the second layer corresponding to perspective element 1 in the stage editing interface, and diagram (B) shows the second layer corresponding to perspective element 2. Assuming that the element identifier corresponding to perspective element 1 is "B1" and the element identifier corresponding to perspective element 2 is "B2", if perspective element 2 is selected, the terminal device acquires an element selection instruction carrying the element identifier "B2", thereby determining that perspective element 2 is the target perspective element.
Referring to fig. 13, fig. 13 is a schematic diagram of another embodiment of the visual relationship between layers in the embodiment of the present application, as shown in the figure, since the target layer is overlaid on the second layer, the target object is shown in front of the target perspective element for the human eye to see.
To further understand the present solution, the description takes the case where a developer selects perspective element 1 as the target perspective element as an example. Please refer to fig. 14, which is a schematic diagram of another embodiment of generating a synthesized video frame in the embodiment of the present application: diagram (A) in fig. 14 is the target layer showing the target object, diagram (B) in fig. 14 is the second layer showing the target perspective element, and the target layer is overlaid on the second layer, so as to obtain the synthesized video frame shown in fig. 14 (C). Each sequence frame image in the video sequence frame is processed similarly, so as to obtain M synthesized video frames, thereby generating the target video.
In the embodiment of the application, a mode of adding stage elements under a stage perspective type is provided. By the method, the target layer for displaying the target object is covered on the second layer for displaying the target distant view element, so that the target video can also display the selected target distant view element, the personalized collocation of different requirements is met, and the flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided by the embodiment of the present application, the target stage type includes a stage floor type;
the method comprises the following steps that the terminal equipment obtains an element selection instruction aiming at a target stage type:
the terminal equipment provides a stage editing interface;
the method comprises the steps that a terminal device obtains an element selection instruction aiming at a stage floor type through a stage editing interface;
the terminal equipment responds to the element selection instruction, determines the target stage element from the selectable stage elements corresponding to the target stage type, and specifically comprises the following steps:
the method comprises the steps that terminal equipment determines a target floor element from selectable stage elements corresponding to stage floor types according to an element selection instruction for the stage floor types, wherein the target floor element is displayed in a third layer, and the third layer belongs to an associated layer;
the method comprises the following steps that the terminal equipment carries out synthesis processing on a video sequence frame and a target stage element according to a layer relation between an associated layer and a target layer to obtain a target video, and specifically comprises the following steps:
and the terminal equipment covers the target layer on the third layer, and performs synthesis processing on the video sequence frame and the target floor element to obtain a target video.
In this embodiment, a manner of adding a stage element under a stage floor type is described. The target stage type may include a stage floor type, and based on this, a target floor element in the stage floor type may be selected on the stage editing interface, that is, an element selection instruction carrying an element identifier is generated, where the element identifier is used to indicate the target floor element, and thus, the terminal device determines the target floor element from the selectable stage elements corresponding to the stage floor type, where the target floor element is displayed on the third layer, then covers the target layer on the third layer, and then performs synthesis processing on the video sequence frame and the target floor element, so as to obtain the target video.
Specifically, the case where the selectable stage elements include a floor element 1 and a floor element 2 is taken as an example. Similar to the content described in fig. 12, a third layer corresponding to the floor element 1 is displayed in the stage editing interface, or a third layer corresponding to the floor element 2 is displayed in the stage editing interface. Assuming that the element identification corresponding to the floor element 1 is "C1" and the element identification corresponding to the floor element 2 is "C2", if the floor element 1 is selected, the terminal device may acquire an element selection instruction carrying the element identification "C1", thereby determining that the floor element 1 is the target floor element.
Referring to fig. 15, fig. 15 is a schematic diagram of another embodiment of the visual relationship between the layers in the embodiment of the present application, as shown in the figure, since the target layer is overlaid on the third layer, the target object is shown in front of the target floor element for the human eye to see.
To further understand the present solution, please refer to fig. 16, fig. 16 is a diagram illustrating another embodiment of generating a synthesized video frame in the embodiment of the present application, as shown in fig. 16, a diagram (a) is a target layer showing a target object, a diagram (B) is a third layer showing a target floor element, the target layer is covered on the third layer, so as to obtain the synthesized video frame as shown in fig. 16 (C), and each sequence frame image in a video sequence frame is processed similarly, so as to obtain M synthesized video frames, thereby generating a target video.
In an embodiment of the present application, a manner of adding floor elements under a stage floor style is provided. By the method, the target layer for displaying the target object is covered on the third layer for displaying the target floor element, so that the target video can also display the selected target floor element, personalized collocation of different requirements is met, and flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the target stage type includes a stage background type;
the method comprises the following steps that the terminal equipment obtains an element selection instruction aiming at a target stage type:
the terminal equipment provides a stage editing interface;
the method comprises the steps that a terminal device obtains an element selection instruction aiming at a stage background type through a stage editing interface;
the terminal equipment responds to the element selection instruction, determines the target stage element from the selectable stage elements corresponding to the target stage type, and specifically comprises the following steps:
the terminal equipment determines a target background element from selectable stage elements corresponding to the stage background type according to an element selection instruction for the stage background type, wherein the target background element is displayed on a fourth layer, and the fourth layer belongs to an associated layer;
the method comprises the following steps that the terminal equipment carries out synthesis processing on a video sequence frame and a target stage element according to a layer relation between an associated layer and a target layer to obtain a target video, and specifically comprises the following steps:
and the terminal equipment covers the target layer on the fourth layer, and performs synthesis processing on the video sequence frame and the target background element to obtain a target video.
In this embodiment, a manner of adding a background element under a stage background type is described. The target stage type may include a stage background type, and based on this, a target background element in the stage background type may be selected on the stage editing interface, that is, an element selection instruction carrying an element identifier is generated, where the element identifier is used to indicate the target background element, and thus, the terminal device determines the target background element from the selectable stage elements corresponding to the stage background type, where the target background element is displayed on the fourth layer, then covers the fourth layer with the target layer, and then performs synthesis processing on the video sequence frame and the target background element, so as to obtain the target video.
Specifically, the case where the selectable stage elements include a background element 1 and a background element 2 is taken as an example. Similar to the content described in fig. 12, a fourth layer corresponding to the background element 1 is displayed in the stage editing interface, or a fourth layer corresponding to the background element 2 is displayed in the stage editing interface. Assuming that the element identification corresponding to the background element 1 is "D1" and the element identification corresponding to the background element 2 is "D2", if the background element 1 is selected, the terminal device may acquire an element selection instruction carrying the element identification "D1", thereby determining that the background element 1 is the target background element.
Referring to fig. 17, fig. 17 is a schematic diagram of another embodiment of a visual relationship between layers in an embodiment of the present application, as shown in the figure, since a target layer is overlaid on a fourth layer, a target object is shown in front of a target background element for a human eye.
To further understand the present solution, please refer to fig. 18, fig. 18 is a schematic view of another embodiment of generating a synthesized video frame in the embodiment of the present application, as shown in the drawing, (a) in fig. 18 is a target layer showing a target object, and (B) in fig. 18 is a fourth layer showing a target background element, the target layer is covered on the fourth layer, so as to obtain the synthesized video frame shown in (C) in fig. 18, and each sequence frame image in a video sequence frame is similarly processed, so as to obtain M synthesized video frames, thereby generating a target video.
In an embodiment of the present application, a manner of adding a background element under a stage background type is provided. By the method, the target layer for displaying the target object is covered on the fourth layer for displaying the target background element, so that the target video can also display the selected target background element, personalized collocation of different requirements is met, and flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the target stage type includes a stage text type;
the method comprises the following steps that the terminal equipment obtains an element selection instruction aiming at a target stage type:
the terminal equipment provides a stage editing interface;
the method comprises the steps that the terminal equipment obtains an element selection instruction aiming at a stage text type through a stage editing interface;
the terminal equipment responds to the element selection instruction, determines the target stage element from the selectable stage elements corresponding to the target stage type, and specifically comprises the following steps:
the terminal equipment determines a target text font from selectable stage elements corresponding to the stage text type according to the element selection instruction aiming at the stage text type;
the terminal equipment acquires a target text element corresponding to a target text font, wherein the target text element is displayed on a fifth layer, and the fifth layer belongs to an associated layer;
the method comprises the following steps that the terminal equipment carries out synthesis processing on a video sequence frame and a target stage element according to a layer relation between an associated layer and a target layer to obtain a target video, and specifically comprises the following steps:
and the terminal equipment covers the fifth layer on the target layer, and performs synthesis processing on the video sequence frame and the target text element to obtain a target video.
In this embodiment, a manner of adding text elements under a stage text type is described. The target stage type may include a stage text type. Based on this, a target text font, for example, regular script, Song typeface, cursive script, clerical script, or the like, may be selected on the stage editing interface, thereby generating an element selection instruction carrying an element identifier used to indicate the target text font. The terminal device can obtain an input text according to the determined target text font, where the text is the target text element. The target text element is displayed on the fifth layer, the fifth layer is overlaid on the target layer, and the video sequence frames and the target text element are then synthesized, so as to obtain the target video.
Specifically, the case where the selectable stage elements include the "Song" typeface and the "regular script" typeface is taken as an example, where the "Song" typeface corresponds to a text element 1 and the "regular script" typeface corresponds to a text element 2. Please refer to fig. 19, which is a schematic diagram of an embodiment of the fifth layer in the embodiment of the present application. As shown in the figure, fig. 19 (A) shows the fifth layer corresponding to the "Song" typeface in the stage editing interface, and fig. 19 (B) shows the fifth layer corresponding to the "regular script" typeface in the stage editing interface. Assuming that the element identification corresponding to the "Song" typeface is "E1" and the element identification corresponding to the "regular script" typeface is "E2", if the developer selects the "regular script" typeface as required, the terminal device may acquire an element selection instruction carrying the element identification "E2", and may determine, based on the element identification "E2", that the text element 2 corresponding to the "regular script" typeface is the target text element.
referring to fig. 20, fig. 20 is a schematic diagram of another embodiment of the visual relationship between layers in the embodiment of the present application, as shown in the figure, since the fifth layer is overlaid on the target layer, the target text element is shown in front of the target object for the human eye to see.
To further understand the present solution, please refer to fig. 21, fig. 21 is a diagram illustrating another embodiment of generating a synthesized video frame in the embodiment of the present application, as shown in the drawing, fig. 21 (a) is a target layer showing a target object, fig. 21 (B) is a fifth layer showing a target text element, the fifth layer is covered on the target layer, so as to obtain the synthesized video frame shown in fig. 21 (C), and each sequence frame image in the video sequence frames is processed similarly, so as to obtain M synthesized video frames, thereby generating a target video.
In the embodiment of the application, a mode for adding text elements under stage text types is provided. By the method, the fifth layer for displaying the target text element is covered on the target layer for displaying the target object, so that the target video can also display the selected target text element, personalized collocation of different requirements is met, and flexibility of application development is further improved.
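As a minimal, non-limiting sketch, the target text element could be drawn onto the fifth layer through the Canvas 2D API using the selected font; the function name, font name, size, and color below are assumptions.

```typescript
// Illustrative only: draw the input text with the selected font onto the fifth layer,
// which sits above the target layer. Font name, size, and color are assumptions.
function drawTextElement(
  fifthLayer: HTMLCanvasElement,
  content: string,
  fontFamily: string // e.g. "KaiTi" when the regular-script font ("E2") is chosen
): void {
  const ctx = fifthLayer.getContext("2d");
  if (!ctx) return;
  ctx.clearRect(0, 0, fifthLayer.width, fifthLayer.height);
  ctx.font = `48px ${fontFamily}`;
  ctx.fillStyle = "#ffffff";
  ctx.textAlign = "center";
  ctx.fillText(content, fifthLayer.width / 2, fifthLayer.height * 0.9);
}
```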
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the target stage type includes a stage close-up type, a stage distant-view type, a stage floor type, a stage background type, and a stage text type;
the method comprises the following steps that the terminal equipment obtains an element selection instruction aiming at a target stage type:
the terminal equipment provides a stage editing interface;
the method comprises the steps that a terminal device obtains an element selection instruction aiming at a stage close shot type through a stage editing interface;
the method comprises the steps that a terminal device obtains an element selection instruction aiming at a stage distant view type through a stage editing interface;
the method comprises the steps that a terminal device obtains an element selection instruction aiming at a stage floor type through a stage editing interface;
the method comprises the steps that a terminal device obtains an element selection instruction aiming at a stage background type through a stage editing interface;
the method comprises the steps that the terminal equipment obtains an element selection instruction aiming at a stage text type through a stage editing interface;
the terminal equipment responds to the element selection instruction, determines the target stage element from the selectable stage elements corresponding to the target stage type, and specifically comprises the following steps:
the terminal equipment determines a target close-range element from selectable stage elements corresponding to the stage close-range type according to an element selection instruction for the stage close-range type, wherein the target close-range element is displayed on a first layer;
the terminal equipment determines a target perspective element from selectable stage elements corresponding to the stage perspective type according to the element selection instruction for the stage perspective type, wherein the target perspective element is displayed on the second layer;
the terminal equipment determines a target floor element from selectable stage elements corresponding to the stage floor type according to the element selection instruction for the stage floor type, wherein the target floor element is displayed in a third layer;
the terminal equipment determines a target background element from selectable stage elements corresponding to the stage background type according to the element selection instruction for the stage background type, wherein the target background element is displayed on the fourth layer;
the terminal equipment determines a target text font from selectable stage elements corresponding to the stage text type according to the element selection instruction aiming at the stage text type;
the terminal equipment acquires a target text element corresponding to a target text font, wherein the target text element is displayed on a fifth layer, and the first layer, the second layer, the third layer, the fourth layer and the fifth layer all belong to associated layers;
the method comprises the following steps that the terminal equipment carries out synthesis processing on a video sequence frame and a target stage element according to a layer relation between an associated layer and a target layer to obtain a target video, and specifically comprises the following steps:
and the terminal equipment sequentially overlaps the fifth layer, the first layer, the target layer, the second layer, the third layer and the fourth layer according to the sequence from top to bottom, and synthesizes a target text element, a target close-range element, a video sequence frame, a target far-range element, a target floor element and a target background element to obtain a target video.
In this embodiment, a method for synthesizing a target video based on multiple image layers is described. Since the target stage type may include a stage close shot type, a stage distant shot type, a stage floor type, a stage background type, and a stage text type, the terminal device needs to obtain an element selection instruction for different stage types, and a specific manner of obtaining the element selection instruction is similar to that described in the foregoing embodiment, so that details are not repeated here. The terminal device determines a target close-range element, a target distant-range element, a target floor element, a target background element, and a target text element, which are respectively displayed on different layers, and a specific manner of acquiring the target stage element is similar to that described in the foregoing embodiments, so details are not repeated here. Based on the above, the fifth layer, the first layer, the target layer, the second layer, the third layer and the fourth layer are sequentially overlaid in order from top to bottom, so that the target text element, the target close-range element, the video sequence frame, the target far-range element, the target floor element and the target background element are synthesized, and the target video is obtained.
Specifically, for ease of understanding, please refer to fig. 22, which is a schematic diagram of an embodiment of the visual relationship between the associated layers and the target layer in the embodiment of the present application. As shown in the figure, the human eye sees the effect of the superimposed layers: the fifth layer appears at the very front, and the target text element is displayed on the fifth layer. The next layer adjacent to the fifth layer is the first layer, and the target close-range element is displayed on the first layer. The next layer adjacent to the first layer is the target layer, and the target object is displayed on the target layer. The next layer adjacent to the target layer is the second layer, and the target distant-view element is displayed on the second layer. The next layer adjacent to the second layer is the third layer, and the target floor element is displayed on the third layer. The next layer adjacent to the third layer is the fourth layer, and the target background element is displayed on the fourth layer.
To further understand the present disclosure, please refer to fig. 23, fig. 23 is a diagram of another embodiment of generating a synthesized video frame according to the embodiment of the present disclosure, as shown in fig. 23, a diagram (a) is a target object shown in a target layer, a diagram (B) in fig. 23 is a target close-up view element shown in a first layer, a diagram (C) in fig. 23 is a target distant-up view element shown in a second layer, a diagram (D) in fig. 23 is a target floor element shown in a third layer, a diagram (E) in fig. 23 is a target background element shown in a fourth layer, and a diagram (F) in fig. 23 is a target text element shown in a fifth layer. Based on this, the fifth layer, the first layer, the target layer, the second layer, the third layer, and the fourth layer are sequentially superimposed in order from top to bottom, so that the synthesis processing of the target text element, the target close-range element, the video sequence frame, the target far-range element, the target floor element, and the target background element is realized, and the synthesized video frame illustrated in (G) in fig. 23 can be obtained, and each sequence frame image in the video sequence frame is similarly processed, so that M synthesized video frames can be obtained, and the target video is generated.
In the embodiment of the application, a method for synthesizing a target video based on multiple layers is provided, and in the above manner, a fifth layer, a first layer, a target layer, a second layer, a third layer and a fourth layer are sequentially overlaid according to a sequence from top to bottom, so that the target video can also display selected target text elements, target close-range elements, video sequence frames, target distant-range elements, target floor elements and target background elements, personalized collocation of different requirements is met, and flexibility of application development is further improved.
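As a non-limiting illustration, the top-to-bottom layer order described above may be summarized as a small data structure that drives the compositing; the element file names below are assumptions.

```typescript
// Illustrative summary of the stacking order (top of list = top layer).
// The element file names are assumptions; the actor frame changes per sequence frame.
const layerStack = [
  { layer: "fifth",  content: "target text element",         image: "text_E2.png" },
  { layer: "first",  content: "target close-range element",  image: "closeup_A1.png" },
  { layer: "target", content: "sequence frame image",        image: "frame_i.png" },
  { layer: "second", content: "target distant-view element", image: "perspective_B1.png" },
  { layer: "third",  content: "target floor element",        image: "floor_C1.png" },
  { layer: "fourth", content: "target background element",   image: "background_D1.png" },
];
```

When the layers are composited into a single frame, the list is simply traversed from the bottom entry upward, so that upper layers cover lower ones.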
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the terminal device performs synthesis processing on the video sequence frame and the target stage element according to a layer relation between the associated layer and the target layer to obtain the target video, and specifically includes the following steps:
the terminal equipment renders the sequence frame images and the target stage elements onto canvas according to the layer relation between the associated layer and the target layer aiming at each sequence frame image in the video sequence frames;
the terminal equipment displays a synthesized video frame corresponding to each sequence frame image in the video sequence frames through a canvas;
and the terminal equipment generates a target video according to the M synthesized video frames.
In this embodiment, a method for rendering the target video frame by frame is introduced. The terminal device renders, for each sequence frame image in the video sequence frames, the sequence frame image and the target stage element onto a canvas (for example, an HTML5 Canvas) according to the layer relation between the associated layer and the target layer, where the Canvas is a part of HTML5 and allows a scripting language to dynamically render bitmap images. Then, for each sequence frame image in the video sequence frames, the terminal device displays the synthesized video frame corresponding to the sequence frame image through the canvas, and finally obtains the target video based on the M synthesized video frames.
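A minimal sketch of such per-frame compositing on an HTML5 Canvas is shown below; it assumes all layer images are preloaded and simply draws the layers from bottom to top with drawImage so that upper layers cover lower ones. The type and function names are assumptions.

```typescript
// Minimal per-frame compositing sketch on an HTML5 Canvas: layers are drawn
// bottom-to-top so that upper layers cover lower ones. Images are assumed preloaded.
interface StageImages {
  background: HTMLImageElement;  // fourth layer
  floor: HTMLImageElement;       // third layer
  distantView: HTMLImageElement; // second layer
  closeUp: HTMLImageElement;     // first layer
  text: HTMLImageElement;        // fifth layer
}

function renderSynthesizedFrame(
  canvas: HTMLCanvasElement,
  actorFrame: HTMLImageElement,  // sequence frame image, background already transparent
  stage: StageImages
): void {
  const ctx = canvas.getContext("2d");
  if (!ctx) return;
  const { width, height } = canvas;
  ctx.clearRect(0, 0, width, height);
  [stage.background, stage.floor, stage.distantView, actorFrame, stage.closeUp, stage.text]
    .forEach(img => ctx.drawImage(img, 0, 0, width, height));
}
```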
For ease of understanding, please refer to fig. 24, where fig. 24 is a schematic flowchart of a video synthesis method according to an embodiment of the present application, and specifically as shown in the figure:
In step E1, it is first necessary to prepare the selectable stage elements, that is, to prepare close-range elements, distant-view elements, floor elements, background elements, text elements, and the like for selection by the user, where the stage elements are represented in the form of images, and the image formats may include, but are not limited to, PNG, TIFF, and GIF.
In step E2, the stage type is divided into 5 parts, each part corresponds to one layer, the perspective relationship of the stage element is realized by using the top-bottom relationship between the layers, and the layer with a higher level may block the content of the layer with a lower level, for example, the layer relationship from top to bottom is a fifth layer, a first layer, a second layer, a third layer, and a fourth layer.
In step E3, the developer may select a stage element according to actual requirements, and if the terminal device obtains an element selection instruction for different stage types, determine a target stage element based on an element identifier carried by the element selection instruction.
In step E4, the target stage elements are respectively displayed on the corresponding layers according to the perspective relationship, as can be seen from the foregoing embodiments, if the target close-range element, the target distant-range element, the target floor element, the target background element, and the target text element are selected, the target close-range element may be displayed on the first layer, the target distant-range element may be displayed on the second layer, the target floor element may be displayed on the third layer, the target background element may be displayed on the fourth layer, and the target text element may be displayed on the fifth layer.
In step E5, the terminal device acquires live-action videos, and each video frame includes a target object.
In step E6, the terminal device performs transparency processing on the background area in each video frame for each video frame in the live-action video to obtain a sequence frame image, and the video sequence frame is formed by M sequence frame images.
In step E7, the sequence frame images are sequentially extracted from the video sequence frames in the order of appearance of the video frames in the live-action video. It should be understood that there is no restriction on the execution order between step E1 and step E5.
In step E8, the terminal device places the layer displaying the sequence frame image between the layer corresponding to the stage close-range element and the layer corresponding to the stage distant view.
In step E9, the terminal device renders the entire screen onto the canvas.
In step E10, it is determined whether or not all of the sequence frame images in the video sequence frame have been extracted, if yes, step E11 is performed, and if no, step E7 is performed.
In step E11, synthesizing the target text element, the target close-range element, the video sequence frame, the target distant-range element, the target floor element, and the target background element to obtain a synthesized video frame, and obtaining the target video after all the synthesized video frames are obtained.
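Assuming the compositing runs in a browser, the loop over the M sequence frame images (steps E7 to E11) could be sketched as follows, using canvas.captureStream and MediaRecorder to collect the synthesized frames into a video; the frame rate, MIME type, and helper names are illustrative rather than prescribed by the application.

```typescript
// Illustrative loop over the M sequence frame images (steps E7 to E11), recording the
// canvas into a video. Frame rate, MIME type, and the renderFrame helper are assumptions.
async function synthesizeTargetVideo(
  canvas: HTMLCanvasElement,
  frames: HTMLImageElement[],                    // the M sequence frame images
  renderFrame: (img: HTMLImageElement) => void,  // e.g. renderSynthesizedFrame bound to the stage elements
  fps = 25
): Promise<Blob> {
  const recorder = new MediaRecorder(canvas.captureStream(fps), { mimeType: "video/webm" });
  const chunks: Blob[] = [];
  recorder.ondataavailable = e => chunks.push(e.data);
  const done = new Promise<Blob>(resolve => {
    recorder.onstop = () => resolve(new Blob(chunks, { type: "video/webm" }));
  });

  recorder.start();
  for (const frame of frames) {                  // extract sequence frame images in order (E7)
    renderFrame(frame);                          // render the whole picture onto the canvas (E8, E9)
    await new Promise(r => setTimeout(r, 1000 / fps));
  }
  recorder.stop();                               // all frames handled (E10), target video assembled (E11)
  return done;
}
```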
In the embodiment of the application, a method for rendering the target video frame by frame is provided. In the video production stage, the rendering of each video frame is separately controlled by the program, and the whole rendering process can be realized automatically, which not only saves the time investment of art design and development and improves the efficiency of producing the video, but also further saves the time cost of video production and later maintenance and improves the video processing efficiency.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided by the embodiment of the present application, the method for video synthesis further includes the following steps:
the method comprises the steps that terminal equipment obtains animation sequence frames corresponding to animation videos, wherein the animation sequence frames comprise M animation sequence frame images, each animation sequence frame image comprises a foreground area and a background area, the foreground area is used for displaying an animation object, the background area is transparent, and the animation sequence frames are displayed on a target layer;
the method comprises the following steps that the terminal equipment carries out synthesis processing on a video sequence frame and a target stage element according to a layer relation between an associated layer and a target layer to obtain a target video, and specifically comprises the following steps:
and the terminal equipment carries out synthesis processing on the video sequence frame, the animation sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video.
In this embodiment, a manner of adding an animation character in the process of synthesizing a target video is described. The method comprises the steps that terminal equipment obtains animation sequence frames corresponding to animation videos, the animation sequence frames comprise M animation sequence frame images, each animation sequence frame image comprises a foreground area and a background area, the foreground area is used for displaying an animation object, the background area is transparent, and the animation sequence frames are displayed on a target layer. Based on the method, according to the layer relation between the associated layer and the target layer, the video sequence frame, the animation sequence frame and the target stage element are subjected to synthesis processing to obtain the target video. The animation sequence frame is similar to the video sequence frame in the foregoing embodiment, and therefore, the manner of processing the animation sequence frame image is similar to the manner of processing the sequence frame image described in the foregoing embodiment, and thus details are not repeated here.
For easy understanding, please refer to fig. 25, fig. 25 is a schematic diagram of another embodiment of generating a synthesized video frame in the embodiment of the present application, as shown in fig. 25, a diagram (a) is a sequence frame image in a video sequence frame, a diagram (B) is an animation sequence frame image in an animation sequence frame, a diagram (C) in fig. 25 is a target close-up view element, and a diagram (D) in fig. 25 is a target distant view element. Because the animation sequence frame is also displayed in the target layer, the animation object and the target object are both in the same layer, and the video sequence frame, the animation sequence frame, and the target stage element are synthesized through the associated layer and the layer relationship between the target layer described in the foregoing embodiment, so that the synthesized video frame illustrated in (E) in fig. 25 can be obtained, and then the target video is generated according to the M synthesized video frames.
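Because the animation sequence frame is displayed on the same target layer as the actor, a non-limiting sketch of rendering that layer can simply draw both transparent-background images into the same canvas; placing the animation object behind the actor is an assumption made only for illustration.

```typescript
// Illustrative only: the animation sequence frame image is shown on the same target
// layer as the actor, so both transparent-background images are drawn into that layer.
// Drawing the animation object behind the actor is an assumption.
function renderTargetLayer(
  ctx: CanvasRenderingContext2D,
  actorFrame: HTMLImageElement,      // sequence frame image
  animationFrame: HTMLImageElement   // animation sequence frame image
): void {
  const { width, height } = ctx.canvas;
  ctx.drawImage(animationFrame, 0, 0, width, height); // virtual object
  ctx.drawImage(actorFrame, 0, 0, width, height);     // real object, same layer
}
```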
In the embodiment of the application, a manner of adding an animation character in the process of synthesizing the target video is provided. In this manner, the animation object can be synthesized into the target video, so that interaction between the real object and the virtual object on the same stage is realized, the personalized collocation of the stage is met, and the flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided by the embodiment of the present application, the method for video synthesis further includes the following steps:
the terminal equipment acquires audio to be played;
the method comprises the following steps that the terminal equipment carries out synthesis processing on a video sequence frame and a target stage element according to a layer relation between an associated layer and a target layer to obtain a target video, and specifically comprises the following steps:
the terminal equipment carries out synthesis processing on the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain a video to be processed;
and the terminal equipment synthesizes the audio to be played and the video to be processed to obtain the target video.
In this embodiment, a method of adding audio in the process of synthesizing the target video is described. The terminal device may also acquire audio to be played, where the audio to be played may be background music, actors' voices, special sound effects, post-production dubbing, and the like. When the synthesis processing is performed, the video sequence frames and the target stage element may be synthesized first to obtain a to-be-processed video. It should be noted that the to-be-processed video is a video without sound, that is, the to-be-processed video only includes a plurality of synthesized video frames and does not include audio.
Based on this, the terminal device may further synthesize the audio to be played into the to-be-processed video, so as to obtain the target video. Specifically, each audio frame in the audio to be played may be synthesized with each picture frame in the to-be-processed video, so as to obtain synthesized video frames with synchronized sound and picture. In practical applications, other synthesis processing manners may also be adopted, which is not limited here.
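As a non-limiting sketch of one possible synthesis manner when the video is recorded from a canvas stream, the audio to be played can be routed into the recorded stream through the Web Audio API; the function, element, and variable names are assumptions, and other muxing approaches are equally applicable.

```typescript
// Illustrative sketch using the Web Audio API: route the audio to be played into the
// recorded stream so the target video carries sound. Names are assumptions; other
// muxing approaches (e.g. server-side) would work equally well.
function attachAudio(canvasStream: MediaStream, audioEl: HTMLAudioElement): MediaStream {
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaElementSource(audioEl);
  const dest = audioCtx.createMediaStreamDestination();
  source.connect(dest);                 // feed the recorder
  source.connect(audioCtx.destination); // also play it back locally
  return new MediaStream([
    ...canvasStream.getVideoTracks(),
    ...dest.stream.getAudioTracks(),
  ]);
}
```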
In the embodiment of the application, the method for adding the audio in the process of synthesizing the target video is provided, and by the mode, the audio can be added while the image is synthesized, so that the target video with the sound effect is obtained, the flexibility of application development is improved, and the authenticity of the target video is improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, after the terminal device performs synthesis processing on the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video, the method for synthesizing the video further includes the following steps:
when the terminal equipment detects the playing operation aiming at the target video, playing the target video;
when the target video is playing, if the terminal equipment detects a pause operation aiming at the target video, pausing to play the target video;
when the target video is paused, if the terminal equipment detects the playing operation aiming at the target video, the target video is continuously played;
when the target video is being played, if the terminal device detects a skip operation for the target video, the playing of the target video is finished.
In this embodiment, a method for browsing a manufactured target video is described. After the terminal device obtains the target video, the developer may browse the target video, for example, when the terminal device detects a play operation for the target video, the target video may be played. When the terminal equipment plays the target video, if the developer selects to pause the playing of the target video, the terminal equipment can pause the playing of the target video when the terminal equipment detects the pause operation aiming at the target video. When the terminal equipment plays the target video, if the developer selects to directly skip the target video, the terminal equipment can skip the target video when detecting the skipping operation aiming at the target video, thereby finishing the playing of the target video. Under the condition that the terminal equipment pauses playing the target video, if the developer wants to play the target video again, the terminal equipment can continue playing the target video when detecting the playing operation aiming at the target video.
For ease of understanding, please refer to fig. 26, which is a schematic diagram of an embodiment of browsing a target video in the embodiment of the present application. As shown in the drawing, F1 is used for indicating a video playing interface, F2 is used for indicating a pause interface, F3 is used for indicating a skip interface, and F4 is used for indicating a playing interface. Fig. 26 (A) shows the generated target video; if the video playing interface is operated, a video playing instruction is generated, whereby playing of the target video is started. Fig. 26 (B) shows a target video being played; if the pause interface is operated, a video pause instruction is generated, thereby pausing the playing of the target video, and if the skip interface is operated, a video skip instruction is generated, thereby skipping the target video and ending its playing. Fig. 26 (C) shows a target video whose playing has been paused; if the playing interface is operated, a video resume instruction is generated, so that the target video continues to be played.
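A minimal sketch of such preview controls in a browser is shown below; the element ids are assumptions.

```typescript
// Illustrative preview controls for the generated target video; element ids are assumptions.
const video = document.querySelector<HTMLVideoElement>("#targetVideo")!;

document.querySelector("#playBtn")?.addEventListener("click", () => { void video.play(); });
document.querySelector("#pauseBtn")?.addEventListener("click", () => video.pause());
document.querySelector("#skipBtn")?.addEventListener("click", () => {
  video.pause();
  video.currentTime = video.duration; // jump to the end, i.e. finish playing the target video
});
```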
In the embodiment of the application, a method for browsing a manufactured target video is provided, and different operations are performed on the target video according to user requirements through the mode, so that the target video can be played, paused and skipped, the interaction effect is increased, and developers can test the playing condition of the target video conveniently.
Referring to fig. 27, fig. 27 is a schematic diagram of an embodiment of a video compositing apparatus according to the present application, and as shown in the drawing, the video compositing apparatus 20 includes:
an obtaining module 201, configured to obtain an element selection instruction for a target stage type, where the element selection instruction carries an element identifier;
a determining module 202, configured to determine, in response to an element selection instruction, a target stage element from selectable stage elements corresponding to a target stage type, where the target stage element is shown in an associated layer;
the acquiring module 201 is further configured to acquire a video sequence frame corresponding to a live-action shooting video, where the video sequence frame includes M sequence frame images, each sequence frame image includes a foreground region and a background region, the foreground region is used for displaying a target object, the background region is transparent, the video sequence frame is displayed on a target layer, and M is an integer greater than 1;
the processing module 203 is configured to perform synthesis processing on the video sequence frames and the target stage elements according to the layer relation between the associated layer and the target layer to obtain a target video, where the target video includes M synthesized video frames.
In the embodiment of the application, a video synthesis apparatus is provided. With the apparatus, apart from the actors, who are captured in live-action video, the other stage elements are dynamically synthesized in real time by the program. Because the selectable stage elements are all configured in advance, developers only need to provide one set of actor live-action videos for different scene requirements. On the one hand, this can enrich the performance forms of the actors, meet the personalized collocation of the stage, and improve the flexibility of application development; on the other hand, it can save the time investment of art design and development in both the production stage and the maintenance stage, thereby saving the time cost of video production and later maintenance.
Alternatively, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing apparatus 20 provided in the embodiment of the present application,
the acquiring module 201 is specifically configured to acquire a live-action video, where the live-action video includes M video frames, and each video frame includes a target object;
performing transparency processing on a background area in each video frame aiming at each video frame in the live-action shooting video to obtain a sequence frame image;
and generating video sequence frames according to the sequence frame image corresponding to each video frame.
In the embodiment of the application, a video synthesis apparatus is provided. With the apparatus, the background area in each video frame of the live-action shot video is subjected to transparency processing, so that only the target object is displayed in the foreground area of the video sequence frames, the influence of the background area on the video sequence frames is removed, and subsequent video processing is facilitated.
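As a non-limiting sketch (the application does not prescribe a particular matting algorithm), the background transparency processing could be approximated by a chroma-key pass over each video frame drawn on a canvas, assuming a green-screen shoot; the thresholds and function name are example assumptions.

```typescript
// Illustrative chroma-key pass over one video frame drawn on a canvas: pixels close to
// the key color (a green screen is assumed) get alpha 0, leaving only the target object.
// The thresholds are arbitrary example values, not values specified by the application.
function makeBackgroundTransparent(ctx: CanvasRenderingContext2D): void {
  const { width, height } = ctx.canvas;
  const frame = ctx.getImageData(0, 0, width, height);
  const d = frame.data;
  for (let i = 0; i < d.length; i += 4) {
    const r = d[i], g = d[i + 1], b = d[i + 2];
    if (g > 100 && g > r * 1.4 && g > b * 1.4) {
      d[i + 3] = 0; // make background pixel transparent
    }
  }
  ctx.putImageData(frame, 0, 0);
}
```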
Optionally, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing apparatus 20 provided in the embodiment of the present application, the target stage type includes a stage close-up type;
an obtaining module 201, specifically configured to provide a stage editing interface;
acquiring an element selection instruction aiming at a stage close shot type through a stage editing interface;
the determining module 202 is specifically configured to determine, according to an element selection instruction for a stage close-range type, a target close-range element from selectable stage elements corresponding to the stage close-range type, where the target close-range element is displayed in a first layer, and the first layer belongs to an associated layer;
the processing module 203 is specifically configured to cover the first layer on the target layer, and perform synthesis processing on the video sequence frame and the target close-range element to obtain a target video.
In the embodiment of the application, a video synthesis device is provided, and by adopting the device, a first layer for displaying the target close-range elements is covered on a target layer for displaying the target object, so that the target video can also display the selected target close-range elements, personalized collocation of different requirements is met, and flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing apparatus 20 provided in the embodiment of the present application, the target stage type includes a stage perspective type;
an obtaining module 201, specifically configured to provide a stage editing interface;
acquiring an element selection instruction aiming at a stage distant view type through a stage editing interface;
the determining module 202 is specifically configured to determine, according to an element selection instruction for a stage perspective type, a target perspective element from selectable stage elements corresponding to the stage perspective type, where the target perspective element is displayed in a second layer, and the second layer belongs to an associated layer;
the processing module 203 is specifically configured to cover the target layer on the second layer, and perform synthesis processing on the video sequence frame and the target distant view element to obtain a target video.
In the embodiment of the application, a video synthesis device is provided, and by adopting the device, a target layer for displaying a target object is covered on a second layer for displaying a target distant view element, so that the target video can also display the selected target distant view element, personalized collocation of different requirements is met, and the flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing apparatus 20 provided in the embodiment of the present application, the target stage type includes a stage floor type;
an obtaining module 201, specifically configured to provide a stage editing interface;
acquiring an element selection instruction aiming at the type of the stage floor through a stage editing interface;
the determining module 202 is specifically configured to determine, according to an element selection instruction for a stage floor type, a target floor element from selectable stage elements corresponding to the stage floor type, where the target floor element is shown in a third layer, and the third layer belongs to an associated layer;
the processing module 203 is specifically configured to cover the target layer on the third layer, and perform synthesis processing on the video sequence frame and the target floor element to obtain a target video.
In the embodiment of the application, the video synthesis device is provided, and by adopting the device, the target layer for displaying the target object is covered on the third layer for displaying the target floor element, so that the target video can also display the selected target floor element, personalized collocation of different requirements is met, and the flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing apparatus 20 provided in the embodiment of the present application, the target stage type includes a stage background type;
an obtaining module 201, specifically configured to provide a stage editing interface;
acquiring an element selection instruction aiming at a stage background type through a stage editing interface;
the determining module 202 is specifically configured to determine, according to an element selection instruction for a stage background type, a target background element from selectable stage elements corresponding to the stage background type, where the target background element is shown in a fourth layer, and the fourth layer belongs to an associated layer;
the processing module 203 is specifically configured to cover the target layer on the fourth layer, and perform synthesis processing on the video sequence frame and the target background element to obtain a target video.
In the embodiment of the application, a video synthesis device is provided, and by using the device, a target layer for displaying a target object is covered on a fourth layer for displaying a target background element, so that the target video can also display the selected target background element, personalized collocation of different requirements is met, and flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video synthesis apparatus 20 provided in the embodiment of the present application, the target stage type includes a stage text type;
an obtaining module 201, specifically configured to provide a stage editing interface;
acquiring an element selection instruction aiming at a stage text type through a stage editing interface;
a determining module 202, configured to determine, according to an element selection instruction for a stage text type, a target text font from selectable stage elements corresponding to the stage text type;
acquiring a target text element corresponding to a target text font, wherein the target text element is displayed on a fifth layer, and the fifth layer belongs to an associated layer;
the processing module 203 is specifically configured to cover the fifth layer on the target layer, and perform synthesis processing on the video sequence frame and the target text element to obtain a target video.
In the embodiment of the application, a video synthesis device is provided, and by adopting the device, the fifth layer for displaying the target text element is covered on the target layer for displaying the target object, so that the target video can also display the selected target text element, personalized collocation of different requirements is met, and the flexibility of application development is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video synthesis apparatus 20 provided in the embodiment of the present application, the target stage type includes a stage close shot type, a stage distant shot type, a stage floor type, a stage background type, and a stage text type;
an obtaining module 201, specifically configured to provide a stage editing interface;
acquiring an element selection instruction aiming at a stage close shot type through a stage editing interface;
acquiring an element selection instruction aiming at a stage distant view type through a stage editing interface;
acquiring an element selection instruction aiming at the type of the stage floor through a stage editing interface;
acquiring an element selection instruction aiming at a stage background type through a stage editing interface;
acquiring an element selection instruction aiming at a stage text type through a stage editing interface;
the determining module 202 is specifically configured to determine, according to an element selection instruction for a stage close-range type, a target close-range element from selectable stage elements corresponding to the stage close-range type, where the target close-range element is displayed in a first layer;
according to the element selection instruction aiming at the stage distant view type, determining a target distant view element from selectable stage elements corresponding to the stage distant view type, wherein the target distant view element is displayed on the second layer;
according to the element selection instruction aiming at the stage floor type, determining a target floor element from selectable stage elements corresponding to the stage floor type, wherein the target floor element is displayed in a third layer;
determining a target background element from selectable stage elements corresponding to the stage background type according to the element selection instruction for the stage background type, wherein the target background element is shown in the fourth layer;
determining a target text font from selectable stage elements corresponding to the stage text type according to the element selection instruction for the stage text type;
acquiring a target text element corresponding to a target text font, wherein the target text element is displayed on a fifth layer, and the first layer, the second layer, the third layer, the fourth layer and the fifth layer all belong to associated layers;
the processing module 203 is specifically configured to sequentially superimpose the fifth layer, the first layer, the target layer, the second layer, the third layer, and the fourth layer according to a top-to-bottom order, and perform synthesis processing on a target text element, a target close-range element, a video sequence frame, a target far-range element, a target floor element, and a target background element to obtain a target video.
In the embodiment of the application, a video synthesis device is provided, and by adopting the device, a fifth layer, a first layer, a target layer, a second layer, a third layer and a fourth layer are sequentially overlaid according to a sequence from top to bottom, so that a target video can also display a selected target text element, a target close-range element, a video sequence frame, a target distant-range element, a target floor element and a target background element, personalized collocation of different requirements is met, and flexibility of application development is further improved.
Alternatively, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing apparatus 20 provided in the embodiment of the present application,
the processing module 203 is specifically configured to, for each sequence frame image in the video sequence frames, render the sequence frame image and the target stage element onto a canvas according to the layer relationship between the associated layer and the target layer;
for each sequence frame image in the video sequence frames, displaying a synthesized video frame corresponding to the sequence frame image through a canvas;
a target video is generated from the M synthesized video frames.
In the embodiment of the application, a video synthesis apparatus is provided. With the apparatus, in the video production stage, the rendering of each video frame is separately controlled by the program, and the whole rendering process can be realized automatically, which not only saves the time investment of art design and development and improves the efficiency of producing the video, but also further saves the time cost of video production and later maintenance and improves the video processing efficiency.
Alternatively, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing apparatus 20 provided in the embodiment of the present application,
the obtaining module 201 is further configured to obtain an animation sequence frame corresponding to an animation video, where the animation sequence frame includes M animation sequence frame images, each animation sequence frame image includes a foreground region and a background region, the foreground region is used for displaying an animation object, the background region is transparent, and the animation sequence frame is displayed in a target layer;
the processing module 203 is specifically configured to perform synthesis processing on the video sequence frame, the animation sequence frame, and the target stage element according to the layer relation between the associated layer and the target layer, so as to obtain a target video.
In the embodiment of the application, a video synthesis apparatus is provided. With the apparatus, the animation object can be synthesized into the target video, so that interaction between the real object and the virtual object on the same stage is realized, the personalized collocation of the stage is met, and the flexibility of application development is further improved.
Alternatively, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing apparatus 20 provided in the embodiment of the present application,
the obtaining module 201 is further configured to obtain an audio to be played;
the processing module 203 is specifically configured to perform synthesis processing on the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain a to-be-processed video;
and synthesizing the audio to be played and the video to be processed to obtain the target video.
In the embodiment of the application, a video synthesis device is provided, and by adopting the device, audio can be added while synthesizing images, so that target video with sound effect is obtained, the flexibility of application development is improved, and the authenticity of the target video is improved.
Optionally, on the basis of the embodiment corresponding to fig. 27, in another embodiment of the video compositing device 20 provided in this application embodiment, the video compositing device 20 further includes a playing module 204,
the playing module 204 is configured to perform synthesis processing on the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain a target video, and play the target video when a playing operation for the target video is detected;
the playing module 204 is further configured to, when the target video is being played, pause playing the target video if a pause operation for the target video is detected;
the playing module 204 is further configured to, when the target video is paused, if a playing operation for the target video is detected, continue to play the target video;
the playing module 204 is further configured to, when the target video is being played, end playing the target video if a skip operation for the target video is detected.
In the embodiment of the application, the video synthesis device is provided, and by adopting the device, different operations can be performed on the target video according to user requirements, so that the target video can be played, paused and skipped, the interactive effect is increased, and developers can test the playing condition of the target video conveniently.
An embodiment of the present application further provides another video synthesis apparatus, which may be deployed in a terminal device. As shown in fig. 28, fig. 28 is a schematic diagram of an embodiment of the terminal device in the embodiment of the present application. For convenience of description, only the portion related to the embodiment of the present application is shown; for specific technical details that are not disclosed, please refer to the method portion of the embodiment of the present application. The terminal device may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, and the like. The terminal device being a mobile phone is taken as an example:
fig. 28 is a block diagram illustrating a partial structure of a mobile phone related to a terminal provided in an embodiment of the present application. Referring to fig. 28, the cellular phone includes: radio Frequency (RF) circuit 310, memory 320, input unit 330, display unit 340, sensor 350, audio circuit 360, wireless fidelity (WiFi) module 370, processor 380, and power supply 390. Those skilled in the art will appreciate that the handset configuration shown in fig. 28 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 28:
The RF circuit 310 may be used for receiving and transmitting signals during information transmission and reception or during a call. In particular, the RF circuit 310 receives downlink information from a base station and forwards it to the processor 380 for processing, and transmits uplink data to the base station. In general, the RF circuit 310 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 310 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
The memory 320 may be used to store software programs and modules, and the processor 380 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 320. The memory 320 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phonebook), and the like. Further, the memory 320 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 330 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 330 may include a touch panel 331 and other input devices 332. The touch panel 331, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch panel 331 with a finger, a stylus, or any other suitable object or accessory) and drive a corresponding connection device according to a preset program. Optionally, the touch panel 331 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 380, and can receive and execute commands sent by the processor 380. In addition, the touch panel 331 may be implemented as a resistive, capacitive, infrared, or surface acoustic wave touch panel. In addition to the touch panel 331, the input unit 330 may include other input devices 332. Specifically, the other input devices 332 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 340 may be used to display information input by the user or information provided to the user, as well as various menus of the mobile phone. The display unit 340 may include a display panel 341; optionally, the display panel 341 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Further, the touch panel 331 may cover the display panel 341. When the touch panel 331 detects a touch operation on or near it, the touch operation is transmitted to the processor 380 to determine the type of the touch event, and the processor 380 then provides a corresponding visual output on the display panel 341 according to the type of the touch event. Although in fig. 28 the touch panel 331 and the display panel 341 are two separate components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 331 and the display panel 341 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 350, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 341 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 341 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The audio circuit 360, the speaker 361, and the microphone 362 may provide an audio interface between the user and the mobile phone. The audio circuit 360 may transmit an electrical signal, converted from received audio data, to the speaker 361, and the speaker 361 converts the electrical signal into a sound signal for output; on the other hand, the microphone 362 converts a collected sound signal into an electrical signal, which is received by the audio circuit 360 and converted into audio data. The audio data is then processed by the processor 380 and transmitted via the RF circuit 310 to, for example, another mobile phone, or output to the memory 320 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 370, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 28 shows the WiFi module 370, it can be understood that the module is not an essential component of the mobile phone.
The processor 380 is a control center of the mobile phone, connects various parts of the whole mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory 320, thereby performing overall monitoring of the mobile phone. Optionally, processor 380 may include one or more processing units; preferably, the processor 380 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 380.
The mobile phone also includes a power supply 390 (such as a battery) that supplies power to the various components. Preferably, the power supply is logically connected to the processor 380 through a power management system, so that charging, discharging, and power consumption management are implemented through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, and the like, and thus, the detailed description is omitted here.
In this embodiment, the processor 380 included in the terminal device may execute the functions of the terminal device in the embodiments shown in fig. 3 to fig. 25, which are not described herein again.
Embodiments of the present application also provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and therefore, the detailed description is not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such an understanding, the part of the technical solutions of the present application that essentially contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (15)

1. A method for video compositing, comprising:
acquiring an element selection instruction aiming at a target stage type, wherein the element selection instruction carries an element identifier;
in response to the element selection instruction, determining a target stage element from selectable stage elements corresponding to the target stage type, wherein the target stage element is shown in an associated layer;
acquiring a video sequence frame corresponding to a live-action shooting video, wherein the video sequence frame comprises M sequence frame images, each sequence frame image comprises a foreground area and a background area, the foreground area is used for displaying a target object, the background area is transparent, the video sequence frame is displayed on a target layer, and M is an integer greater than 1;
and synthesizing the video sequence frames and the target stage elements according to the layer relation between the associated layers and the target layers to obtain a target video, wherein the target video comprises M synthesized video frames.
2. The method according to claim 1, wherein the obtaining of the video sequence frame corresponding to the live-action shooting video comprises:
acquiring the live-action shooting video, wherein the live-action shooting video comprises M video frames, and each video frame comprises the target object;
performing transparency processing on a background area in each video frame aiming at each video frame in the live-action shooting video to obtain a sequence frame image;
and generating the video sequence frame according to the sequence frame image corresponding to each video frame.
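Claim 2 does not prescribe how the background area of each video frame is made transparent. One possible, non-limiting sketch, assuming the live-action video was shot against a roughly uniform green backdrop and each frame is available as canvas ImageData (the helper name keyOutBackground and the color-key threshold are illustrative assumptions only):

```typescript
// Sketch only: per-frame color keying that makes the background region transparent
// while leaving the foreground (target object) opaque.
function keyOutBackground(frame: ImageData, threshold = 60): ImageData {
  const { data } = frame; // RGBA bytes
  for (let i = 0; i < data.length; i += 4) {
    const r = data[i], g = data[i + 1], b = data[i + 2];
    // Pixels that are much greener than they are red/blue are treated as background.
    if (g - Math.max(r, b) > threshold) {
      data[i + 3] = 0; // alpha = 0 -> transparent background region
    }
  }
  return frame;
}
```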
3. The method according to claim 1, wherein the target stage type comprises a stage close-up type;
the obtaining of the element selection instruction for the target stage type includes:
providing a stage editing interface;
acquiring an element selection instruction aiming at the stage close shot type through the stage editing interface;
the determining, in response to the element selection instruction, a target stage element from the selectable stage elements corresponding to the target stage type includes:
according to the element selection instruction aiming at the stage close shot type, determining a target close shot element from selectable stage elements corresponding to the stage close shot type, wherein the target close shot element is shown on a first layer, and the first layer belongs to the associated layer;
the synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video comprises the following steps:
covering the first layer on the target layer, and performing synthesis processing on the video sequence frame and the target close shot element to obtain the target video.
4. The method according to claim 1, wherein the target stage type comprises a stage distant view type; the obtaining of the element selection instruction for the target stage type includes:
providing a stage editing interface;
acquiring an element selection instruction aiming at the stage distant view type through the stage editing interface;
the determining, in response to the element selection instruction, a target stage element from the selectable stage elements corresponding to the target stage type includes:
according to the element selection instruction aiming at the stage distant view type, determining a target distant view element from selectable stage elements corresponding to the stage distant view type, wherein the target distant view element is displayed on a second layer, and the second layer belongs to the associated layer;
the synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video comprises the following steps:
covering the target layer on the second layer, and performing synthesis processing on the video sequence frame and the target distant view element to obtain the target video.
5. The method according to claim 1, wherein the target stage type comprises a stage floor type;
the obtaining of the element selection instruction for the target stage type includes:
providing a stage editing interface;
acquiring an element selection instruction aiming at the type of the stage floor through the stage editing interface;
the determining, in response to the element selection instruction, a target stage element from the selectable stage elements corresponding to the target stage type includes:
according to the element selection instruction aiming at the stage floor type, determining a target floor element from selectable stage elements corresponding to the stage floor type, wherein the target floor element is shown in a third layer, and the third layer belongs to the associated layer;
the synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video comprises the following steps:
covering the target layer on the third layer, and performing synthesis processing on the video sequence frame and the target floor element to obtain the target video.
6. The method according to claim 1, wherein the target stage type comprises a stage background type;
the obtaining of the element selection instruction for the target stage type includes:
providing a stage editing interface;
acquiring an element selection instruction aiming at the stage background type through the stage editing interface;
the determining, in response to the element selection instruction, a target stage element from the selectable stage elements corresponding to the target stage type includes:
according to the element selection instruction aiming at the stage background type, determining a target background element from selectable stage elements corresponding to the stage background type, wherein the target background element is shown in a fourth layer, and the fourth layer belongs to the associated layer;
the synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video comprises the following steps:
covering the target layer on the fourth layer, and performing synthesis processing on the video sequence frame and the target background element to obtain the target video.
7. The method of claim 1, wherein the target stage type comprises a stage text type;
the obtaining of the element selection instruction for the target stage type includes:
providing a stage editing interface;
acquiring an element selection instruction aiming at the stage text type through the stage editing interface;
the determining, in response to the element selection instruction, a target stage element from the selectable stage elements corresponding to the target stage type includes:
according to the element selection instruction aiming at the stage text type, determining a target text font from selectable stage elements corresponding to the stage text type;
acquiring a target text element corresponding to the target text font, wherein the target text element is displayed on a fifth layer, and the fifth layer belongs to the associated layer;
the synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video comprises the following steps:
covering the fifth layer on the target layer, and performing synthesis processing on the video sequence frame and the target text element to obtain the target video.
8. The method according to claim 1, wherein the target stage type comprises a stage close shot type, a stage distant view type, a stage floor type, a stage background type, and a stage text type;
the obtaining of the element selection instruction for the target stage type includes:
providing a stage editing interface;
acquiring an element selection instruction aiming at the stage close shot type through the stage editing interface;
acquiring an element selection instruction aiming at the stage distant view type through the stage editing interface;
acquiring an element selection instruction aiming at the type of the stage floor through the stage editing interface;
acquiring an element selection instruction aiming at the stage background type through the stage editing interface;
acquiring an element selection instruction aiming at the stage text type through the stage editing interface;
the determining, in response to the element selection instruction, a target stage element from the selectable stage elements corresponding to the target stage type includes:
according to the element selection instruction aiming at the stage close shot type, determining a target close shot element from selectable stage elements corresponding to the stage close shot type, wherein the target close shot element is shown in a first layer;
according to the element selection instruction aiming at the stage distant view type, determining a target distant view element from selectable stage elements corresponding to the stage distant view type, wherein the target distant view element is displayed on a second layer;
according to the element selection instruction aiming at the stage floor type, determining a target floor element from selectable stage elements corresponding to the stage floor type, wherein the target floor element is shown in a third layer;
according to the element selection instruction aiming at the stage background type, determining a target background element from selectable stage elements corresponding to the stage background type, wherein the target background element is shown in a fourth layer;
according to the element selection instruction aiming at the stage text type, determining a target text font from selectable stage elements corresponding to the stage text type;
acquiring a target text element corresponding to the target text font, wherein the target text element is displayed on a fifth layer, and the first layer, the second layer, the third layer, the fourth layer and the fifth layer all belong to the associated layer;
the synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video comprises the following steps:
and sequentially superposing the fifth layer, the first layer, the target layer, the second layer, the third layer and the fourth layer in a top-to-bottom order, and synthesizing the target text element, the target close shot element, the video sequence frame, the target distant view element, the target floor element and the target background element to obtain the target video.
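Claim 8 fixes only the top-to-bottom order of the six layers. A non-limiting sketch of one possible compositing step, drawing the layers bottom-to-top onto a 2D canvas so that the text layer ends up on top (all variable and function names are illustrative assumptions):

```typescript
// Sketch only: painter's-algorithm composition of one frame in the layer order of
// claim 8. Later draws cover earlier ones, so drawing bottom-to-top yields the
// required top-to-bottom stack (text, close shot, video frame, distant view,
// floor, background).
function composeFrame(
  ctx: CanvasRenderingContext2D,
  layers: {
    background: CanvasImageSource; // fourth layer (bottom)
    floor: CanvasImageSource;      // third layer
    distant: CanvasImageSource;    // second layer
    videoFrame: CanvasImageSource; // target layer (transparent background region)
    close: CanvasImageSource;      // first layer
    text: CanvasImageSource;       // fifth layer (top)
  }
): void {
  const { width, height } = ctx.canvas;
  ctx.clearRect(0, 0, width, height);
  for (const img of [
    layers.background,
    layers.floor,
    layers.distant,
    layers.videoFrame,
    layers.close,
    layers.text,
  ]) {
    ctx.drawImage(img, 0, 0, width, height);
  }
}
```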
9. The method according to claim 1, wherein the synthesizing the video sequence frames and the target stage element according to the layer relationship between the associated layer and the target layer to obtain a target video comprises:
for each sequence frame image in the video sequence frame, rendering the sequence frame image and the target stage element onto a canvas according to the layer relation between the associated layer and the target layer;
for each sequence frame image in the video sequence frames, displaying a synthesized video frame corresponding to the sequence frame image through the canvas;
generating the target video from the M synthesized video frames.
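As a non-limiting sketch of the per-frame loop in claim 9, assuming the M sequence frame images are available as ImageBitmap objects and reusing the illustrative composeFrame helper from the previous sketch (encoding the synthesized frames into a video container is omitted here):

```typescript
// Sketch only: composite each of the M sequence frame images with the stage
// element layers on the canvas, collecting one synthesized video frame per
// sequence frame image.
async function synthesize(
  ctx: CanvasRenderingContext2D,
  sequenceFrames: ImageBitmap[], // M frames, background already transparent
  stageLayers: Omit<Parameters<typeof composeFrame>[1], 'videoFrame'>
): Promise<ImageBitmap[]> {
  const synthesized: ImageBitmap[] = [];
  for (const frame of sequenceFrames) {
    composeFrame(ctx, { ...stageLayers, videoFrame: frame });
    // Snapshot the canvas; each snapshot is one "synthesized video frame".
    synthesized.push(await createImageBitmap(ctx.canvas));
  }
  return synthesized;
}
```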
10. The method of claim 1, further comprising: acquiring animation sequence frames corresponding to an animation video, wherein the animation sequence frames comprise M animation sequence frame images, each animation sequence frame image comprises a foreground area and a background area, the foreground area is used for displaying an animation object, the background area is transparent, and the animation sequence frames are displayed on the target layer;
the synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video comprises the following steps:
and synthesizing the video sequence frame, the animation sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video.
11. The method of claim 1, further comprising:
acquiring audio to be played;
the synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain the target video comprises the following steps:
synthesizing the video sequence frame and the target stage element according to the layer relation between the associated layer and the target layer to obtain a video to be processed;
and synthesizing the audio to be played and the video to be processed to obtain the target video.
12. The method according to any one of claims 1 to 11, wherein after the synthesizing of the video sequence frames and the target stage element according to the layer relationship between the associated layer and the target layer to obtain a target video, the method further comprises:
when a playing operation for the target video is detected, playing the target video;
when the target video is playing, if the pause operation aiming at the target video is detected, pausing the playing of the target video;
when the target video is paused to be played, if the playing operation aiming at the target video is detected, the target video is continuously played;
when the target video is being played, if the skip operation aiming at the target video is detected, the target video is finished being played.
13. A video compositing apparatus, comprising:
an acquiring module, configured to acquire an element selection instruction for a target stage type, wherein the element selection instruction carries an element identifier;
a determining module, configured to determine, in response to the element selection instruction, a target stage element from selectable stage elements corresponding to the target stage type, where the target stage element is shown in an associated layer;
the acquiring module is further configured to acquire a video sequence frame corresponding to a live-action shooting video, where the video sequence frame includes M sequence frame images, each sequence frame image includes a foreground region and a background region, the foreground region is used for displaying a target object, the background region is transparent, the video sequence frame is displayed in a target layer, and M is an integer greater than 1;
and a processing module, configured to synthesize the video sequence frames and the target stage elements according to the layer relation between the associated layer and the target layer to obtain a target video, wherein the target video comprises M synthesized video frames.
14. A terminal device, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor for executing a program in the memory, the processor for performing the method of any one of claims 1 to 12 according to instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12.
CN202011008062.9A 2020-09-23 2020-09-23 Video synthesis method, related device, equipment and storage medium Active CN112118397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011008062.9A CN112118397B (en) 2020-09-23 2020-09-23 Video synthesis method, related device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112118397A CN112118397A (en) 2020-12-22
CN112118397B true CN112118397B (en) 2021-06-22

Family

ID=73801621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011008062.9A Active CN112118397B (en) 2020-09-23 2020-09-23 Video synthesis method, related device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112118397B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112995711B (en) * 2021-02-05 2023-06-30 罗普特科技集团股份有限公司 Frame segmentation and picture processing synthesis method and system for web front-end video
CN115250357B (en) * 2021-04-26 2024-04-12 海信集团控股股份有限公司 Terminal device, video processing method and electronic device
CN113209629B (en) * 2021-05-14 2024-02-09 苏州仙峰网络科技股份有限公司 Method and device for converting sequence frames into GIF

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106385591A (en) * 2016-10-17 2017-02-08 腾讯科技(上海)有限公司 Video processing method and video processing device
CN106534880A (en) * 2016-11-28 2017-03-22 深圳Tcl数字技术有限公司 Video synthesis method and device
CN107959883A (en) * 2017-11-30 2018-04-24 广州市百果园信息技术有限公司 Video editing method for pushing, system and intelligent mobile terminal
CN109819179A (en) * 2019-03-21 2019-05-28 腾讯科技(深圳)有限公司 A kind of video clipping method and device
CN110111279A (en) * 2019-05-05 2019-08-09 腾讯科技(深圳)有限公司 A kind of image processing method, device and terminal device
CN110290425A (en) * 2019-07-29 2019-09-27 腾讯科技(深圳)有限公司 A kind of method for processing video frequency, device and storage medium
CN110582018A (en) * 2019-09-16 2019-12-17 腾讯科技(深圳)有限公司 Video file processing method, related device and equipment
CN110737435A (en) * 2019-10-18 2020-01-31 网易(杭州)网络有限公司 Multimedia editing method and device in game, terminal equipment and storage medium
CN110825912A (en) * 2019-10-30 2020-02-21 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN111432235A (en) * 2020-04-01 2020-07-17 网易(杭州)网络有限公司 Live video generation method and device, computer readable medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10958887B2 (en) * 2019-01-14 2021-03-23 Fyusion, Inc. Free-viewpoint photorealistic view synthesis from casually captured video


Also Published As

Publication number Publication date
CN112118397A (en) 2020-12-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40035397; Country of ref document: HK)
GR01 Patent grant