WO2019086037A1 - Video material processing method, video synthesis method, terminal device and storage medium


Info

Publication number
WO2019086037A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
effect
material set
segment
content
Application number
PCT/CN2018/114100
Other languages
French (fr)
Chinese (zh)
Inventor
张涛 (Zhang Tao)
董霙 (Dong Ying)
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Publication of WO2019086037A1 publication Critical patent/WO2019086037A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Definitions

  • The present application relates to the field of video synthesis, and in particular to a video material processing method, a video synthesis method, a terminal device, and a storage medium.
  • Video production recombines and re-encodes pictures, videos, audio, and other materials to generate a video.
  • Video production typically requires installing video production software on a personal computing device.
  • Such software provides feature-rich video editing functions, but is complicated to operate.
  • The present application therefore proposes a new video synthesis scheme to address the problem of how to make video synthesis more convenient to operate.
  • A video material processing method, executed by a terminal device, includes: acquiring a material set of a video to be synthesized and determining attributes of the material set, where the material set includes multiple material elements, each material element includes at least one type of media content among pictures, text, audio, and video, and the attributes include the play order and play duration of each material element in the material set; determining an effect parameter corresponding to the material set, where the effect parameter corresponds to a video effect mode; and transmitting the material set and the effect parameter to a video composition server, so that the video composition server synthesizes the multiple material elements in the material set into a video in the corresponding video effect mode according to the effect parameter and the attributes of the material set.
  • A video synthesis method, executed by a server, includes: acquiring, from a material processing application, a material set of a video to be synthesized and an effect parameter for that material set, where the material set includes multiple material elements, each material element includes at least one type of media content among pictures, text, audio, and video, the attributes of the material set include the play order and play duration of each material element, and the effect parameter corresponds to a video effect mode; and synthesizing the multiple material elements in the material set into a video in that video effect mode according to the effect parameter and the attributes of the material set.
  • A terminal device includes a processor and a memory; the memory stores computer readable instructions that enable the processor to perform the video material processing method according to the present application.
  • A server includes a processor and a memory; the memory stores computer readable instructions that enable the processor to perform the video synthesis method according to the present application.
  • A non-volatile storage medium stores a data processing program that, when executed by a computing device, causes the computing device to perform the video material processing method or the video synthesis method.
  • With the processing scheme of the present application, content selection can be performed in a user interface (for example, the user interfaces of FIGS. 3A to 3G), so that the material set of the video to be synthesized can be obtained conveniently.
  • The processing scheme of the present application can also clip a video automatically to generate video segments and corresponding description information, so that the user can quickly determine the content of each video segment and select segments by viewing the description information.
  • The processing scheme of the present application spares the user from performing complicated video-effect operations on the local terminal device; instead, it intuitively presents preview images (e.g., effect animations) of multiple video effect modes, making it easy for the user to quickly choose the effect mode of the video to be synthesized. On this basis, the processing scheme synthesizes the video through the video composition server, greatly improving the user experience.
  • FIG. 1 shows a schematic diagram of an application scenario 100 in accordance with some embodiments of the present application
  • FIG. 2A shows a flowchart of a method 200 of processing video material in accordance with some embodiments of the present application
  • FIG. 2B shows a flowchart of obtaining a collection of materials in accordance with some embodiments of the present application
  • FIG. 3A illustrates a schematic diagram of a user interface for acquiring picture content, according to some embodiments of the present application
  • FIG. 3B illustrates an interface diagram of a display picture of some embodiments
  • FIG. 3C shows a schematic diagram of acquiring audio information in accordance with some embodiments of the present application.
  • FIG. 3D illustrates a user interface for generating video segments in accordance with some embodiments of the present application
  • FIG. 3E shows the editing interface of a video clip
  • FIG. 3F illustrates a user interface for adjusting the play order, in accordance with some embodiments of the present application
  • FIG. 3G illustrates a user interface for determining an effect parameter, in accordance with some embodiments of the present application
  • FIG. 4 illustrates a flow diagram of a video composition method 400 in accordance with some embodiments of the present application
  • FIG. 5 illustrates a video rendering process in accordance with some embodiments of the present application
  • FIG. 6 illustrates a flow diagram of a video composition method 600 in accordance with some embodiments of the present application
  • FIG. 7 shows a schematic diagram of a processing device 700 for video material in accordance with some embodiments of the present application.
  • FIG. 8 shows a schematic diagram of a video synthesizing apparatus 800 in accordance with some embodiments of the present application.
  • FIG. 9 shows a schematic diagram of a video synthesizing device 900 in accordance with some embodiments of the present application.
  • FIG. 10 shows a block diagram of the structure of a computing device.
  • FIG. 1 shows a schematic diagram of an application scenario 100 in accordance with some embodiments of the present application.
  • the application scenario 100 includes a terminal device 110 and a server 120.
  • the terminal device 110 may be, for example, various devices such as a desktop computer, a notebook computer, a tablet computer, a mobile phone, or a handheld game console, but is not limited thereto.
  • Server 120 may include one or more independent hardware servers.
  • the server 120 may also be a device resource such as a virtual server or a distributed cluster, but is not limited thereto.
  • the terminal device 110 can include various applications, such as a material processing application 111.
  • the material processing application 111 can acquire the video material of the video to be composited and transmit the video material to the server 120.
  • server 120 can synthesize the corresponding video based on the received video material.
  • the server 120 can also transmit the synthesized video to the terminal device 110.
  • The material processing application 111 may be, for example, a client or browser dedicated to managing material, which is not limited in this application.
  • server 120 can include a video composition application (not shown in FIG. 1).
  • the video composition application may be, for example, software that synthesizes a video using a collection of materials, or may be a component of various multimedia applications.
  • The multimedia application is, for example, software that provides video content to the terminal device 110. The video material processing method is described below with reference to FIG. 2A.
  • the processing method 200 of the video material may be performed by, for example, the material processing application 111, but is not limited thereto.
  • the material processing application 111 may be, for example, a browser for processing a material or a client for processing a material.
  • the material processing application 111 can also be a component of an application such as an instant messaging application (QQ, WeChat, etc.), a social networking application, a video application (such as Tencent video, etc.) or a news client.
  • the processing method 200 of the video material includes step S201, acquiring a material set of the video to be synthesized, and determining attributes of the material set.
  • the material collection may include a plurality of material elements.
  • Each material element includes at least one of media content in images, text, audio, and video.
  • The attributes of the material set include the play order and play duration of each material element in the set; one possible representation is sketched below.
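  • The patent does not prescribe a concrete data layout for the material set; purely as an illustration, the elements and attributes described above might be modeled as follows (all class and field names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MaterialElement:
    """One element of the material set; any subset of the media fields may be present."""
    picture: Optional[str] = None   # path or URL of a picture
    text: Optional[str] = None      # caption or narration text
    audio: Optional[str] = None     # path or URL of an audio clip
    video: Optional[str] = None     # path or URL of a video clip
    play_order: int = 0             # position of this element in the synthesized video
    play_duration: float = 0.0      # seconds this element is played

@dataclass
class MaterialSet:
    elements: List[MaterialElement] = field(default_factory=list)

    def attributes(self):
        """The per-element play order and play duration, as determined in step S201."""
        return [(e.play_order, e.play_duration) for e in self.elements]
```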
  • a user interface for acquiring material elements is provided.
  • the user interface can include at least one control corresponding to at least one media type, respectively.
  • a control is a view object in the user interface for interacting with a user, such as an input box, a drop-down selection box, a button, and the like.
  • Step S201 may then, in response to an operation on any control in the user interface, obtain the media content corresponding to that control's media type and use it as one item of media content of a material element in the material set.
  • In some embodiments, when the user selects a locally stored or network picture through a picture control, step S201 may, in response to the operation on the picture control, use that picture as the picture content of a material element. Note that a material element containing picture content may also include text or audio associated with the picture.
  • In some embodiments, when the user enters text for the picture through a text input control, step S201 acquires the text information associated with the picture content in response to the operation and uses it as the text content of the corresponding material element.
  • step S201 may, in response to an operation of the audio control, acquire audio information associated with the picture content as the audio content of the corresponding material element.
  • the audio content is, for example, narration or background music or the like.
  • Step S201 may use the play duration of the picture as the play duration of the corresponding material element. To illustrate the execution of step S201 more concretely, an example is described below with reference to FIGS. 3A to 3C.
  • FIG. 3A illustrates a schematic diagram of a user interface for acquiring picture content, in accordance with some embodiments of the present application.
  • FIG. 3B shows an interface diagram of a display picture of some embodiments.
  • In some embodiments, when the user operates the control 301, step S201 can acquire a picture and display it in the preview window 302.
  • Step S201 may determine the play duration of the picture in response to the operation of the play duration control 303.
  • Step S201 may acquire text information related to the picture in the preview window 302 in response to an operation on the text input control 304. In other words, the text information supplements the picture.
  • FIG. 3C illustrates a schematic diagram of acquiring audio information in accordance with some embodiments of the present application.
  • Step S201 can obtain locally stored audio (e.g., for background music) in response to an operation on control 305.
  • step S201 can record a piece of audio content in response to operation of control 306.
  • the audio content is, for example, a narration recorded for the picture in the preview window 302.
  • In some embodiments, step S201 may acquire a video as the video content of a material element.
  • For example, in response to an operation on a video control in the user interface, step S201 acquires a video clip as the video content of a material element.
  • the video may be, for example, a video file stored locally or a video content stored in the cloud.
  • step S201 may also add text content, audio content, and the like thereto.
  • step S201 may use the playing duration of the video content as the playing duration of the material element.
  • step S201 may include steps S2011-S2014.
  • In step S2011, a piece of video is acquired.
  • In step S2012, at least one video segment is extracted from the video according to a predetermined video editing algorithm, and description information is generated for each video segment.
  • In some embodiments, step S2012 first determines at least one key image frame of the video. For each key image frame, step S2012 may extract from the video a video segment containing that key frame.
  • The video segment can include an audio clip associated with its sequence of image frames. Step S2012 can then perform speech recognition on the audio clip to obtain the corresponding text, and generate the description information of the video segment from that text. It should be understood that step S2012 may adopt any algorithm capable of automatically editing video, which is not limited in this application.
  • In step S2013, a user interface displaying the description information of each video segment is provided, so that the user can select segments according to the description information.
  • In step S2014, in response to a selection operation on at least one video segment, each selected video segment is used as the video content of one material element in the material set. In other words, step S2014 can turn each selected video segment into one material element.
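  • As a rough illustration of steps S2011 to S2014, the clipping flow might look like the sketch below; the key-frame detector, segment extractor, and speech recognizer are passed in as hypothetical callables, since the patent names these operations but no concrete algorithm or API:

```python
def clip_video(video_path, detect_key_frames, extract_segment, transcribe, window=5.0):
    """Sketch of steps S2011-S2014: cut a segment around each key image frame and
    describe it by recognizing the speech in its audio clip (assumed helpers)."""
    segments = []
    for t in detect_key_frames(video_path):                   # S2012: key image frames
        start = max(0.0, t - window / 2)
        segment = extract_segment(video_path, start, window)  # segment containing the key frame
        description = transcribe(segment["audio"])            # text recognized from the audio clip
        segments.append({"segment": segment, "description": description})
    return segments  # S2013/S2014: presented to the user for segment selection
```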
  • Alternatively, an embodiment of the present application may send a video clipping request to the cloud, so that the clipping is performed by a cloud device (for example, the server 120). On this basis, the embodiment can acquire the clipped video segments from the cloud device.
  • FIG. 3D illustrates a user interface for generating video segments in accordance with some embodiments of the present application.
  • window 307 is a preview window of the video to be clipped.
  • embodiments of the present application may generate multiple video segments, such as segment 309.
  • FIG. 3E shows the editing interface of a video clip.
  • the window 310 is a preview window of the segment 309.
  • Area 311 shows the description information of the segment 309.
  • the user can input text content corresponding to the video clip through the text input control 312.
  • the user can also obtain audio content for the video clip through control 313 or control 314.
  • icon 315 represents an acquired audio file.
  • the user can select at least one video clip.
  • the present embodiment can treat each selected video clip and the corresponding text content and audio content as one material element.
  • step S201 can acquire a plurality of material elements.
  • step S201 may use the generation order of the plurality of material elements as the default playback order.
  • step S201 may also adjust the play order of the plurality of material elements in response to the user operation.
  • Figure 3F illustrates a user interface that adjusts the playback order in accordance with some embodiments of the present application.
  • FIG. 3F presents a thumbnail corresponding to each material element, for example thumbnails 316 and 317, arranged in order within the display area.
  • Step S201 may adjust an arrangement order of each element in the material set in response to a movement operation of the thumbnail in the user interface, and use the adjusted arrangement order as a play order of the material set.
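  • A minimal sketch of this reordering, assuming each material element carries a play_order field as in the earlier sketch:

```python
def move_thumbnail(elements, src, dst):
    """Move the element whose thumbnail was dragged from position src to position
    dst, then reassign play_order so it matches the new arrangement."""
    item = elements.pop(src)
    elements.insert(dst, item)
    for i, element in enumerate(elements):
        element.play_order = i
    return elements
```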
  • the method 200 may perform step S202.
  • step S202 an effect parameter corresponding to the material set is determined.
  • each effect parameter corresponds to a video effect mode.
  • Video effects include, for example, transition effects and particle effects between adjacent material elements.
  • A transition effect refers to the scene-change effect between two scenes (i.e., two material elements).
  • Embodiments of the present application may employ predetermined techniques (e.g., wipe, overlay, page curl) to achieve a smooth transition effect.
  • The transition effect can also include the effect of a picture entering the frame (also known as a picture fly-in effect).
  • Particle effects are animated effects that simulate objects such as water, fire, fog, and gas in reality.
  • a video effect mode corresponds to the overall effect of a video to be synthesized.
  • one video effect mode can be a predetermined video effect or a combination of multiple predetermined video effects.
  • In some embodiments, step S202 may provide a user interface including multiple effect options, each of which corresponds to an effect parameter.
  • an effect parameter can be considered as an identifier corresponding to a video effect mode.
  • Step S202 may display the corresponding preview effect image in the user interface.
  • step S202 may use the effect parameter corresponding to the selected effect option as the effect parameter corresponding to the material set.
  • FIG. 3G illustrates a user interface for determining effect parameters in accordance with some embodiments of the present application.
  • Region 319 shows a number of effect options, such as options 320 and 321.
  • Each option corresponds to a video effect mode.
  • the effect animation can intuitively represent a video effect mode. In this way, the user can select a video effect mode by viewing the effect animation without performing complicated operations related to the video effect in the terminal device.
  • step S202 may select an effect parameter corresponding to the effect option currently being previewed in response to the operation of the control 323.
  • step S203 the material set and effect parameters are transmitted to a video composition server (eg, server 120).
  • the video composition server can synthesize multiple material elements in the material collection into videos corresponding to the determined video effect mode according to the effect parameters and the attributes of the material collection.
  • a video composition request is sent to the video composition server.
  • the video composition request may include a material collection and an effect parameter.
  • the video composition server can synthesize the material into a video in response to the video composition request.
  • In some embodiments, the video composition server may send prompt information about providing a video composition service to the material processing application 111.
  • In step S203, in response to receiving the prompt information, the material set and the effect parameter are transmitted to the video composition server, so that the server can synthesize the corresponding video from the received material set and effect parameter.
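  • Purely as an illustration of step S203, the transmission might be a single request such as the following; the /compose path and the JSON field names are hypothetical, since the patent only specifies that the material set and the effect parameter are sent to the video composition server:

```python
import json
import urllib.request

def request_composition(server_url, material_set, effect_parameter):
    """Send the material set and the chosen effect parameter to the video
    composition server (step S203) and return the server's response."""
    payload = json.dumps({
        "materials": material_set,    # serialized material elements and attributes
        "effect": effect_parameter,   # identifier of the chosen video effect mode
    }).encode("utf-8")
    request = urllib.request.Request(server_url + "/compose", data=payload,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)    # e.g. a job id or the URL of the synthesized video
```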
  • In summary, with the video material processing method 200 of the present application, content selection can be performed in a user interface (for example, the user interfaces of FIGS. 3A to 3G), so that the material set of the video to be synthesized can be obtained conveniently.
  • The processing method 200 can also clip a video automatically to generate video segments and corresponding description information, so that the user can quickly determine the content of each segment and select segments by viewing the description information.
  • The processing method 200 spares the user from performing complicated video-effect operations on the local terminal device, and can intuitively present preview images (for example, effect animations) of multiple video effect modes, making it easy for the user to quickly determine the effect mode of the video to be synthesized. On this basis, the method 200 synthesizes the video through the video composition server, greatly improving the user experience.
  • FIG. 4 illustrates a flow diagram of a video composition method 400 in accordance with some embodiments of the present application.
  • the video synthesis method 400 can be performed by a video synthesis application.
  • Server 120 can include a video synthesis application.
  • the video composition application may be, for example, software that synthesizes a video using a collection of materials, or may be a component of various multimedia applications.
  • the multimedia application is, for example, software that provides video content to the terminal device 110.
  • In step S401, a material set of a video to be synthesized and an effect parameter for that material set are acquired from the material processing application 111, where the material set includes a plurality of material elements and each material element includes at least one type of media content among pictures, text, audio, and video.
  • The attributes of the material set include the play order and play duration of each material element in the set.
  • the effect parameter corresponds to a video effect mode.
  • In step S402, the multiple material elements in the material set are synthesized into a video in the corresponding video effect mode (i.e., the video effect mode specified by the effect parameter) according to the effect parameter and the attributes of the material set.
  • In some embodiments, step S402 first performs normalization on the material set so that each material element is converted into a predetermined format.
  • the predetermined format includes, for example, an image encoding format, an image playback frame rate, an image size, and the like.
  • the predetermined format is associated with an effect parameter.
  • each effect parameter is configured with a corresponding predetermined format.
  • step S402 can determine a corresponding predetermined format according to the effect parameter, and perform normalization processing on the material element. Based on this, step S402 can synthesize the normalized material set into a video according to the effect parameter.
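  • As a sketch of this normalization, each effect parameter could be mapped to its predetermined format and applied with FFmpeg (which this application mentions later for caption generation); the parameter names and the format table here are hypothetical:

```python
import subprocess

# Hypothetical mapping from effect parameter to the predetermined format
# (encoding, frame rate, frame size) that step S402 converts each element to.
FORMATS = {
    "effect_a": {"codec": "libx264", "fps": "25", "size": "1280x720"},
    "effect_b": {"codec": "libx264", "fps": "30", "size": "1920x1080"},
}

def normalize(element_path, effect_parameter, out_path):
    """Convert one material element into the predetermined format that is
    configured for the given effect parameter."""
    fmt = FORMATS[effect_parameter]
    subprocess.run(["ffmpeg", "-y", "-i", element_path,
                    "-c:v", fmt["codec"], "-r", fmt["fps"], "-s", fmt["size"],
                    out_path], check=True)
```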
  • the video composition application is configured with multiple video composition scripts.
  • each video composition script (which may also be referred to as a video composition template) corresponds to a video composition effect that can be executed by the video composition application.
  • step S402 can determine a plurality of rendering stages corresponding to the effect parameters.
  • Each rendering stage includes at least one of the plurality of video composition scripts described above, and the rendering result of each rendering stage is the input of the next rendering stage.
  • step S402 can render the material elements in the material collection according to multiple rendering stages to synthesize the video.
  • In this way, step S402 can implement a superimposed composite effect (i.e., the video effect mode corresponding to the effect parameter).
  • FIG. 5 illustrates a video rendering process in accordance with some embodiments of the present application.
  • The process shown in FIG. 5 includes three rendering stages S1, S2, and S3.
  • Stage S1 executes scripts X1 and X2.
  • the material set may include, for example, 20 material elements.
  • In step S402, the first 10 material elements can be rendered by executing script X1, and the last 10 by executing script X2.
  • Step S402 can then continue the overlay effect processing by executing scripts X3 and X4 at stage S2.
  • Step S402 may continue the superimposition processing at stage S3, thereby generating the rendering result corresponding to the effect parameter.
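  • A minimal sketch of this staged pipeline, assuming an even split of the inputs across the scripts of each stage and a hypothetical run_script wrapper around the composition engine:

```python
def render(material_elements, stages, run_script):
    """Multi-stage rendering as in FIG. 5: each stage runs one or more composition
    scripts, and the result of each stage becomes the input of the next."""
    inputs = material_elements
    for scripts in stages:              # e.g. [["X1", "X2"], ["X3", "X4"], ["X5"]]
        chunk = max(1, len(inputs) // len(scripts))
        inputs = [run_script(script, inputs[i * chunk:(i + 1) * chunk])
                  for i, script in enumerate(scripts)]  # one input chunk per script
    return inputs[0]                    # the final rendering result
```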
  • Step S402 may, for example, invoke an After Effects (abbreviated AE) application to execute the scripts, but is not limited thereto. The parameters of the AE command line are described below.
  • aerender is the name of the AE command-line rendering program.
  • -project test.aepx indicates that the current project template file is test.aepx.
  • -comp indicates that the composition name used for this rendering is test.
  • -RStemplate indicates that the render settings template (script) name is test_1.
  • -OMtemplate indicates that the video output template name is test_2.
  • -output indicates that the output video is named test.mov.
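  • Assembled from the parameters just described, the invocation might look like the following (sketched via Python's subprocess module; -project, -comp, -RStemplate, -OMtemplate, and -output are standard aerender flags, and the concrete names are simply the example values above):

```python
import subprocess

# Render the composition "test" of project test.aepx with the render settings
# template test_1 and the output module template test_2, writing test.mov.
subprocess.run([
    "aerender",
    "-project", "test.aepx",   # project template file
    "-comp", "test",           # composition used for this rendering
    "-RStemplate", "test_1",   # render settings template
    "-OMtemplate", "test_2",   # video output template
    "-output", "test.mov",     # name of the output video
], check=True)
```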
  • the video composition method 400 can acquire a set of materials from the material processing application 111 and determine a plurality of rendering stages corresponding to the effect parameters. Based on this, the video composition method 400 can synthesize the rendering result with the superimposed video effect by performing a plurality of rendering stages.
  • The video synthesis method 400 performs multi-stage rendering on the material set and can generate various complex video effects, thereby greatly improving the efficiency of video synthesis and increasing the variety of synthesis effects.
  • FIG. 6 shows a flow diagram of a video composition method 600 in accordance with some embodiments of the present application.
  • Video synthesis method 600 can be performed by a video synthesis application.
  • server 120 can include the video composition application.
  • the video synthesis method 600 includes steps S601 to S602.
  • the implementations of steps S601 to S602 are consistent with steps S401 to S402, respectively, and are not described herein again.
  • the video synthesis method 600 further includes step S603.
  • In step S603, voice information corresponding to the text content is generated. Specifically, step S603 can convert the text content in a material element into voice information.
  • step S603 can perform voice conversion using various predetermined speech conversion algorithms. For example, step S603 can invoke the Xunfei speech synthesis component to obtain a corresponding audio file.
  • In step S604, caption information corresponding to the voice information is generated.
  • step S604 can adopt various techniques capable of generating subtitles, which is not limited in this application.
  • step S604 may call Fast Forward MPEG (abbreviated as FFMPEG) software for caption generation, but is not limited thereto.
  • FFMPEG Fast Forward MPEG
  • the generated subtitle includes parameters such as a subtitle effect, a subtitle display time, and the like.
  • In step S605, the voice information and the subtitle information are added to the video synthesized in step S602.
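  • A sketch of steps S603 to S605, with hypothetical callables standing in for the speech-synthesis component (e.g., the Xunfei component mentioned above) and the subtitle generator, and FFmpeg merging both into the composed video:

```python
import subprocess

def add_voice_and_subtitles(video_in, text, synthesize_speech, make_srt, video_out):
    """Steps S603-S605: synthesize speech for the text content, generate captions
    timed to that speech, and add both to the video synthesized in step S602."""
    voice_path, voice_duration = synthesize_speech(text)  # S603: text -> voice information
    srt_path = make_srt(text, voice_duration)             # S604: caption information
    subprocess.run([                                      # S605: add voice and captions
        "ffmpeg", "-y", "-i", video_in, "-i", voice_path,
        "-vf", f"subtitles={srt_path}",                   # burn the captions into the frames
        "-map", "0:v", "-map", "1:a",                     # keep the video, use the new voice track
        video_out,
    ], check=True)
```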
  • FIG. 7 shows a schematic diagram of a processing device 700 for video material in accordance with some embodiments of the present application.
  • The material processing application 111 may, for example, include the video material processing device 700.
  • the processing device 700 of the video material includes a material acquisition unit 701, an effect determination unit 702, and a transmission unit 703.
  • the material acquiring unit 701 can acquire a material set of the video to be synthesized, and determine an attribute of the material set.
  • the material collection includes a plurality of material elements, each of which includes at least one of image content, text, audio, and video.
  • the attribute includes a play order and a play duration of each material element in the material set.
  • the material acquisition unit 701 can provide a user interface for obtaining a material element.
  • the user interface includes at least one control corresponding to at least one media type, respectively.
  • the at least one media type includes at least one of text, picture, audio, and video.
  • In response to an operation on any control in the user interface, the material acquisition unit 701 can acquire the media content corresponding to that control's media type and use it as one item of media content of a material element in the material set.
  • the material acquisition unit 701 can obtain a picture and use it as the picture content of a material element of the material collection in response to an operation on a picture control in the user interface.
  • the material acquisition unit 701 may also acquire the input text information associated with the picture content as the text content of the material element in response to the operation of the text input control associated with the picture control. In some embodiments, the material acquisition unit 701 can also retrieve the input audio information associated with the picture content in response to an operation of the audio control associated with the picture control and use it as the audio content of the material element. In some embodiments, the material acquisition unit 701 can also retrieve a video segment as the video content of a material element of the material collection in response to an operation on a video control in the user interface.
  • the material acquisition unit 701 first acquires a piece of video, and then extracts at least one video segment from the segment of video and generates descriptive information for each video segment according to a video editing algorithm.
  • the material acquisition unit 701 can determine at least one key image frame of the video. For each key image frame, the material acquisition unit 701 can extract a video segment containing the key image frame from the video.
  • the video clip includes an audio clip.
  • The material acquisition unit 701 can also perform speech recognition on the audio clip to obtain the corresponding text, and generate the description information of the video segment from that text.
  • the material acquisition unit 701 can provide a user interface for displaying description information of each video segment, so that the user performs segment selection according to the description information of each video segment.
  • the material acquisition unit 701 respectively determines each of the selected video segments as the video content of one material element in the material set in response to the selection operation on the video segment.
  • the material acquisition unit 701 can provide a user interface that presents thumbnails corresponding to respective material elements in the material collection.
  • the thumbnails corresponding to the respective material elements are sequentially arranged in the corresponding display area of the user interface.
  • the material acquisition unit 701 can adjust the arrangement order of the elements in the material set in response to the movement operation of the thumbnails in the user interface, and use the adjusted arrangement order as the playback order of the material collection.
  • The material acquisition unit 701 may use the play duration of the picture content as the play duration of the material element.
  • the material acquisition unit 701 may use the playing time of the video content as the playing duration of the material element.
  • the effect determination unit 702 can determine an effect parameter corresponding to the material set.
  • the effect parameter corresponds to a video effect mode.
  • the effect determination unit 702 can provide a user interface that includes a plurality of effect options. Each of these effect options corresponds to an effect parameter.
  • the effect determination unit 702 displays the corresponding preview effect map in the user interface.
  • the effect determining unit 702 sets the effect parameter corresponding to the selected effect option as the effect parameter corresponding to the material set.
  • the transmitting unit 703 may transmit the material set and the effect parameter to the video composition server, so that the video composition server combines the plurality of material elements in the material set into a video corresponding to the video effect mode according to the effect parameter and the attribute of the material set.
  • With the video material processing device 700 of the present application, content selection can be performed in a user interface (for example, the user interfaces of FIGS. 3A to 3G), so that the material set of the video to be synthesized can be obtained conveniently.
  • The processing device 700 can also clip a video automatically to generate video segments and corresponding description information, so that the user can quickly determine the content of each segment and select segments by viewing the description information.
  • The processing device 700 spares the user from performing complicated video-effect operations on the local terminal device, and can intuitively present preview images (e.g., effect animations) of multiple video effect modes, making it easy for the user to quickly determine the effect mode of the video to be synthesized. On this basis, the processing device 700 synthesizes the video through the video synthesis server, greatly improving the user experience.
  • FIG. 8 shows a schematic diagram of a video synthesis device 800 in accordance with some embodiments of the present application.
  • the video synthesis application can include a video synthesis device 800.
  • Server 120 may, for example, include the video composition application.
  • the video synthesizing apparatus 800 may include a communication unit 801 and a video synthesizing unit 802.
  • the communication unit 801 can acquire a material set of a video to be synthesized and an effect parameter regarding the material set from the material processing application 111.
  • the material collection includes a plurality of material elements, and each of the material elements includes at least one of image content, text, audio, and video.
  • the properties of the clip collection include the play order and play duration of each clip in the clip.
  • the effect parameter corresponds to a video effect mode.
  • the video synthesizing unit 802 can synthesize a plurality of material elements in the material set into a video of a video effect mode according to the effect parameter and the attribute of the material set.
  • video synthesizing unit 802 can normalize the set of material to cause each material element to be converted into a predetermined format.
  • the predetermined format includes an image encoding format, an image playback frame rate, and an image size.
  • the video synthesizing unit 802 synthesizes the normalized processed material set into a corresponding video.
  • video synthesizing unit 802 can determine a plurality of rendering stages corresponding to the effect parameters based on a plurality of video synthesis scripts for execution in the predetermined video composition application.
  • Each video composition script corresponds to a video composition effect.
  • Each rendering stage includes at least one of the plurality of video composition scripts.
  • The rendering result of each rendering stage is the input of the next rendering stage.
  • the video composition unit 802 can render the set of materials to generate a corresponding video.
  • the video effect mode may, for example, include a video transition mode between adjacent material elements. It should be noted that a more specific implementation of the video synthesizing apparatus 800 is consistent with the video synthesizing method 400, and details are not described herein again.
  • the video synthesizing apparatus 800 according to the present application can acquire a material set from the material processing application 111 and determine a plurality of rendering stages corresponding to the effect parameters.
  • the video synthesizing device 800 can synthesize the rendering result with the superimposed video effect by performing a plurality of rendering stages.
  • The video synthesizing device 800 performs multi-stage rendering on the material set and can generate various complex video effects, thereby greatly improving the efficiency of video synthesis and increasing the variety of synthesis effects.
  • FIG. 9 shows a schematic diagram of a video synthesizing device 900 in accordance with some embodiments of the present application.
  • the video synthesis application can include a video synthesis device 900.
  • Server 120 may, for example, include the video composition application.
  • the video synthesizing apparatus 900 includes a communication unit 901 and a video synthesizing unit 902.
  • the communication unit 901 can be implemented as an embodiment consistent with the communication unit 801.
  • the video synthesizing unit 902 can be implemented as an embodiment consistent with the video synthesizing unit 802, and details are not described herein again.
  • the device 900 may further include a speech synthesis unit 903, a subtitle generation unit 904, and an addition unit 905.
  • the speech synthesis unit 903 can generate the speech information corresponding to the text content.
  • the subtitle generating unit 904 can generate subtitle information corresponding to the voice information.
  • the adding unit 905 is for adding the voice information and the caption information to the generated video. It should be noted that a more specific implementation of the video synthesizing apparatus 900 is consistent with the video synthesizing method 600, and details are not described herein again.
  • FIG. 10 shows a block diagram of the structure of a computing device.
  • the computing device includes one or more processors (CPU or GPU) 1002, a communication module 1004, a memory 1006, a user interface 1010, and a communication bus 1008 for interconnecting these components.
  • the processor 1002 can receive and transmit data through the communication module 1004 to effect network communication and/or local communication.
  • User interface 1010 includes one or more output devices 1012 that include one or more speakers and/or one or more visual displays.
  • User interface 1010 also includes one or more input devices 1014, including, for example, a keyboard, a mouse, a voice command input unit or microphone, a touch screen display, a touch-sensitive tablet, a gesture-capture camera, or other input buttons or controls.
  • The memory 1006 may be a high-speed random access memory, such as DRAM, SRAM, DDR RAM, or another random-access solid-state storage device; or a non-volatile memory, such as one or more magnetic disk storage devices, optical disc storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • the memory 1006 stores a set of instructions executable by the processor 1002, including:
  • an operating system 1016, including programs for processing various basic system services and performing hardware-related tasks;
  • applications 1018, including various programs for implementing the above methods; these programs can implement the processing flows in each of the above examples.
  • application 1018 can include a video material processing application in accordance with the present application.
  • the video material processing application may include the processing device 700 of the video material shown in FIG.
  • application 1018 can include a video composition application.
  • the video composition application may include, for example, the video synthesizing device 800 shown in FIG. 8 or the video synthesizing device 900 shown in FIG.
  • each of the examples of the present application can be implemented by a data processing program executed by a data processing device such as a computer.
  • the data processing program constitutes the present application.
  • The data processing program is usually stored in a storage medium and is executed either by reading it directly from the storage medium or by installing or copying it to a storage device (such as a hard disk or memory) of the data processing device. Therefore, such a storage medium also constitutes the present application.
  • The storage medium can use any type of recording method, such as a paper storage medium (e.g., paper tape), a magnetic storage medium (e.g., floppy disk, hard disk, or flash memory), an optical storage medium (e.g., CD-ROM), or a magneto-optical storage medium (e.g., MO).
  • the present application therefore also discloses a non-volatile storage medium in which is stored a data processing program for performing any of the above-described methods of the present application.
  • The method steps described in this application can be implemented by a data processing program, and can also be implemented in hardware, for example, by logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Disclosed in the present application are a video material processing method, a video synthesis method, a terminal device and a storage medium. The video material processing method, performed by the terminal device, comprises: acquiring a material set of a video to be synthesized, and determining attributes of the material set, wherein the material set comprises multiple material elements, each material element comprises at least one type of media content among pictures, text, audio and video, and the attributes comprise the playback order and the playback duration of each material element in the material set; determining effect parameters corresponding to the material set, the effect parameters corresponding to video effect modes; and transmitting the material set and the effect parameters to a video synthesis server to enable the video synthesis server to synthesize the multiple material elements in the material set into a video corresponding to the video effect modes on the basis of the effect parameters and the attributes of the material set.

Description

Video material processing method, video synthesis method, terminal device and storage medium
This application claims priority to Chinese Patent Application No. 201711076478.2, filed with the Chinese Patent Office on November 6, 2017 and entitled "Video material processing method, video synthesis method, apparatus and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of video synthesis, and in particular to a video material processing method, a video synthesis method, a terminal device, and a storage medium.
Background
With the development of multimedia technology, video production has become widely used in people's lives. Video production recombines and re-encodes pictures, videos, audio, and other materials to generate a video. Currently, video production typically requires installing video production software on a personal computing device. Such software provides feature-rich video editing functions, but is complicated to operate.
Summary of the Invention
To this end, the present application proposes a new video synthesis scheme to address the problem of how to make video synthesis more convenient to operate.
According to one aspect of the present application, a video material processing method is provided, executed by a terminal device, the method including: acquiring a material set of a video to be synthesized and determining attributes of the material set, where the material set includes multiple material elements, each material element includes at least one type of media content among pictures, text, audio, and video, and the attributes include the play order and play duration of each material element in the material set; determining an effect parameter corresponding to the material set, where the effect parameter corresponds to a video effect mode; and transmitting the material set and the effect parameter to a video composition server, so that the video composition server synthesizes the multiple material elements in the material set into a video in the corresponding video effect mode according to the effect parameter and the attributes of the material set.
According to one aspect of the present application, a video synthesis method is provided, executed by a server, the method including: acquiring, from a material processing application, a material set of a video to be synthesized and an effect parameter for that material set, where the material set includes multiple material elements, each material element includes at least one type of media content among pictures, text, audio, and video, the attributes of the material set include the play order and play duration of each material element, and the effect parameter corresponds to a video effect mode; and synthesizing the multiple material elements in the material set into a video in that video effect mode according to the effect parameter and the attributes of the material set.
According to one aspect of the present application, a terminal device is provided, including a processor and a memory; the memory stores computer readable instructions that enable the processor to perform the video material processing method according to the present application.
According to one aspect of the present application, a server is provided, including a processor and a memory; the memory stores computer readable instructions that enable the processor to perform the video synthesis method according to the present application.
According to one aspect of the present application, a non-volatile storage medium is provided, storing a data processing program that, when executed by a computing device, causes the computing device to perform the video material processing method or the video synthesis method.
In summary, with the video material processing scheme of the present application, content selection can be performed in a user interface (for example, the user interfaces of FIGS. 3A to 3G), so that the material set of the video to be synthesized can be obtained conveniently. In particular, the processing scheme can also clip a video automatically to generate video segments and corresponding description information, so that the user can quickly determine the content of each segment and select segments by viewing the description information. In addition, the processing scheme spares the user from performing complicated video-effect operations on the local terminal device, and can intuitively present preview images (e.g., effect animations) of multiple video effect modes, making it easy for the user to quickly determine the effect mode of the video to be synthesized. On this basis, the processing scheme synthesizes the video through the video composition server, greatly improving the user experience.
Brief Description of the Drawings
To explain the technical solutions in the examples of the present application more clearly, the drawings used in the description of the examples are briefly introduced below. Obviously, the drawings in the following description are only some examples of the present application; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 shows a schematic diagram of an application scenario 100 according to some embodiments of the present application;
FIG. 2A shows a flowchart of a video material processing method 200 according to some embodiments of the present application;
FIG. 2B shows a flowchart of acquiring a material set according to some embodiments of the present application;
FIG. 3A shows a schematic diagram of a user interface for acquiring picture content according to some embodiments of the present application;
FIG. 3B shows a schematic diagram of an interface displaying a picture according to some embodiments;
FIG. 3C shows a schematic diagram of acquiring audio information according to some embodiments of the present application;
FIG. 3D shows a user interface for generating video segments according to some embodiments of the present application;
FIG. 3E shows the editing interface of a video clip;
FIG. 3F shows a user interface for adjusting the play order according to some embodiments of the present application;
FIG. 3G shows a user interface for determining an effect parameter according to some embodiments of the present application;
FIG. 4 shows a flowchart of a video synthesis method 400 according to some embodiments of the present application;
FIG. 5 shows a video rendering process according to some embodiments of the present application;
FIG. 6 shows a flowchart of a video synthesis method 600 according to some embodiments of the present application;
FIG. 7 shows a schematic diagram of a video material processing device 700 according to some embodiments of the present application;
FIG. 8 shows a schematic diagram of a video synthesis device 800 according to some embodiments of the present application;
FIG. 9 shows a schematic diagram of a video synthesis device 900 according to some embodiments of the present application; and
FIG. 10 shows a block diagram of the structure of a computing device.
Detailed Description
The technical solutions in the examples of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described examples are only some of the examples of the present application, not all of them. All other examples obtained by a person of ordinary skill in the art based on the examples in the present application without creative effort fall within the scope of protection of the present application.
图1示出了根据本申请一些实施例的应用场景100的示意图。如图1所示,应用场景100包括终端设备110和服务器120。这里,终端设备110例如可以是台式电脑、笔记本电脑、平板电脑、移动电话或掌上游戏机等各种设备,但不限于此。服务器120可以包括一个或多个硬件独立的服务器。服务器120还可以是虚拟服务器或者分布式集群等设备资源,但不限于此。终端设备110可以包括各种应用,例如素材处理应用111。素材处理应用111可以获取待合成视频的视频素材,并将视频素材传输到服务器120中。这样,服务器120可以基于所接收的视频素材合成相应的视频。服务器120还可以将所合成的视频传输到终端设备110。这里,素材处理应用111例如可以是专用于管理素材的客户端或者浏览器等,本申请对此不作限制。在一些实施例中,服务器120可以包括视频合成应用(图1未示出)。视频合成应用例如可以是利用素材集合来合成视频的软件,也可以是各种多媒体应用的组件。这里,多媒体应用例如是向终端设备110提供视频内容的软件。下面结合图2对视频素材的处理方法进行说明。FIG. 1 shows a schematic diagram of an application scenario 100 in accordance with some embodiments of the present application. As shown in FIG. 1, the application scenario 100 includes a terminal device 110 and a server 120. Here, the terminal device 110 may be, for example, various devices such as a desktop computer, a notebook computer, a tablet computer, a mobile phone, or a handheld game console, but is not limited thereto. Server 120 may include one or more hardware independent servers. The server 120 may also be a device resource such as a virtual server or a distributed cluster, but is not limited thereto. The terminal device 110 can include various applications, such as a material processing application 111. The material processing application 111 can acquire the video material of the video to be composited and transmit the video material to the server 120. In this way, the server 120 can synthesize the corresponding video based on the received video material. The server 120 can also transmit the synthesized video to the terminal device 110. Here, the material processing application 111 may be, for example, a client or a browser dedicated to managing the material, and the like, which is not limited in this application. In some embodiments, server 120 can include a video composition application (not shown in FIG. 1). The video composition application may be, for example, software that synthesizes a video using a collection of materials, or may be a component of various multimedia applications. Here, the multimedia application is, for example, software that provides video content to the terminal device 110. The processing method of the video material will be described below with reference to FIG.
FIG. 2A shows a flowchart of a method 200 for processing video material according to some embodiments of the present application. The method 200 may be performed, for example, by the material processing application 111, but is not limited thereto. Here, the material processing application 111 may be, for example, a browser or a client for processing material. The material processing application 111 may also be a component of an application such as an instant messaging application (QQ, WeChat, and so on), a social networking application, a video application (such as Tencent Video), or a news client.
As shown in FIG. 2A, the method 200 for processing video material includes step S201: acquiring a material set of the video to be synthesized and determining attributes of the material set. Here, the material set may include a plurality of material elements, each of which includes at least one type of media content among pictures, text, audio, and video. The attributes of the material set include the play order and play duration of each material element in the set. According to some embodiments of the present application, step S201 provides a user interface for acquiring material elements. The user interface may include at least one control, each corresponding to a media type. Here, a control is a view object in the user interface used to interact with the user, such as an input box, a drop-down selection box, or a button. The media types include, for example, text, picture, audio, and video, but are not limited thereto. On this basis, in response to an operation on any control in the user interface, step S201 may acquire the media content corresponding to the media type of that control and use it as one item of media content of a material element in the material set.
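The patent does not fix any concrete data layout for the material set; the following Python sketch is only one plausible shape for it, and every name in it (MaterialElement, MaterialSet, and so on) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MaterialElement:
    """One element of the material set; each field is one item of media content."""
    picture: Optional[str] = None   # path or URL of a picture
    text: Optional[str] = None      # caption or narration text
    audio: Optional[str] = None     # path of narration or background music
    video: Optional[str] = None     # path of a video clip
    duration: float = 0.0           # play duration in seconds

@dataclass
class MaterialSet:
    """Material set plus its attributes: play order and per-element duration."""
    elements: List[MaterialElement] = field(default_factory=list)

    @property
    def play_order(self) -> List[int]:
        # The default play order is the order in which elements were generated.
        return list(range(len(self.elements)))
```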
In some embodiments, when the user selects a picture, stored locally or taken from the network, through a picture control, step S201 may, in response to the operation on the picture control, use that picture as the picture content of a material element. Note that a material element containing picture content may typically also include text or audio associated with the picture. In some embodiments, when the user inputs text corresponding to the picture through a text input control, step S201, in response to the operation on the text input control, acquires the text information associated with the picture content and uses it as the text content of the corresponding material element. In some embodiments, step S201 may, in response to an operation on an audio control, acquire audio information associated with the picture content and use it as the audio content of the corresponding material element. Here, the audio content is, for example, narration or background music. In addition, step S201 may use the play duration of the picture as the play duration of the corresponding material element. To illustrate the execution of step S201 more concretely, an example is described below with reference to FIGS. 3A to 3C.
FIG. 3A shows a schematic user interface for acquiring picture content according to some embodiments of the present application, and FIG. 3B shows an interface displaying a picture in some embodiments. As shown in FIGS. 3A and 3B, when the user operates the control 301, step S201 may acquire a picture and display it in the preview window 302. Step S201 may determine the play duration of the picture in response to an operation on the play duration control 303, and may acquire text information related to the picture in the preview window 302 in response to an operation on the text input control 304. In other words, the text information is a supplementary description of the picture. FIG. 3C shows a schematic diagram of acquiring audio information according to some embodiments of the present application. For example, step S201 may acquire locally stored audio (for example, background music) in response to an operation on the control 305. As another example, step S201 may record a piece of audio content in response to an operation on the control 306. The audio content is, for example, narration recorded for the picture in the preview window 302.
In some embodiments, step S201 may acquire a segment of video and use it as the video content of a material element. For example, in response to an operation on a video control in the user interface, step S201 acquires a video clip and uses it as the video content of a material element. Here, the video may be, for example, a video file stored locally, or video content stored in the cloud. For a material element containing video content, step S201 may also add text content, audio content, and so on. When a material element includes video content, step S201 may use the play duration of the video content as the play duration of that material element.
In some embodiments, step S201 may include steps S2011 to S2014. As shown in FIG. 2B, in step S2011, a segment of video is acquired. In step S2012, according to a predetermined video clipping algorithm, at least one video segment is extracted from the video and description information is generated for each video segment. Specifically, according to some embodiments of the present application, step S2012 first determines at least one key image frame of the video. For each key image frame, step S2012 may extract from the video a video segment containing that key image frame. The video segment may include an audio segment associated with the segment's sequence of image frames. Step S2012 then performs speech recognition on the audio segment to obtain the corresponding text, and generates the description information of the video segment from that text. It should be understood that step S2012 may employ any of various algorithms capable of automatically clipping video, which is not limited in this application.
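Step S2012 leaves the clipping algorithm open. As an illustration of the key-frame idea only, the sketch below picks key frames from precomputed inter-frame difference scores and cuts a fixed window around each; the scoring, threshold, and window size are all assumptions for illustration, not values from the patent.

```python
from typing import List, Tuple

def find_key_frames(frame_diffs: List[float], threshold: float = 0.4) -> List[int]:
    """Pick frames whose difference from the previous frame exceeds a threshold.

    frame_diffs[i] is assumed to be a normalized (0..1) difference score
    between frame i and frame i-1, computed upstream by a decoder.
    """
    return [i for i, d in enumerate(frame_diffs) if d >= threshold]

def segments_around(key_frames: List[int], total_frames: int,
                    half_window: int = 75) -> List[Tuple[int, int]]:
    """Build one (start, end) frame range per key frame, clamped to the video."""
    return [(max(0, k - half_window), min(total_frames - 1, k + half_window))
            for k in key_frames]

# Example: a 600-frame video with two visually abrupt changes.
diffs = [0.02] * 600
diffs[150] = 0.8   # scene change
diffs[420] = 0.6   # scene change
print(segments_around(find_key_frames(diffs), total_frames=600))
```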
On this basis, in step S2013, a user interface displaying the description information of each video segment is provided, so that the user can select segments according to each segment's description information.
In step S2014, in response to a selection operation on at least one video segment, each selected video segment is used as the video content of one material element in the material set. In other words, step S2014 may generate each selected video segment as a material element.
Note also that the video need not be clipped locally in step S2012; embodiments of the present application may instead send a video clipping request to the cloud and have a cloud device (for example, the server 120) perform the clipping. On this basis, embodiments of the present application can acquire the clipped video segments from the cloud device. In addition, to illustrate the process of generating a material element containing video content more concretely, an example is described below with reference to FIGS. 3D and 3E.
FIG. 3D shows a user interface for generating video segments according to some embodiments of the present application. As shown in FIG. 3D, the window 307 is a preview window of the video to be clipped. In response to an operation on the control 308, embodiments of the present application may generate multiple video segments, such as the segment 309. FIG. 3E shows an editing interface for a video segment. For example, in response to an operation on the segment 309 (such as a click or double-click), the interface shown in FIG. 3E is entered. The window 310 is a preview window of the segment 309, and the area 311 shows the description information of the segment 309. In addition, the user may input text content corresponding to the video segment through the text input control 312, and may acquire audio content for the video segment through the control 313 or the control 314. For example, the icon 315 represents an acquired audio file. Further, by operating the checkboxes in FIG. 3D, the user may select at least one video segment. In this way, this embodiment can treat each selected video segment, together with its corresponding text content and audio content, as one material element.
In summary, step S201 can acquire a plurality of material elements. Here, step S201 may use the order in which the material elements were generated as the default play order. In addition, step S201 may adjust the play order of the material elements in response to user operations. For example, FIG. 3F shows a user interface for adjusting the play order according to some embodiments of the present application. FIG. 3F presents a thumbnail for each material element, for example 316 and 317, arranged in sequence within the display area. Step S201 may adjust the arrangement order of the elements in the material set in response to a move operation on a thumbnail in the user interface, and use the adjusted arrangement order as the play order of the material set, as sketched below.
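The reordering itself reduces to moving one index within the play-order list. A minimal sketch, with all names hypothetical:

```python
def move_thumbnail(order: list, src: int, dst: int) -> list:
    """Reorder material elements after a thumbnail is dragged from src to dst."""
    order = order.copy()
    order.insert(dst, order.pop(src))
    return order

print(move_thumbnail([0, 1, 2, 3], src=3, dst=1))  # -> [0, 3, 1, 2]
```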
For the material set determined in step S201, the method 200 may perform step S202. In step S202, an effect parameter corresponding to the material set is determined. Here, each effect parameter corresponds to one video effect mode. Video effects include, for example, transition effects between adjacent material elements, particle effects, and so on. A transition effect is the scene-transition effect between two scenes (that is, two material elements). For example, embodiments of the present application may employ predetermined techniques (such as wipes, dissolves, or page curls) to achieve a smooth transition. A transition effect may also include the effect of a picture entering the frame (also called a picture fly-in effect). Particle effects are animation effects that simulate real-world phenomena such as water, fire, fog, or gas. Note that one video effect mode corresponds to the overall effect of a video to be synthesized; in practice, a video effect mode may be a single predetermined video effect or a combination of several predetermined video effects. To spare the user complicated operations on the video effect mode in the terminal device 110, step S202 may provide a user interface containing multiple effect options, where each effect option corresponds to one effect parameter. Here, an effect parameter can be regarded as an identifier of the corresponding video effect mode. In response to a preview operation on any of the effect options, step S202 may display the corresponding preview rendering in the user interface. In response to a selection operation on any of the effect options, step S202 may use the effect parameter corresponding to the selected option as the effect parameter of the material set. For example, FIG. 3G shows a user interface for determining the effect parameter according to some embodiments of the present application. As shown in FIG. 3G, the area 319 shows multiple effect options, such as 320 and 321, each corresponding to a video effect mode. For example, when the effect option 320 is previewed, the corresponding effect animation is displayed in the window 318, and the option in the window 322 indicates the effect option currently being previewed. Here, the effect animation can intuitively represent a video effect mode, so the user can select a mode simply by viewing the animation, without performing complicated effect-related operations on the terminal device. For example, step S202 may, in response to an operation on the control 323, select the effect parameter corresponding to the effect option currently being previewed.
After the material set and the effect parameter are determined, the method 200 may perform step S203. In step S203, the material set and the effect parameter are transmitted to a video composition server (for example, the server 120). In this way, the video composition server can synthesize the multiple material elements in the set into a video corresponding to the determined video effect mode according to the effect parameter and the attributes of the material set. According to some embodiments, step S203 sends a video composition request to the video composition server. The video composition request may include the material set and the effect parameter, so that the server can synthesize the video in response to the request. According to some embodiments of the present application, the video composition server may send the material processing application 111 a notification that a video composition service is available; in step S203, in response to receiving that notification, the material set and the effect parameter are sent to the server, so that the server can synthesize the corresponding video from the received material set and effect parameter.
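The patent does not specify a wire format for the composition request. As a sketch only, assuming a JSON-over-HTTP request to a hypothetical endpoint:

```python
import json
import urllib.request

# Hypothetical endpoint; the patent names no URL or protocol.
COMPOSE_URL = "https://example.com/api/compose"

def send_composition_request(material_set: dict, effect_parameter: str) -> bytes:
    """Send the material set and effect parameter as one composition request."""
    payload = json.dumps({
        "materials": material_set,   # elements plus play order and durations
        "effect": effect_parameter,  # identifier of the video effect mode
    }).encode("utf-8")
    req = urllib.request.Request(
        COMPOSE_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()           # e.g. a job id, or the finished video
```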
In summary, the method 200 for processing video material according to the present application allows content to be selected in a user interface (for example, the interfaces of FIGS. 3A to 3G), so that the material set of the video to be synthesized can be acquired conveniently. In particular, the method 200 can also clip the video automatically to generate video segments and corresponding description information, so that the user can quickly determine the content of each segment from its description and select segments accordingly. In addition, the method 200 spares the user complicated effect-related operations on the local terminal device: it intuitively presents preview renderings (for example, effect animations) of multiple video effect modes, making it easy to quickly determine the effect mode of the video to be synthesized. On this basis, the method 200 of the present application can synthesize the video through the video composition server, greatly improving the user experience.
The video composition process is further described below with reference to FIG. 4. FIG. 4 shows a flowchart of a video composition method 400 according to some embodiments of the present application. The method 400 may be performed by a video composition application; the server 120 may include such an application. The video composition application may be, for example, software that synthesizes a video from a material set, or a component of any of various multimedia applications. Here, a multimedia application is, for example, software that provides video content to the terminal device 110.
As shown in FIG. 4, in step S401, a material set of a video to be synthesized and an effect parameter for that set are acquired from the material processing application 111. The material set includes a plurality of material elements, each of which includes at least one type of media content among pictures, text, audio, and video. The attributes of the material set include the play order and play duration of each material element in the set, and the effect parameter corresponds to one video effect mode.
In step S402, the multiple material elements in the material set are synthesized, according to the effect parameter and the attributes of the material set, into a video in the corresponding video effect mode (that is, the mode specified by the effect parameter). In some embodiments, step S402 normalizes the material set so that each material element is converted into a predetermined format. The predetermined format covers, for example, the image encoding format, the playback frame rate, and the image size. In some embodiments, the predetermined format is associated with the effect parameter; in other words, each effect parameter is configured with a corresponding predetermined format. Step S402 can therefore determine the predetermined format from the effect parameter and normalize the material elements accordingly. On this basis, step S402 can synthesize the normalized material set into a video according to the effect parameter.
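The patent names FFMPEG only in connection with subtitles later on, but a normalization pass of this kind could plausibly shell out to ffmpeg as below; the profile values (codec, frame rate, size) and the per-effect mapping are assumptions for illustration.

```python
import subprocess

# Illustrative only: one normalization profile per effect parameter.
# The key "effect_320" and all values are assumptions, not from the patent.
PROFILES = {
    "effect_320": {"fps": 25, "size": "1280:720", "codec": "libx264"},
}

def normalize(src: str, dst: str, effect: str) -> None:
    """Re-encode one material element to the predetermined format."""
    p = PROFILES[effect]
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-r", str(p["fps"]),           # unify the playback frame rate
        "-vf", f"scale={p['size']}",   # unify the image size
        "-c:v", p["codec"],            # unify the image encoding format
        dst,
    ], check=True)
```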
In some embodiments, the video composition application is configured with multiple video composition scripts. Here, each video composition script (which may also be called a video composition template) corresponds to one video composition effect and can be executed by the video composition application. Based on the effect parameter, step S402 may determine the multiple rendering stages corresponding to that parameter. Each rendering stage includes at least one of the video composition scripts, and the rendering result of each stage is the input of the next stage. Step S402 can thus render the material elements of the material set through these stages to synthesize the video; through the multiple rendering stages, step S402 can realize a layered composite effect (that is, the video effect mode corresponding to the effect parameter). FIG. 5 shows a video rendering process according to some embodiments of the present application. The process shown in FIG. 5 includes three rendering stages: S1, S2, and S3. Stage S1 executes scripts X1 and X2. Suppose, for example, that the material set includes 20 material elements. Step S402 may render the first 10 elements by executing script X1 and the last 10 by executing script X2. For the rendering result of S1, step S402 may continue the overlay-effect processing in stage S2 by executing scripts X3 and X4, and then continue on the result of S2 in stage S3, producing the rendering result corresponding to the effect parameter. Here, each script is formatted, for example, in Extensible Markup Language (XML). Step S402 may, for example, invoke the After Effects (AE) application to execute the scripts, but is not limited thereto. An example of the command with which step S402 invokes AE to perform a rendering operation is as follows:
aerender -project test.aepx -comp "test" -RStemplate "test_1" -OMtemplate "test_2" -output test.mov
Here, aerender is the name of the AE command-line rendering program.
-project test.aepx specifies that the project template file is test.aepx.
-comp specifies that the composition used for this render is named "test".
-RStemplate specifies that the render settings template is test_1.
-OMtemplate specifies that the output module template is test_2.
-output specifies that the output video is named test.mov.
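Chaining such aerender calls gives the staged pipeline of FIG. 5, simplified here to one script per stage. The sketch below is a rough illustration under the assumption that each stage's project file references the previous stage's output movie as footage; all project, composition, and template names are hypothetical.

```python
import subprocess

# Stages of FIG. 5; every project and template name below is hypothetical.
STAGES = [
    ("stage1.aepx", "s1", "X1"),
    ("stage2.aepx", "s2", "X3"),
    ("stage3.aepx", "s3", "X5"),
]

prev_output = None
for i, (project, comp, template) in enumerate(STAGES, start=1):
    out = f"stage{i}.mov"
    # Each stage's .aepx is assumed to reference the previous stage's movie
    # as footage, so prev_output feeds the next render implicitly.
    subprocess.run([
        "aerender", "-project", project,
        "-comp", comp,
        "-RStemplate", template,
        "-output", out,
    ], check=True)
    prev_output = out

print("final render:", prev_output)
```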
In summary, the video composition method 400 according to the present application can acquire the material set from the material processing application 111 and determine the multiple rendering stages corresponding to the effect parameter. On this basis, the method 400 can synthesize a rendering result with layered video effects by executing the multiple rendering stages. In particular, by rendering the material set in multiple stages, the method 400 can generate a wide variety of complex video effects, greatly improving the efficiency of video composition and broadening the range of composition effects.
FIG. 6 shows a flowchart of a video composition method 600 according to some embodiments of the present application. The method 600 may be performed by a video composition application; for example, the server 120 may include such an application.
As shown in FIG. 6, the video composition method 600 includes steps S601 and S602, whose implementations are consistent with steps S401 and S402 respectively and are not repeated here. In addition, the method 600 further includes steps S603 to S605.
In step S603, speech information corresponding to the text content is generated. Specifically, the text content of a material element can be converted into speech in step S603, using any of various predetermined speech synthesis algorithms. For example, step S603 may invoke the iFlytek speech synthesis component to obtain the corresponding audio file.
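As an illustration only (the patent names iFlytek's component, which is not shown here), an offline stand-in such as pyttsx3 could render a material element's text to an audio file:

```python
import pyttsx3  # offline TTS stand-in; the patent itself names iFlytek's component

def text_to_speech(text: str, out_path: str) -> None:
    """Render one material element's text content to an audio file."""
    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)  # queue the utterance for file output
    engine.runAndWait()                  # run the queued synthesis

text_to_speech("A narration line for the picture.", "narration.wav")
```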
In step S604, subtitle information corresponding to the speech information is generated. Here, step S604 may employ any of various subtitle generation techniques, which this application does not limit. For example, step S604 may invoke the Fast Forward MPEG (FFMPEG) software for subtitle generation, but is not limited thereto. The generated subtitles include parameters such as the subtitle effect and the subtitle display time.
In step S605, the speech information and the subtitle information are added to the video synthesized in step S602.
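One plausible way to carry out steps S604 and S605 with ffmpeg is to burn an SRT subtitle file into the frames while muxing in the synthesized narration; the file names below are hypothetical.

```python
import subprocess

def add_voice_and_subtitles(video: str, voice: str, srt: str, out: str) -> None:
    """Mux the synthesized narration and burn the subtitles into the video."""
    subprocess.run([
        "ffmpeg", "-y",
        "-i", video, "-i", voice,
        "-vf", f"subtitles={srt}",      # burn the subtitle track into the frames
        "-map", "0:v", "-map", "1:a",   # video from input 0, audio from input 1
        "-c:a", "aac",
        out,
    ], check=True)

add_voice_and_subtitles("composed.mp4", "narration.wav", "subs.srt", "final.mp4")
```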
FIG. 7 shows a schematic diagram of an apparatus 700 for processing video material according to some embodiments of the present application. The material processing application 111 may, for example, include the apparatus 700. As shown in FIG. 7, the apparatus 700 includes a material acquisition unit 701, an effect determination unit 702, and a transmission unit 703. The material acquisition unit 701 can acquire the material set of the video to be synthesized and determine the attributes of the material set. The material set includes a plurality of material elements, each of which includes at least one type of media content among pictures, text, audio, and video. The attributes include the play order and play duration of each material element in the set. In some embodiments, the material acquisition unit 701 may provide a user interface for acquiring material elements. The user interface includes at least one control, each corresponding to a media type; the media types include at least one of text, picture, audio, and video. In response to an operation on any control in the user interface, the material acquisition unit 701 can acquire the media content corresponding to the media type of that control and use it as one item of media content of a material element in the material set. In some embodiments, in response to an operation on a picture control in the user interface, the material acquisition unit 701 can acquire a picture and use it as the picture content of a material element of the set. In some embodiments, the material acquisition unit 701 can also, in response to an operation on a text input control associated with the picture control, acquire the input text information associated with the picture content and use it as the text content of the material element. In some embodiments, the material acquisition unit 701 can also, in response to an operation on an audio control associated with the picture control, acquire the input audio information associated with the picture content and use it as the audio content of the material element. In some embodiments, the material acquisition unit 701 can also, in response to an operation on a video control in the user interface, acquire a video clip and use it as the video content of a material element of the set.
In some embodiments, the material acquisition unit 701 first acquires a segment of video and then, according to a video clipping algorithm, extracts at least one video segment from it and generates description information for each video segment.
Specifically, the material acquisition unit 701 may determine at least one key image frame of the video. For each key image frame, the unit may extract from the video a video segment containing that frame; the video segment includes an audio segment. The material acquisition unit 701 may also perform speech recognition on the audio segment to obtain the corresponding text, and generate the description information of the video segment from that text.
On this basis, the material acquisition unit 701 may provide a user interface displaying the description information of each video segment, so that the user can select segments according to each segment's description. In response to a selection operation on the video segments, the material acquisition unit 701 uses each selected segment as the video content of one material element in the material set.
In some embodiments, the material acquisition unit 701 may provide a user interface presenting a thumbnail for each material element in the set, with the thumbnails arranged in sequence within the corresponding display area of the user interface. The unit may adjust the arrangement order of the elements in the material set in response to a move operation on a thumbnail in the user interface, and use the adjusted order as the play order of the material set. In some embodiments, when a material element includes picture content, the material acquisition unit 701 may use the play duration of the picture content as the play duration of that element; when a material element includes video content, the unit may use the play duration of the video content as the play duration of that element.
The effect determination unit 702 can determine the effect parameter corresponding to the material set; the effect parameter corresponds to one video effect mode. In some embodiments, the effect determination unit 702 may provide a user interface containing multiple effect options, each corresponding to one effect parameter. In response to a preview operation on any of the effect options, the unit displays the corresponding preview rendering in the user interface. In response to a selection operation on any of the effect options, the unit uses the effect parameter corresponding to the selected option as the effect parameter of the material set.
The transmission unit 703 can transmit the material set and the effect parameter to the video composition server, so that the server can synthesize the multiple material elements in the set into a video corresponding to the video effect mode according to the effect parameter and the attributes of the material set. Note that the more specific implementation of the apparatus 700 is consistent with the method 200 and is not repeated here. In summary, the apparatus 700 for processing video material according to the present application allows content to be selected in a user interface (for example, the interfaces of FIGS. 3A to 3G), so that the material set of the video to be synthesized can be acquired conveniently. In particular, the apparatus 700 can also clip video automatically to generate video segments and corresponding description information, so that the user can quickly determine the content of each segment from its description and select segments accordingly. In addition, the apparatus 700 spares the user complicated effect-related operations on the local terminal device by intuitively presenting preview renderings (for example, effect animations) of multiple video effect modes, making it easy to quickly determine the effect mode of the video to be synthesized. On this basis, the apparatus 700 can synthesize the video through the video composition server, greatly improving the user experience.
FIG. 8 shows a schematic diagram of a video composition apparatus 800 according to some embodiments of the present application. The video composition application may include the apparatus 800; the server 120 may, for example, include that application.
As shown in FIG. 8, the video composition apparatus 800 may include a communication unit 801 and a video composition unit 802. The communication unit 801 can acquire, from the material processing application 111, the material set of a video to be synthesized and the effect parameter for that set. The material set includes a plurality of material elements, each of which includes at least one type of media content among pictures, text, audio, and video. The attributes of the material set include the play order and play duration of each material element, and the effect parameter corresponds to one video effect mode.
The video composition unit 802 can synthesize the multiple material elements in the material set into a video in the corresponding video effect mode according to the effect parameter and the attributes of the set. In some embodiments, the unit 802 may normalize the material set so that each element is converted into a predetermined format covering the image encoding format, the playback frame rate, and the image size, and then synthesize the normalized set into the video according to the effect parameter. In some embodiments, based on multiple video composition scripts executable in a predetermined video composition application, the unit 802 may determine the multiple rendering stages corresponding to the effect parameter. Each video composition script corresponds to one video composition effect; each rendering stage includes at least one of the scripts, and the rendering result of each stage is the input of the next. Based on these stages, the unit 802 can render the material set to generate the video. The video effect mode may include, for example, a video transition mode between adjacent material elements. Note that the more specific implementation of the apparatus 800 is consistent with the video composition method 400 and is not repeated here. In summary, the video composition apparatus 800 according to the present application can acquire the material set from the material processing application 111, determine the multiple rendering stages corresponding to the effect parameter, and synthesize a rendering result with layered video effects by executing those stages. In particular, by rendering the material set in multiple stages, the apparatus 800 can generate a wide variety of complex video effects, greatly improving the efficiency of video composition and broadening the range of composition effects.
FIG. 9 shows a schematic diagram of a video composition apparatus 900 according to some embodiments of the present application. The video composition application may include the apparatus 900; the server 120 may, for example, include that application.
As shown in FIG. 9, the video composition apparatus 900 includes a communication unit 901 and a video composition unit 902. Here, the communication unit 901 may be implemented consistently with the communication unit 801, and the video composition unit 902 consistently with the video composition unit 802; these are not repeated here. In addition, the apparatus 900 may further include a speech synthesis unit 903, a subtitle generation unit 904, and an adding unit 905.
When a material element in the material set includes picture content and corresponding text content, the speech synthesis unit 903 can generate speech information corresponding to the text content, and the subtitle generation unit 904 can generate subtitle information corresponding to the speech information. On this basis, the adding unit 905 adds the speech information and the subtitle information to the generated video. Note that the more specific implementation of the apparatus 900 is consistent with the video composition method 600 and is not repeated here.
FIG. 10 shows a structural diagram of a computing device. As shown in FIG. 10, the computing device includes one or more processors (CPUs or GPUs) 1002, a communication module 1004, a memory 1006, a user interface 1010, and a communication bus 1008 for interconnecting these components.
The processor 1002 can receive and send data through the communication module 1004 to implement network communication and/or local communication.
The user interface 1010 includes one or more output devices 1012, which include one or more speakers and/or one or more visual displays. The user interface 1010 also includes one or more input devices 1014, such as a keyboard, a mouse, a voice command input unit or microphone, a touchscreen display, a touch-sensitive input pad, a gesture-capture camera, or other input buttons or controls.
The memory 1006 may be high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state storage devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disc storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 1006 stores a set of instructions executable by the processor 1002, including:
an operating system 1016, including programs for handling various basic system services and for performing hardware-related tasks; and
applications 1018, including various programs for implementing the above methods; these programs can implement the processing flows of the examples above. When the computing device of FIG. 10 is implemented as the terminal device 110, the applications 1018 may include a video material processing application according to the present application, which may include the apparatus 700 for processing video material shown in FIG. 7. When the computing device of FIG. 10 is implemented as the server 120, the applications 1018 may include a video composition application, which may include, for example, the video composition apparatus 800 of FIG. 8 or the video composition apparatus 900 of FIG. 9.
In addition, each example of the present application can be implemented by a data processing program executed by a data processing device such as a computer; such a data processing program evidently constitutes the present application. Further, a data processing program usually stored on a storage medium is executed by reading the program directly from the storage medium, or by installing or copying the program to a storage device (such as a hard disk or memory) of the data processing device; such a storage medium therefore also constitutes the present invention. The storage medium may use any type of recording method, for example a paper storage medium (such as paper tape), a magnetic storage medium (such as a floppy disk, hard disk, or flash memory), an optical storage medium (such as a CD-ROM), or a magneto-optical storage medium (such as an MO disc).
The present application therefore also discloses a non-volatile storage medium storing a data processing program for performing any of the examples of the above methods of the present application.
In addition, the method steps described in this application may be implemented not only by a data processing program but also by hardware, for example by logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, and the like. Hardware that can implement the methods described herein may therefore also constitute the present application.
The above descriptions are merely illustrative examples of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (16)

  1. A method for processing video material, performed by a terminal device, the method comprising:
    acquiring a material set of a video to be synthesized and determining attributes of the material set, wherein the material set comprises a plurality of material elements, each material element comprises at least one type of media content among pictures, text, audio, and video, and the attributes comprise a play order and a play duration of each material element in the material set;
    determining an effect parameter corresponding to the material set, the effect parameter corresponding to a video effect mode; and
    transmitting the material set and the effect parameter to a video composition server, so that the video composition server synthesizes the plurality of material elements in the material set into a video corresponding to the video effect mode according to the effect parameter and the attributes of the material set.
  2. The method of claim 1, wherein acquiring the material set of the video to be synthesized comprises:
    providing a user interface for acquiring material elements, the user interface comprising at least one control each corresponding to a media type, the media types comprising at least one of text, picture, audio, and video; and
    in response to an operation on any control in the user interface, acquiring media content corresponding to the media type of that control and using it as one item of media content of a material element in the material set.
  3. The method of claim 2, wherein, in response to the operation on any control in the user interface, acquiring the media content corresponding to the media type of that control and using it as one item of media content of a material element in the material set comprises:
    in response to an operation on a picture control in the user interface, acquiring a picture and using it as the picture content of a material element of the material set.
  4. The method of claim 2, wherein, in response to the operation on any control in the user interface, acquiring the media content of the media type corresponding to that control and using it as one item of media content of a material element in the material set comprises:
    in response to an operation on a video control in the user interface, acquiring a video clip and using it as the video content of a material element of the material set.
  5. The method of claim 1, wherein acquiring the material set of the video to be synthesized comprises:
    acquiring a segment of video;
    extracting, according to a video clipping algorithm, at least one video segment from the video and generating description information for each video segment;
    providing a user interface displaying the description information of each video segment, so that a user performs segment selection according to the description information of each video segment; and
    in response to a selection operation on the at least one video segment, using each selected video segment as the video content of one material element in the material set.
  6. The method of claim 5, wherein extracting, according to the video clipping algorithm, at least one video segment from the video and generating the description information of each video segment comprises:
    determining at least one key image frame of the video;
    for each key image frame, extracting from the video a video segment containing the key image frame, the video segment comprising an audio segment; and
    performing speech recognition on the audio segment to obtain corresponding text, and generating the description information corresponding to the video segment according to the text.
  7. The method of claim 1, wherein determining the attributes of the material set comprises:
    providing a user interface presenting a thumbnail corresponding to each material element in the material set, the thumbnails being arranged in sequence within a corresponding display area of the user interface; and
    adjusting an arrangement order of the elements in the material set in response to a move operation on a thumbnail in the user interface, and using the adjusted arrangement order as the play order of the material set.
  8. The method of claim 1, wherein determining the effect parameter corresponding to the material set, the effect parameter corresponding to a video effect mode, comprises:
    providing a user interface containing a plurality of effect options, wherein each effect option corresponds to one effect parameter;
    in response to a preview operation on any of the plurality of effect options, displaying a corresponding preview rendering in the user interface; and
    in response to a selection operation on any of the plurality of effect options, using the effect parameter corresponding to the selected effect option as the effect parameter corresponding to the material set.
  9. The method of claim 1, wherein transmitting the material set and the effect parameter to the video composition server, so that the video composition server synthesizes the plurality of material elements in the material set into the video corresponding to the video effect mode according to the effect parameter and the attributes of the material set, comprises: sending a video composition request to the video composition server, the video composition request comprising the material set and the effect parameter, so that the video composition server, in response to the video composition request, synthesizes the plurality of material elements in the material set into the video corresponding to the video effect mode.
  10. A video composition method, performed by a server, the method comprising:
    acquiring, from a material processing application, a material set of a video to be synthesized and an effect parameter for the material set, wherein the material set comprises a plurality of material elements, each material element comprises at least one type of media content among pictures, text, audio, and video, the attributes of the material set comprise a play order and a play duration of each material element in the material set, and the effect parameter corresponds to a video effect mode; and
    synthesizing the plurality of material elements in the material set into a video in the video effect mode according to the effect parameter and the attributes of the material set.
  11. The method of claim 10, wherein, when a material element in the material set comprises picture content and corresponding text content, the method further comprises:
    generating speech information corresponding to the text content;
    generating subtitle information corresponding to the speech information; and
    adding the speech information and the subtitle information to the video.
  12. The method of claim 10, wherein synthesizing, according to the effect parameter and the attributes of the material set, the plurality of material elements in the material set into the video in the video effect mode comprises:
    determining, based on a plurality of video composition scripts to be executed in a predetermined video composition application, a plurality of rendering stages corresponding to the effect parameter, wherein each video composition script corresponds to one video composition effect, each rendering stage comprises at least one of the plurality of video composition scripts, and the rendering result of each rendering stage serves as the input content of the next rendering stage; and
    rendering the material set based on the plurality of rendering stages to generate the video.
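The staged rendering of claim 12 can be sketched as follows: the effect parameter selects an ordered list of rendering stages, each stage groups one or more composition scripts (one composition effect per script), and each stage's rendering result becomes the input content of the next stage. The stage table and the toy scripts below are invented for the example:

```python
# Sketch of staged rendering: stages are selected by the effect parameter,
# each stage holds >= 1 composition script, and each stage's result feeds
# the next stage. The STAGES table and lambda "scripts" are assumptions.
from typing import Callable, Dict, List

Script = Callable[[dict], dict]  # one video composition effect per script

STAGES: Dict[str, List[List[Script]]] = {
    "slideshow": [
        [lambda c: {**c, "scaled": True}],        # stage 1: scale the frames
        [lambda c: {**c, "transition": "fade"},   # stage 2: transition effect
         lambda c: {**c, "music": "bgm.mp3"}],    #          plus background music
    ],
}

def render(material_set: dict, effect_parameter: str) -> dict:
    content = material_set
    for stage in STAGES[effect_parameter]:
        for script in stage:          # run every script of the current stage
            content = script(content)
        # content now holds this stage's result, i.e. the next stage's input
    return content
```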
  13. The method of claim 10, wherein synthesizing, according to the effect parameter and the attributes of the material set, the plurality of material elements in the material set into the video in the video effect mode comprises:
    normalizing the material set so that each material element is converted into a predetermined format, the predetermined format comprising an image encoding format, an image playback frame rate, and an image size; and
    synthesizing the normalized material set into the video according to the effect parameter.
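A sketch of the normalization step of claim 13 using the ffmpeg command line follows; the concrete choices of H.264 encoding, 30 fps, and a 1280x720 frame are example values for the predetermined format, not values fixed by the claim:

```python
# Sketch: convert one material element to a predetermined format before
# composition. The specific codec, frame rate and size are example choices.
import subprocess

def normalize(input_path: str, output_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", input_path,
         "-c:v", "libx264",          # predetermined image encoding format
         "-r", "30",                 # predetermined image playback frame rate
         "-vf", "scale=1280:720",    # predetermined image size
         output_path],
        check=True,
    )
```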
  14. A terminal device, comprising a processor and a memory, the memory storing computer-readable instructions that cause the processor to perform the method of any one of claims 1-9.
  15. A server, comprising a processor and a memory, the memory storing computer-readable instructions that cause the processor to perform the method of any one of claims 10-13.
  16. A non-volatile storage medium storing a data processing program which, when executed by a computing device, causes the computing device to perform the method of any one of claims 1-13.
PCT/CN2018/114100 2017-11-06 2018-11-06 Video material processing method, video synthesis method, terminal device and storage medium WO2019086037A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711076478.2A CN107770626B (en) 2017-11-06 2017-11-06 Video material processing method, video synthesizing device and storage medium
CN201711076478.2 2017-11-06

Publications (1)

Publication Number Publication Date
WO2019086037A1 (en) 2019-05-09

Family

ID=61273334

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/114100 WO2019086037A1 (en) 2017-11-06 2018-11-06 Video material processing method, video synthesis method, terminal device and storage medium

Country Status (2)

Country Link
CN (1) CN107770626B (en)
WO (1) WO2019086037A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112532896A (en) * 2020-10-28 2021-03-19 北京达佳互联信息技术有限公司 Video production method, video production device, electronic device and storage medium
US20220417591A1 (en) * 2020-03-24 2022-12-29 Beijing Dajia Internet Information Technology Co., Ltd. Video rendering method and apparatus, electronic device, and storage medium

Families Citing this family (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107770626B (en) * 2017-11-06 2020-03-17 腾讯科技(深圳)有限公司 Video material processing method, video synthesizing device and storage medium
CN108540854A (en) * 2018-03-29 2018-09-14 努比亚技术有限公司 Live video clipping method, terminal and computer readable storage medium
CN108536790A (en) * 2018-03-30 2018-09-14 北京市商汤科技开发有限公司 The generation of sound special efficacy program file packet and sound special efficacy generation method and device
CN108495171A (en) * 2018-04-03 2018-09-04 优视科技有限公司 Method for processing video frequency and its device, storage medium, electronic product
CN114125512B (en) * 2018-04-10 2023-01-31 腾讯科技(深圳)有限公司 Promotion content pushing method and device and storage medium
CN108924584A (en) * 2018-05-30 2018-11-30 互影科技(北京)有限公司 The packaging method and device of interactive video
CN108900927A (en) * 2018-06-06 2018-11-27 芽宝贝(珠海)企业管理有限公司 The generation method and device of video
CN108986227B (en) * 2018-06-28 2022-11-29 北京市商汤科技开发有限公司 Particle special effect program file package generation method and device and particle special effect generation method and device
CN108900897B (en) * 2018-07-09 2021-10-15 腾讯科技(深圳)有限公司 Multimedia data processing method and device and related equipment
CN109168027B (en) * 2018-10-25 2020-12-11 北京字节跳动网络技术有限公司 Instant video display method and device, terminal equipment and storage medium
CN109658483A (en) * 2018-11-20 2019-04-19 北京弯月亮科技有限公司 The generation system and generation method of Video processing software data file
CN109379643B (en) * 2018-11-21 2020-06-09 北京达佳互联信息技术有限公司 Video synthesis method, device, terminal and storage medium
WO2020107297A1 (en) * 2018-11-28 2020-06-04 深圳市大疆创新科技有限公司 Video clipping control method, terminal device, system
KR20230173220A (en) 2019-01-18 2023-12-26 Snap Inc. Systems and methods for generating personalized videos with customized text messages
CN109819179B (en) * 2019-03-21 2022-02-01 腾讯科技(深圳)有限公司 Video editing method and device
CN110336960B (en) * 2019-07-17 2021-12-10 广州酷狗计算机科技有限公司 Video synthesis method, device, terminal and storage medium
CN110445992A (en) * 2019-08-16 2019-11-12 深圳特蓝图科技有限公司 A kind of video clipping synthetic method based on XML
CN112822541B (en) 2019-11-18 2022-05-20 北京字节跳动网络技术有限公司 Video generation method and device, electronic equipment and computer readable medium
CN111010591B (en) * 2019-12-05 2021-09-17 北京中网易企秀科技有限公司 Video editing method, browser and server
CN111883099B (en) * 2020-04-14 2021-10-15 北京沃东天骏信息技术有限公司 Audio processing method, device, system, browser module and readable storage medium
CN111479158B (en) * 2020-04-16 2022-06-10 北京达佳互联信息技术有限公司 Video display method and device, electronic equipment and storage medium
CN111416991B (en) * 2020-04-28 2022-08-05 Oppo(重庆)智能科技有限公司 Special effect processing method and apparatus, and storage medium
CN111614912B (en) * 2020-05-26 2023-10-03 北京达佳互联信息技术有限公司 Video generation method, device, equipment and storage medium
CN111710021A (en) * 2020-05-26 2020-09-25 珠海九松科技有限公司 Method and system for generating dynamic video based on static medical materials
CN111787395B (en) * 2020-05-27 2023-04-18 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN111831615B (en) * 2020-05-28 2024-03-12 北京达佳互联信息技术有限公司 Method, device and system for generating video file
CN111683280B (en) * 2020-06-04 2024-06-21 腾讯科技(深圳)有限公司 Video processing method and device and electronic equipment
CN111767414A (en) * 2020-06-12 2020-10-13 上海传英信息技术有限公司 Dynamic image generation method and device
CN113838490B (en) * 2020-06-24 2022-11-11 华为技术有限公司 Video synthesis method and device, electronic equipment and storage medium
CN111951357A (en) * 2020-08-11 2020-11-17 深圳市前海手绘科技文化有限公司 Application method of sound material in hand-drawn animation
CN112040271A (en) * 2020-09-04 2020-12-04 杭州七依久科技有限公司 Cloud intelligent editing system and method for visual programming
CN114390354B (en) * 2020-10-21 2024-05-10 西安诺瓦星云科技股份有限公司 Program production method, device and system and computer readable storage medium
CN112287168A (en) * 2020-10-30 2021-01-29 北京有竹居网络技术有限公司 Method and apparatus for generating video
CN112632326B (en) * 2020-12-24 2022-02-18 北京风平科技有限公司 Video production method and device based on video script semantic recognition
CN112969092B (en) * 2021-01-29 2022-05-10 稿定(厦门)科技有限公司 Video file playing system
CN113055730B (en) * 2021-02-07 2023-08-18 深圳市欢太科技有限公司 Video generation method, device, electronic equipment and storage medium
CN115209215A (en) * 2021-04-09 2022-10-18 北京字跳网络技术有限公司 Video processing method, device and equipment
CN113810538B (en) * 2021-09-24 2023-03-17 维沃移动通信有限公司 Video editing method and video editing device
CN113992940B (en) * 2021-12-27 2022-03-29 北京美摄网络科技有限公司 Web end character video editing method, system, electronic equipment and storage medium
CN113986087B (en) * 2021-12-27 2022-04-12 深圳市大头兄弟科技有限公司 Video rendering method based on subscription
CN114286164B (en) * 2021-12-28 2024-02-09 北京思明启创科技有限公司 Video synthesis method and device, electronic equipment and storage medium
CN114401377A (en) * 2021-12-30 2022-04-26 杭州摸象大数据科技有限公司 Financial marketing video generation method and device, computer equipment and storage medium
CN114466222B (en) * 2022-01-29 2023-09-26 北京百度网讯科技有限公司 Video synthesis method and device, electronic equipment and storage medium
CN114979054B (en) * 2022-05-13 2024-06-18 维沃移动通信有限公司 Video generation method, device, electronic equipment and readable storage medium
CN115129212A (en) * 2022-05-30 2022-09-30 腾讯科技(深圳)有限公司 Video editing method, video editing device, computer equipment, storage medium and product
CN116634058B (en) * 2022-05-30 2023-12-22 荣耀终端有限公司 Editing method of media resources, electronic equipment and readable storage medium
CN115134659B (en) * 2022-06-15 2024-06-25 阿里巴巴云计算(北京)有限公司 Video editing and configuring method, device, browser, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060233514A1 (en) * 2005-04-14 2006-10-19 Shih-Hsiung Weng System and method of video editing
CN101086886A (en) * 2006-06-07 2007-12-12 索尼株式会社 Recording system and recording method
US20080247726A1 (en) * 2007-04-04 2008-10-09 Nhn Corporation Video editor and method of editing videos
CN103928039A (en) * 2014-04-15 2014-07-16 北京奇艺世纪科技有限公司 Video compositing method and device
CN104780439A (en) * 2014-01-15 2015-07-15 腾讯科技(深圳)有限公司 Video processing method and device
CN105657538A (en) * 2015-12-31 2016-06-08 北京东方云图科技有限公司 Method and device for synthesizing video file by mobile terminal
CN105679347A (en) * 2016-01-07 2016-06-15 北京东方云图科技有限公司 Method and apparatus for making video file through programming process
CN107193841A (en) * 2016-03-15 2017-09-22 北京三星通信技术研究有限公司 Media file accelerates the method and apparatus played, transmit and stored
CN107770626A (en) * 2017-11-06 2018-03-06 腾讯科技(深圳)有限公司 Processing method, image synthesizing method, device and the storage medium of video material

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107085612A (en) * 2017-05-15 2017-08-22 腾讯科技(深圳)有限公司 media content display method, device and storage medium

Also Published As

Publication number Publication date
CN107770626B (en) 2020-03-17
CN107770626A (en) 2018-03-06

Similar Documents

Publication Publication Date Title
WO2019086037A1 (en) Video material processing method, video synthesis method, terminal device and storage medium
US11943486B2 (en) Live video broadcast method, live broadcast device and storage medium
WO2022048478A1 (en) Multimedia data processing method, multimedia data generation method, and related device
WO2020077856A1 (en) Video photographing method and apparatus, electronic device and computer readable storage medium
WO2020029526A1 (en) Method for adding special effect to video, device, terminal apparatus, and storage medium
WO2020077855A1 (en) Video photographing method and apparatus, electronic device and computer readable storage medium
EP3195601B1 (en) Method of providing visual sound image and electronic device implementing the same
US11670339B2 (en) Video acquisition method and device, terminal and medium
JP2022552344A (en) MOVIE FILE GENERATION METHOD, DEVICE, TERMINAL AND STORAGE MEDIUM
WO2019047878A1 (en) Method for controlling terminal by voice, terminal, server and storage medium
CN112804459A (en) Image display method and device based on virtual camera, storage medium and electronic equipment
WO2023104102A1 (en) Live broadcasting comment presentation method and apparatus, and device, program product and medium
JP2004288197A (en) Interface for presenting data expression in screen area inset
WO2010102525A1 (en) Method for generating gif, and system and media player thereof
CN111629253A (en) Video processing method and device, computer readable storage medium and electronic equipment
EP3024223B1 (en) Videoconference terminal, secondary-stream data accessing method, and computer storage medium
WO2022000983A1 (en) Video processing method and apparatus, and electronic device and storage medium
CA3001480C (en) Video-production system with dve feature
WO2019227429A1 (en) Method, device, apparatus, terminal, server for generating multimedia content
JP2005051703A (en) Live streaming broadcasting method, live streaming broadcasting apparatus, live streaming broadcasting system, program, recording medium, broadcasting method, and broadcasting apparatus
US10698744B2 (en) Enabling third parties to add effects to an application
JP2022145503A (en) Live distribution information processing method, apparatus, electronic device, storage medium, and program
US9569546B2 (en) Sharing of documents with semantic adaptation across mobile devices
CN116126177A (en) Data interaction control method and device, electronic equipment and storage medium
WO2023182937A2 (en) Special effect video determination method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18873412

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18873412

Country of ref document: EP

Kind code of ref document: A1