CN116112760A - Video synthesis method, device, electronic equipment and storage medium - Google Patents

Video synthesis method, device, electronic equipment and storage medium Download PDF

Info

Publication number
CN116112760A
Authority
CN
China
Prior art keywords
data
video
format
replaced
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211733268.7A
Other languages
Chinese (zh)
Inventor
罗超
李志勇
袁也
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202211733268.7A priority Critical patent/CN116112760A/en
Publication of CN116112760A publication Critical patent/CN116112760A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N 21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/234309 Reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440218 Reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/816 Monomedia components thereof involving special video data, e.g. 3D video

Abstract

The embodiments of the present application provide a video synthesis method and apparatus, an electronic device, and a storage medium. The video synthesis method comprises the following steps: acquiring a preset video template and converting the video template into first data in a preset format; acquiring data to be replaced and packaging the data to be replaced into second data in the preset format; combining the first data and the second data to obtain composite data in the preset format; and generating a composite video based on the composite data. In the embodiments of the present application, the user only provides the data to be replaced; the video synthesis software converts the video template and the data to be replaced into the same format, so that the converted first data and second data can be combined into composite data, and the composite data is then processed to obtain the composite video. The user therefore only needs to perform simple operations while the video synthesis software performs video synthesis automatically, which reduces the difficulty of video synthesis, simplifies the video synthesis process, improves video synthesis efficiency, and improves the user experience.

Description

Video synthesis method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video synthesis method, a video synthesis device, an electronic device, and a storage medium.
Background
Video synthesis technology refers to mixing multiple source materials into a single composite picture. Early film and television compositing was implemented mainly during film and tape shooting and film finishing, and the process was quite backward. Compositing methods and techniques such as keying (matting) and superimposition were widely used in early film and television production. Compared with traditional compositing technology, digital compositing technology uses the advanced principles and methods of computer imaging to collect various source materials into a computer and let the computer mix them into a single composite image.
In the prior art, video synthesis is generally performed offline: for example, a user collates materials and submits them to a video compositing engineer, who completes the video synthesis offline using video processing software and then provides the synthesized video to the user. In this manner, however, video synthesis is produced offline, which takes a long time, involves a complicated workflow, is inefficient, and requires a professional video compositing engineer to operate.
Disclosure of Invention
In view of the above problems, embodiments of the present application provide a video synthesis method, apparatus, electronic device, and storage medium, to solve the problems that the offline video synthesis method takes a long time, has a complicated workflow and low efficiency, and requires a professional video compositing engineer to operate.
According to an aspect of embodiments of the present application, there is provided a video compositing method, the method comprising:
acquiring a preset video template, and converting the video template into first data in a preset format;
acquiring data to be replaced, and packaging the data to be replaced into second data in the preset format;
combining the first data and the second data to obtain composite data in the preset format;
generating a composite video based on the composite data.
Optionally, the converting the video template into the first data in the preset format includes: calling a first preset code through a first low-level computer instruction to serialize the video template and convert it into first data in JSON format.
Optionally, the packaging of the data to be replaced into the second data in the preset format includes: calling a second preset code through a second low-level computer instruction to serialize the data to be replaced and convert it into second data in JSON format.
Optionally, the merging of the first data and the second data to obtain the composite data in the preset format includes: merging the first data and the second data to obtain the composite data in the preset format; storing the composite data locally, and replacing a network path of the composite data with a local path; and adding a unique identifier to the static resources and text layers of the first data, and storing them locally.
Optionally, the generating of the composite video based on the composite data includes: calling a rendering command of an executable file of video composition software through a third low-level computer instruction, and rendering the composite data into video data in a first format; and calling a transcoding command of a video processing tool through a fourth low-level computer instruction, and transcoding the video data in the first format into video data in a second format to obtain the composite video.
According to another aspect of embodiments of the present application, there is provided a video compositing apparatus, the apparatus comprising:
the conversion module is used for acquiring a preset video template and converting the video template into first data in a preset format;
the packaging module is used for acquiring data to be replaced and packaging the data to be replaced into second data in the preset format;
the merging module is used for merging the first data and the second data to obtain composite data in the preset format;
and the generation module is used for generating a composite video based on the composite data.
Optionally, the conversion module is specifically configured to call a first preset code through a first low-level computer instruction to serialize the video template and convert it into first data in JSON format.
Optionally, the packaging module is specifically configured to call a second preset code through a second low-level computer instruction to serialize the data to be replaced and convert it into second data in JSON format.
Optionally, the merging module includes: a data merging unit, configured to merge the first data and the second data to obtain the composite data in the preset format; a path replacing unit, configured to store the composite data locally and replace the network path in the composite data with a local path; and an identifier adding unit, configured to add a unique identifier to the static resources and text layers of the first data and store them locally.
Optionally, the generating module includes: a data rendering unit, configured to call a rendering command of an executable file of the video composition software through a third low-level computer instruction and render the composite data into video data in a first format; and a video transcoding unit, configured to call a transcoding command of a video processing tool through a fourth low-level computer instruction and transcode the video data in the first format into video data in a second format to obtain the composite video.
According to another aspect of embodiments of the present application, there is provided an electronic device including: one or more processors; and one or more computer-readable storage media having instructions stored thereon; the instructions, when executed by the one or more processors, cause the processor to perform the video compositing method of any of the above.
According to another aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the video compositing method according to any of the above.
In the embodiments of the present application, a preset video template is acquired and converted into first data in a preset format; data to be replaced is acquired and packaged into second data in the preset format; the first data and the second data are combined to obtain composite data in the preset format; and a composite video is generated based on the composite data. Therefore, in the embodiments of the present application, when video synthesis is needed, the user provides the data to be replaced, and the video synthesis software converts the video template and the data to be replaced into the same format, so that the converted first data and second data can be combined into composite data, which is then processed to obtain the composite video.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some drawings of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of steps of a video compositing method according to an embodiment of the application.
Fig. 2 is a flow chart of a video synthesizing method according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a padding format according to an embodiment of the present application.
Fig. 4 is a block diagram of a video synthesizing apparatus according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some of the embodiments of the present application, not all the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Embodiments herein may be applied to any suitable video composition software. Illustratively, the video composition software may include, but is not limited to: AE (Adobe After Effects) software, Adobe Premiere software, Maya software, Nuke software, EDIUS software (non-linear editing software), and the like.
Taking AE software as an example, AE is an indispensable auxiliary tool for producing motion graphics and is professional non-linear editing software for video post-production compositing. AE is widely used, covering film, advertising, multimedia, web pages, games, and more.
AE provides a frame-based approach to video design. It is a program capable of high-quality sub-pixel positioning, through which highly smooth motion can be achieved. AE provides many valuable functions to multimedia producers, including excellent blue-screen keying, special-effect creation, Cinepak compression, and more.
AE supports an unlimited number of layers and can directly import files from Illustrator and Photoshop. Adobe Illustrator is industry-standard vector illustration software for publishing, multimedia and online images; it is widely used in print publishing, newspaper and book typesetting, professional illustration, multimedia image processing, web page production and the like, provides high precision and control over line art, and is suitable for producing any small or large complex project. Photoshop (PS for short) is image processing software developed and released by Adobe Systems; it mainly processes digital images composed of pixels, and its many editing and drawing tools allow image editing work to be carried out effectively. AE also has a variety of plug-ins, including MetaTools Final Effects, which provides virtual moving images and various types of particle systems with which unique fantasy effects can be created.
AE is professional effects-compositing software for high-end video effects systems. Drawing on the successes of many excellent software packages, AE raises video effects compositing to a new level: the introduction of Photoshop-style layers allows AE to control multi-layer composite images and produce seamless compositing effects; the introduction of keyframes and paths makes control of advanced two-dimensional animation effortless; an efficient video processing system ensures the output of high-quality video; and a dazzling effects system lets AE realize users' creative ideas.
AE also retains Adobe's excellent software compatibility. Layer files from Photoshop and Illustrator can be imported very conveniently; Premiere project files can also be reused in AE; even Premiere EDL (Edit Decision List) files can be imported. AE also allows flexible mixing of two and three dimensions within one composition. The user may work in two or three dimensions, or mix and match them on a per-layer basis. A layer can be converted to three-dimensional at any time with a single switch; both two-dimensional and three-dimensional layers can be moved horizontally or vertically; and three-dimensional layers can be animated in three-dimensional space while maintaining interaction with lights, shadows and cameras. AE supports most audio, video and graphics formats, and can even import files that record three-dimensional channels.
Taking Adobe Premiere software as an example, Adobe Premiere is video editing software with good output picture quality and good compatibility, and it can cooperate with other software released by Adobe. Adobe Premiere software can be widely applied to film and television editing, such as advertisement production and television program production.
Adobe Premiere software is non-linear video editing software used for combining and splicing video segments; it provides certain special effects and color-grading functions, and professional color grading can be achieved through third-party plug-ins. It improves the user's creative capability and creative freedom, and is easy-to-learn, efficient and accurate video editing software. Adobe Premiere provides a complete workflow of capture, editing, color grading, audio sweetening, adding and outputting subtitles, and DVD burning, and integrates efficiently with other Adobe software, thereby meeting the requirements for creating high-quality works.
Taking Maya software as an example, Maya is three-dimensional modeling and animation software. Maya can greatly improve the workflow efficiency of development, design and creation in the fields of film, television, games, etc. It improves polygon modeling, with performance improved through new algorithms; multi-thread support can make full use of the advantages of multi-core processors; new HLSL (High Level Shader Language) shading tools and hardware shading APIs (Application Program Interfaces) can greatly enhance the look of new-generation console games; and, in addition, character rigging and animation are more flexible.
The viewport window is greatly improved in Maya, and its strong point is direct support for displaying motion blur. Depth-of-field and ambient occlusion effects can also be displayed directly in the window. The new Motion Trails editing function eliminates the need to open the animation path in the graph editor.
Taking Nuke software as an example, Nuke is post-production compositing software, known as visual effects software. Nuke does not require a specialized hardware platform; it provides a flexible, efficient, economical and fully functional toolset for combining and manipulating scanned photographs, video plates and computer-generated images. Nuke has advanced capabilities to seamlessly integrate the final visual effects with the rest of a film or television production, whatever the style or complexity of the required visual effects.
Nuke is mainly used as film and television post-production compositing software. Unlike AE, which is typical film and television effects, animation and video post-compositing software, Nuke is more specialized, film-grade compositing software. Its image processing is more flexible and more efficient, making it a versatile tool. Nuke is node-based digital compositing software and operates on nodes, whereas AE operates on layers. Layers belong to a layered hierarchy, with one layer controlling one part of the operations. Nodes are independent units that establish connections and transmit information; each node has certain attributes, and different nodes manage different effects and are connected from top to bottom to present the final effect.
Taking EDIUS software as an example, EDIUS non-linear editing software is designed specifically for broadcast and post-production environments, especially for news reporters and tapeless video production and storage. EDIUS has a complete file-based workflow and provides real-time, multi-track, multi-format mixed editing, compositing, chroma keying, subtitling and timeline output functions. In addition to the standard EDIUS series formats, it supports video materials such as DVCPRO (digital component standard format), the P2 format, MXF (Material eXchange Format), the XDCAM format, the XDCAM EX format, and the like, while also supporting DV (Digital Video), HDV (High Definition Video), video recorders, etc.
In the prior art, video synthesis is generally performed offline: for example, a user collates materials and submits them to a video compositing engineer, who completes the video synthesis offline using video processing software and then provides the synthesized video to the user. In this manner, however, video synthesis is produced offline, which takes a long time, involves a complicated workflow, is inefficient, and requires a professional video compositing engineer to operate.
To solve the above problems, the embodiments of the present application provide a video synthesis method supporting multi-language algorithms, which drives the video composition software through a computer programming language. A video producer can quickly synthesize the desired composite video online simply by dragging pictures in the system, editing advertisement text, configuring background music, and so on. An ordinary user can, through simple steps such as dragging online, complete work that previously could only be completed offline by a professional video compositing engineer, which improves video synthesis efficiency, reduces the difficulty of video synthesis, and improves the user experience.
Next, the video compositing method of the present application will be described by the following examples.
Referring to fig. 1, a flowchart of steps of a video compositing method according to an embodiment of the application is shown.
As shown in fig. 1, the video composition method may include the steps of:
step 101, obtaining a video template, and converting the video template into first data in a preset format.
In this embodiment of the present application, the video template refers to a video template provided by the video composition software. The video template may be made offline by a video compositing engineer, and there may be one or more video templates. In the embodiment of the application, the video synthesis software replaces data in the video template with the data to be replaced, so as to obtain the composite video.
The video template may include a data file, static resource files, text layers, and so on. The data file may include information such as the components and configuration in the video template; the static resource files may include material resources such as pictures, audio and video in the video template; and the text layers may include information such as the text in the video template.
In the embodiment of the application, the video templates provided by the video synthesis software may be video templates with a specific extension. All video templates may be integrated into a library, and a video template with the specific extension can be called from the library through a low-level computer instruction. For example, in the embodiment of the present application, one or more video templates selected by the user may be obtained based on an identifier of the video template to be invoked included in the user operation information; one or more default video templates may also be obtained; and video templates may also be acquired in other forms.
In this embodiment of the present application, after a video template is acquired, the video template may be converted into first data in a preset format, so as to be combined with data to be replaced provided by a user.
In an alternative embodiment, the process of converting the video template into the first data in the preset format may include: calling a first preset code through a first low-level computer instruction to serialize the video template and convert it into first data in JSON (JavaScript Object Notation) format.
The first low-level computer instruction refers to the instruction used when the first preset code is called, and can be selected according to the actual situation. The first preset code refers to code having the function of converting the format of data; for example, the first preset code may be Document code or the like. The first preset code is called through the first low-level computer instruction to serialize the video template and convert it into first data in JSON format.
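As a minimal, non-limiting sketch of this serialization step, an aepx template (an XML file) could be turned into a JSON string roughly as follows in Java. The use of dom4j and fastjson and the class and method names (AepxSerializer, toJson) are assumptions for illustration, not part of the claimed method.

import java.io.File;
import java.util.Iterator;
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;
import com.alibaba.fastjson.JSONObject;

public class AepxSerializer {
    // Hypothetical helper: parse an aepx template (XML) and serialize it into a JSON string,
    // i.e. the "first data" in the preset format.
    public static String toJson(File aepxFile) throws Exception {
        Document document = new SAXReader().read(aepxFile);
        return elementToJson(document.getRootElement()).toJSONString();
    }

    // Recursively map an XML element (name, attributes, child elements, text) onto a JSON object.
    // For simplicity, repeated child names overwrite each other in this sketch.
    private static JSONObject elementToJson(Element element) {
        JSONObject node = new JSONObject();
        node.put("name", element.getName());
        for (int i = 0; i < element.attributeCount(); i++) {
            Attribute attribute = element.attribute(i);
            node.put("@" + attribute.getName(), attribute.getValue());
        }
        for (Iterator<?> it = element.elementIterator(); it.hasNext(); ) {
            Element child = (Element) it.next();
            node.put(child.getName(), elementToJson(child));
        }
        if (!element.getTextTrim().isEmpty()) {
            node.put("text", element.getTextTrim());
        }
        return node;
    }
}

The string returned by such a helper would then serve as the first data described above.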
JSON is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. JSON uses a language-independent text format, but also follows conventions familiar from the C family of languages (including C, C++, C#, Java, JavaScript, Perl, Python, etc.). These characteristics make JSON an ideal data-interchange language.
JSON is built in two structures:
a set of "name/value" pairs. In a different language, it is understood as an object (object), a record (structure), a dictionary (dictionary), a hash table (hash table), a keyed list (key list), or an associative array (associative array).
An ordered list of values. In most languages, it is understood as an array (array).
JSON has the following forms:
An object is an unordered set of "name/value" pairs. An object begins with a left brace "{" and ends with a right brace "}". Each "name" is followed by a colon ":", and "name/value" pairs are separated by commas ",".
An array is an ordered collection of values. An array begins with a left bracket "[" and ends with a right bracket "]". Values are separated by commas ",".
A value can be a string (in double quotation marks), a number, true, false, null, an object, or an array. These structures can be nested.
A string is a sequence of any number of Unicode characters wrapped in double quotation marks, using backslash escapes. A character is represented as a string of length one.
In this embodiment of the present application, the video template may be specifically converted into a string in JSON format, which is used as the first data.
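For illustration only, the first data obtained from a template might look like the following JSON string; the field names (template, layers, id, type, src, text) are hypothetical and are not prescribed by this application:

{
  "template": "demo.aepx",
  "layers": [
    { "id": "img_1", "type": "image", "src": "https://example.com/placeholder.png" },
    { "id": "txt_1", "type": "text", "text": "placeholder advertisement text" },
    { "id": "bgm_1", "type": "audio", "src": "https://example.com/placeholder.mp3" }
  ]
}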
In an embodiment of the present application, specifically, target data corresponding to the data to be replaced may be obtained from the video template, and the target data in the video template is then converted into the preset format. By way of example, the target data may be static resources (pictures, audio, video, etc.), text layers, etc. in the video template. For example, if the data to be replaced contains a picture, the target data contains a picture in the video template; if the data to be replaced contains audio, the target data contains audio in the video template; if the data to be replaced contains video, the target data contains video in the video template; if the data to be replaced contains text, the target data contains a text layer in the video template; and so on.
Therefore, in step 101, specifically, the target data corresponding to the data to be replaced is obtained from the video template, and a first preset code is called through a first low-level computer instruction to serialize the target data in the video template and convert it into first data in JSON format.
Step 102, obtaining data to be replaced, and packaging the data to be replaced into second data in the preset format.
In this embodiment of the present application, the data to be replaced may refer to data provided by the user that the user wants to add to the video template for video synthesis.
Illustratively, the data to be replaced may include, but is not limited to, at least one of: pictures, audio, video, text, etc.
In this embodiment of the present application, after obtaining the data to be replaced, the data to be replaced may be converted into second data in a preset format, so as to be combined with the video template later.
In an alternative embodiment, the process of packaging the data to be replaced into the second data in the preset format may include: calling a second preset code through a second low-level computer instruction to serialize the data to be replaced and convert it into second data in JSON format.
The second low-level computer instruction refers to the instruction used when the second preset code is called, and can be selected according to the actual situation. The second preset code refers to code having the function of converting the format of data, and may be the same as the first preset code; for example, the second preset code may also be Document code. The second preset code is called through the second low-level computer instruction to serialize the data to be replaced and convert it into second data in JSON format.
In this embodiment of the present application, the data to be replaced may be specifically converted into a string in JSON format, which is used as the second data.
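As a minimal, illustrative Java sketch (assuming the fastjson library and the hypothetical field names used in the JSON example above), the second data could be packaged roughly as follows; the class and method names are assumptions, not part of the claimed method:

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class ReplacementPacker {
    // Hypothetical helper: wrap the user's replacement material into the "second data" JSON string.
    public static String pack(String pictureUrl, String audioUrl, String adText) {
        JSONArray layers = new JSONArray();

        JSONObject picture = new JSONObject();
        picture.put("id", "img_1");
        picture.put("type", "image");
        picture.put("src", pictureUrl);
        layers.add(picture);

        JSONObject audio = new JSONObject();
        audio.put("id", "bgm_1");
        audio.put("type", "audio");
        audio.put("src", audioUrl);
        layers.add(audio);

        JSONObject text = new JSONObject();
        text.put("id", "txt_1");
        text.put("type", "text");
        text.put("text", adText);
        layers.add(text);

        JSONObject secondData = new JSONObject();
        secondData.put("layers", layers);
        return secondData.toJSONString();  // the "second data" in the preset format
    }
}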
Step 103, merging the first data and the second data to obtain composite data in the preset format.
The first data is the data in the preset format obtained by converting the video template, and the second data is the data in the preset format obtained by converting the data to be replaced. By combining the first data and the second data, the relevant first data (specifically, the target data corresponding to the data to be replaced in the video template) can be replaced with the second data, thereby combining the data to be replaced with the video template and obtaining composite data in the preset format.
In an alternative embodiment, the process of merging the first data and the second data to obtain the composite data in the preset format may include: merging the first data and the second data to obtain the composite data in the preset format; storing the composite data locally, and replacing the network path in the composite data with a local path; and adding a unique identifier to the static resources and text layers of the first data, and storing them locally.
The network path in the composite data may refer to the network path of the corresponding video template resources, and the local path of the composite data may be set according to the actual situation.
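The following is a minimal Java sketch of this merging step, assuming the first and second data are fastjson objects whose layers are keyed by an "id" field; the class name JsonMerger, the method name mergeAndLocalize and the keying scheme are illustrative assumptions only (the actual file download is omitted):

import java.util.UUID;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class JsonMerger {
    // Merge the second data (user replacements) into the first data (template),
    // replace network paths with local paths, and attach a unique identifier to each layer.
    public static JSONObject mergeAndLocalize(JSONObject firstData, JSONObject secondData, String localDir) {
        JSONArray templateLayers = firstData.getJSONArray("layers");
        JSONArray userLayers = secondData.getJSONArray("layers");

        for (int i = 0; i < templateLayers.size(); i++) {
            JSONObject layer = templateLayers.getJSONObject(i);

            // Replace the template layer's content with the user's layer that has the same id
            // (this sketch assumes every layer carries an "id" field).
            for (int j = 0; j < userLayers.size(); j++) {
                JSONObject userLayer = userLayers.getJSONObject(j);
                if (layer.getString("id").equals(userLayer.getString("id"))) {
                    layer.putAll(userLayer);
                }
            }

            // Replace a network path with a local path (the download step itself is omitted here).
            String src = layer.getString("src");
            if (src != null && src.startsWith("http")) {
                layer.put("src", localDir + "/" + src.substring(src.lastIndexOf('/') + 1));
            }

            // Add a 32-character unique identifier to each static resource / text layer.
            layer.put("uid", UUID.randomUUID().toString().replace("-", ""));
        }
        return firstData;  // the composite data, to be saved locally
    }
}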
Step 104, generating a composite video based on the composite data.
The composite data includes the relevant data of the video template and the data to be replaced, with the corresponding data in the video template replaced by the data to be replaced, so the composite video can be generated based on the composite data.
In an alternative embodiment, the process of generating the composite video based on the composite data may include the following steps A1 to A2:
and step A1, calling a rendering command of an executable file of video composition software through a third computer bottom instruction, and rendering the composition data into video data in a first format.
In this embodiment of the present application, the third computer bottom instruction refers to an instruction for calling a rendering command of an executable file of the video composition software, and may be specifically selected according to an actual situation. The first format may be any suitable video file format, such as AVI (Audio Video Interleaved, audio video staggering format), which is used as an intermediate video file.
And step A2, calling a transcoding command of a video processing tool through a fourth computer bottom layer instruction, and transcoding the video data in the first format into the video data in the second format to obtain the composite video.
In this embodiment of the present application, the fourth computer bottom instruction refers to an instruction for calling a transcoding command of the video processing tool, and may specifically be selected according to actual situations. The second format may be any suitable video file format such as MP 4.
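To make steps A1 and A2 concrete, the following is a minimal, illustrative Java sketch of driving the render and transcode commands through ProcessBuilder. The aerender.exe path, the project path, the composition name "main" and the output file names are assumptions for illustration only; the aerender options -project, -comp and -output and the ffmpeg options -i and -c:v libx264 are standard options of those tools.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class RenderAndTranscode {
    // Illustrative sketch of steps A1 and A2; all paths and names are hypothetical placeholders.
    public static void run() throws Exception {
        // Step A1: render the merged project into the first format (lossless AVI) via aerender.exe.
        exec(Arrays.asList(
                "C:\\Program Files\\Adobe\\Adobe After Effects\\Support Files\\aerender.exe",
                "-project", "C:\\Users\\demo\\Desktop\\demo.aepx",
                "-comp", "main",
                "-output", "C:\\Users\\demo\\Desktop\\demo.avi"));

        // Step A2: transcode the intermediate AVI into the second format (H.264 MP4), i.e. the composite video.
        exec(Arrays.asList(
                "ffmpeg", "-y",
                "-i", "C:\\Users\\demo\\Desktop\\demo.avi",
                "-c:v", "libx264",
                "C:\\Users\\demo\\Desktop\\demo.mp4"));
    }

    private static void exec(List<String> command) throws Exception {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);  // merge stderr into stdout
        Process process = builder.start();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);   // drain the tool's output so the process does not block
            }
        }
        process.waitFor();
    }
}

In practice, the command arrays would be assembled from the locally saved composite data and the configured output paths.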
In the embodiment of the application, when video synthesis is needed, the user provides the data to be replaced, and the video synthesis software converts the video template and the data to be replaced into the same format, so that the converted first data and second data can be combined into composite data, which is then processed to obtain the composite video.
In the following, a video composition method will be described by taking video composition software as AE software as an example.
Referring to fig. 2, a flow diagram of a video compositing method according to an embodiment of the application is shown.
As shown in fig. 2, the video composition method may include:
(1) Call the original aepx file of the AE software, xxx.aepx, through a low-level computer instruction.
First, the AE software is installed on the terminal system. By way of example, the terminal may be any suitable form of terminal, such as a desktop computer, a notebook computer, a tablet computer, a mobile phone, etc. The terminal operating system may be any suitable system, such as a Windows system, an Android system, an iOS system, etc.
Professional AE development engineers make various rich video templates offline, usually as files ending in .aepx, which is why the original file mentioned above is xxx.aepx. Subsequent AE video synthesis is based on these video templates, replacing pictures, text, audio, video, and so on.
The low-level computer instruction may take a binary form such as (1010110011), and may be triggered from JAVA program code by means of ProcessBuilder or the like.
(2) Convert the AE video template into a string in a specific format (e.g. JSON format), i.e. the first data, using low-level computer instructions (a computer algorithm language).
The string in this specific format is the template against which pictures, audio, video, text and the like will be replaced.
Specifically, the aepx video template, whose original format is XML, may be serialized by Document code; this is required for every video template.
(3) Package the data to be replaced into a string in the specific format (i.e. the second data). This step is not shown in fig. 2.
The string in the specific format obtained by packaging is the data to be replaced, which is used to replace the relevant data in the AE video template.
(4) Merge the string converted from the data to be replaced with the string converted from the video template (for example, through a low-level replacement algorithm) to obtain the merged string (i.e. the composite data).
With respect to the replacement of static resources, the replacement content contains three parts, among which the eight-character hexadecimal representation of "JPEG" contains special characters and two places need to be modified at the same time. If the picture is replaced with a picture in another format, it is likewise modified into the hexadecimal representation of the corresponding format; a sketch of this signature comparison follows the format list below.
Illustratively, the following is a comparison of file format signatures:
‘506e6721’->PNG format
‘4a504547’->JPEG or JPG format
‘5449465f’->TIF or TIFF format
‘424d5020’->BMP format
‘4d504547’->MP4 format
‘41564956’->AVI format
‘4d4f6f56’->MOV format
‘4d503341’->MP3 format
‘57415645’->WAV format
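For illustration, a signature comparison of this kind might be implemented as sketched below; the mapping values are taken from the list above, the class and method names are hypothetical, and the assumption that the signature occupies the first four bytes of the file is made only for this sketch:

import java.io.FileInputStream;
import java.util.HashMap;
import java.util.Map;

public class FormatSignature {
    private static final Map<String, String> SIGNATURES = new HashMap<>();
    static {
        // Signature-to-format mapping as listed above.
        SIGNATURES.put("506e6721", "PNG");
        SIGNATURES.put("4a504547", "JPEG/JPG");
        SIGNATURES.put("5449465f", "TIF/TIFF");
        SIGNATURES.put("424d5020", "BMP");
        SIGNATURES.put("4d504547", "MP4");
        SIGNATURES.put("41564956", "AVI");
        SIGNATURES.put("4d4f6f56", "MOV");
        SIGNATURES.put("4d503341", "MP3");
        SIGNATURES.put("57415645", "WAV");
    }

    // Hypothetical helper: read the leading bytes of a file, render them as lowercase hex,
    // and look the result up in the signature table.
    public static String detectFormat(String path) throws Exception {
        byte[] head = new byte[4];
        try (FileInputStream in = new FileInputStream(path)) {
            if (in.read(head) < 4) {
                return "unknown";
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : head) {
            hex.append(String.format("%02x", b));
        }
        return SIGNATURES.getOrDefault(hex.toString(), "unknown");
    }
}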
Part of the code for the merging is as follows:

// Merge the user-modified JSON with the template JSON, download the referenced files locally,
// and replace the network paths in the merged JSON with local paths
List<Map<String, Object>> mergeList = FileCtrl.mergeJSON(
        JSONArray.parseArray(templateJSON), JSONObject.parseObject(userJSON));

// Set the network path of the AEPX
domXmlUp.setUrl("https://cdn.seamo.cn/cdnres/demo.aepx");

// Pass in the merged (replaced) mergeList and the save location, and return the resulting File
// (the original method name is not legible here; "replaceDocument" is used as a placeholder)
File updateAEPX = domXmlUp.replaceDocument(mergeList, "C:\\Users\\56307\\Desktop\\demo.aepx");

// Add a unique identifier (a 32-character UUID) to the static resources and text layers of the AEPX file,
// and save the file locally
updateAEPX = domXmlUp.formatDocument("C:\\Users\\56307\\Desktop\\demo.aepx");

// Execute the EXE from the command line; the command may be a string (the EXE path) or an array.
// When calling the EXE with parameters, pass them as an array (the parameters are order-sensitive).
ProcessBuilder builder = new ProcessBuilder();
builder.command(command);
builder.redirectErrorStream(true);
System.out.println("Executing command: "
        + command.toString().replaceAll(",", "").replaceAll("]", "").substring(1));
Process p = builder.start();
doWaitFor(p);
p.destroy();
System.out.println(">>>>>>>" + oPath);
file = new File(oPath);
(5) Call the aerender.exe related commands of AE to render the merged string and obtain the rendered video file.
The low-level computer instruction renders the AE project file (AEP) or the XML file (aepx) into a lossless AVI-format video by calling the aerender.exe related commands of the AE program.
During the rendering process, a padding (pad) filter is needed for video processing.
Referring to fig. 3, a schematic diagram of the padding format according to an embodiment of the present application is shown. As shown in fig. 3, the inner rectangular box represents the original video, the outer rectangular box represents the padded video, color represents the color of the padded background, w represents the width of the input video, and h represents the height of the input video.
(6) Transcode and compress the rendered video file through FFmpeg related commands to obtain an MP4 video file encoded in H.264 (i.e. the composite video).
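As an illustration only, such a transcoding step can take the following general form on the command line (the input and output file names are placeholders):

ffmpeg -i demo.avi -c:v libx264 -pix_fmt yuv420p -c:a aac demo.mp4

Here -c:v libx264 selects the H.264 encoder, -pix_fmt yuv420p keeps the output widely playable, and -c:a aac encodes the audio track.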
Transcoding with FFmpeg involves video frame interpolation. The video frame interpolation problem is briefly introduced here: the input is two consecutive frames and a time t, and the output is the predicted intermediate frame. One of the key operations is warping, which estimates a frame from the optical flow; for example, given a known frame and an optical flow, another frame can be estimated by warping. Warping in fact obtains every pixel value by bilinear interpolation.
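For clarity, the warping operation mentioned above can be written in its standard form (this formula is supplied here as background, using the notation of the equations below; it is not quoted from the original text):

g(I, F)(p) = I(p + F(p)),

where I is an input frame, F is an optical flow field, p is a pixel position, and the value at the non-integer position p + F(p) is obtained by bilinear interpolation.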
The problems to be solved and the corresponding solutions proposed are as follows:
problem 1: the conventional method only can realize the inter frame estimation at t=1/2, if more frames are to be inserted between two frames, multiple predictions are needed, for example, the inter frame at t=1/4 needs to be estimated, the inter frame at t=1/2 needs to be estimated first, and then the estimation of t=0 and t=1/2 is performed, so that time is consumed, and the method also has no method for estimating the inter frame at t=1/3. It is therefore desirable to be able to incorporate a time variable t to control the generation process of the intermediate frame.
First, the need to find a suitable dataset, which was previously used to train video plug-in frames, is often around 30fps, and too low a frame rate actually results in non-uniform inter-frame motion, which is sufficient for model training to estimate only 1/2. However, if it is desired to estimate the intermediate frames at a given time, it is necessary to ensure that the frame rate of the video is high enough to be a training data set, e.g., 240fps, and thus can be considered as a uniform motion.
Second, how to introduce time variations into the generation of intermediate frames. To estimate the intermediate frame at time t, the sum of the optical flows is estimated, and cannot be estimated directly because the time is variable. FFmpeg proposes a linear estimation method by estimating the sum of the bi-directional optical flow of two input frames and combining the time t. The formula for calculation is as follows:
F̂_t→0 = −(1 − t) · t · F_0→1 + t² · F_1→0
F̂_t→1 = (1 − t)² · F_0→1 − t · (1 − t) · F_1→0
where F̂_t→0 and F̂_t→1 denote the estimated optical flows from the intermediate time t to the two input frames, F_0→1 and F_1→0 denote the bidirectional optical flows between the two input frames, and t denotes time.
The formula for the final composite intermediate frame is as follows:
Î_t = (1/Z) · ( (1 − t) · V_t←0 ⊙ g(I_0, F_t←0) + t · V_t←1 ⊙ g(I_1, F_t←1) )
where Î_t denotes the intermediate frame at time t, Z denotes a normalization factor, V_t←0 and V_t←1 denote the mask weights, g denotes the warping function, I_0 and I_1 denote the two consecutive input frames, and ⊙ is a logical operator representing the exclusive-NOR operation (the result is 1 when the two operands are the same and 0 otherwise).
It can be seen that this is in fact a weighted sum of the two warped frames. The mask weights V_t←0 and V_t←1 are introduced here because there is occlusion in object motion (i.e. some pixels of the intermediate frame can only be found in the first frame image and others only in the second frame).
The relevant FFmpeg pseudo-commands are expressed as follows:
ffmpeg -i a.mkv -r 1 -s 256x256 -vf pad="iw:iw:0:(iw/2-ih/2):black" %04d.jpg
Each frame of a.mkv is converted into a square image named according to the %04d pattern. The option -r 1 sets the frame rate to one frame per second: if the video length is 5 s, 5 pictures are output, and if the video length is 10 s, 10 pictures are output.
The option -s 256x256 indicates that the output image size is 256x256.
ffmpeg -i a.mkv -vf pad="iw:iw:0:(iw/2-ih/2):black" %04d.png
In the embodiment of the application, the video can be quickly synthesized online based on low-level computer instructions. The method is applicable to various operating systems, can reduce the difficulty of online AE video synthesis, and improves the efficiency of AE video synthesis.
Referring to fig. 4, a block diagram of a video compositing apparatus according to an embodiment of the application is shown.
As shown in fig. 4, the video compositing apparatus may include the following modules:
the conversion module 401 is configured to obtain a preset video template, and convert the video template into first data in a preset format;
The packaging module 402 is configured to obtain data to be replaced, and package the data to be replaced into second data in the preset format;
a merging module 403, configured to merge the first data and the second data to obtain the composite data in the preset format;
a generating module 404, configured to generate a composite video based on the composite data.
Optionally, the conversion module 401 is specifically configured to call a first preset code through a first low-level computer instruction to serialize the video template and convert it into first data in JSON format.
Optionally, the encapsulation module 402 is specifically configured to call a second preset code through a second low-level computer instruction to serialize the data to be replaced and convert it into second data in JSON format.
Optionally, the merging module 403 includes: a data merging unit, configured to merge the first data and the second data to obtain the composite data in the preset format; a path replacing unit, configured to store the composite data locally and replace the network path in the composite data with a local path; and an identifier adding unit, configured to add a unique identifier to the static resources and text layers of the first data and store them locally.
Optionally, the generating module 404 includes: a data rendering unit, configured to call a rendering command of an executable file of the video composition software through a third low-level computer instruction and render the composite data into video data in a first format; and a video transcoding unit, configured to call a transcoding command of a video processing tool through a fourth low-level computer instruction and transcode the video data in the first format into video data in a second format to obtain the composite video.
In the embodiment of the application, when video synthesis is needed, the user provides the data to be replaced, and the video synthesis software converts the video template and the data to be replaced into the same format, so that the converted first data and second data can be combined into composite data, which is then processed to obtain the composite video.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In an embodiment of the present application, an electronic device is also provided. The electronic device may include one or more processors and one or more computer-readable storage media having instructions stored thereon, such as an application program. The instructions, when executed by the one or more processors, cause the processors to perform the video compositing method of any of the embodiments described above.
Referring to fig. 5, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown. As shown in fig. 5, the electronic device comprises a processor 501, a communication interface 502, a memory 503 and a communication bus 504. The processor 501, the communication interface 502 and the memory 503 communicate with each other through the communication bus 504.
A memory 503 for storing a computer program.
The processor 501 is configured to implement the video composition method according to any one of the above embodiments when executing the program stored in the memory 503.
The communication interface 502 is used for communication between the electronic device and other devices described above.
The communication bus 504 mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The above-mentioned processor 501 may include, but is not limited to: a Central Processing Unit (CPU), a Network Processor (NP), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, and the like.
The above-mentioned memory 503 may include, but is not limited to: a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a hard disk, a floppy disk, a flash memory, and the like.
In an embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon a computer program executable by a processor of an electronic device, the computer program, when executed by the processor, causing the processor to perform the video compositing method as described in any of the embodiments above.
In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM, RAM, magnetic disk, optical disk) and including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, etc.
The foregoing is merely a specific embodiment of the present application, but the scope of the present application is not limited thereto; any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the scope of the present application. In view of the foregoing, this description is not to be construed as limiting the application.

Claims (10)

1. A method of video synthesis, the method comprising:
acquiring a video template, and converting the video template into first data in a preset format;
acquiring data to be replaced, and encapsulating the data to be replaced into second data in the preset format;
merging the first data and the second data to obtain composite data in the preset format; and
generating a composite video based on the composite data.
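For orientation, the sketch below walks through the four steps of claim 1 in Python, assuming the preset format is JSON (as in claims 2 and 3); the function name synthesize, the template file layout, and the key "replacements" are illustrative assumptions rather than part of the claim.

import json

def synthesize(template_path: str, replacements: dict, out_path: str) -> None:
    # Step 1: acquire the video template and convert it into first data in a
    # preset format (JSON here); the template file is assumed to hold a JSON object.
    with open(template_path, encoding="utf-8") as f:
        first_data = json.load(f)

    # Step 2: acquire the data to be replaced and encapsulate it as second data
    # in the same preset format.
    second_data = {"replacements": replacements}

    # Step 3: merge the first data and the second data into composite data.
    composite_data = {**first_data, **second_data}

    # Step 4: the composite data would feed the rendering stage (see the sketch
    # after claim 5); here it is only written to disk.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(composite_data, f, ensure_ascii=False, indent=2)

Keeping the template and the replacement data in the same serialized format is what lets step 3 reduce to a plain dictionary merge.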
2. The method of claim 1, wherein converting the video template into the first data in the preset format comprises:
calling a first preset code through a first underlying computer instruction to serialize the video template, and converting the video template into the first data in JSON format.
3. The method of claim 1, wherein the encapsulating the data to be replaced into the second data in the preset format comprises:
calling a second preset code through a second underlying computer instruction to serialize the data to be replaced, and converting the data to be replaced into the second data in JSON format.
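A minimal sketch of the serialization in claims 2 and 3, assuming Python dataclasses stand in for the in-memory template objects; the names VideoTemplate, TextLayer, composition and layers are hypothetical, and json.dumps plays the role of the preset code invoked by the underlying instruction.

import json
from dataclasses import dataclass, field, asdict

@dataclass
class TextLayer:              # hypothetical text layer of a video template
    name: str
    content: str

@dataclass
class VideoTemplate:          # hypothetical in-memory video template
    composition: str
    layers: list = field(default_factory=list)

template = VideoTemplate("promo_comp", [TextLayer("title", "{{title}}")])
to_replace = {"title": "New Year Sale"}

# First preset code: serialize the video template into the first data in JSON format.
first_data = json.dumps(asdict(template), ensure_ascii=False)

# Second preset code: serialize the data to be replaced into the second data in JSON format.
second_data = json.dumps(to_replace, ensure_ascii=False)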
4. The method of claim 1, wherein the merging the first data and the second data to obtain the composite data in the preset format includes:
merging the first data and the second data to obtain the composite data in the preset format;
storing the composite data locally, and replacing a network path of the composite data with a local path; and
adding a unique identifier to each static resource and text layer of the first data, and storing the unique identifiers locally.
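An illustrative sketch of the localization in claim 4, assuming the composite data is a JSON-like dictionary; the keys static_resources, text_layers, src and uid are assumptions made for this example, and downloading the remote resources themselves is omitted.

import json
import uuid
from pathlib import Path
from urllib.parse import urlparse

def localize(composite_data: dict, asset_dir: str = "./assets") -> dict:
    out_dir = Path(asset_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    for resource in composite_data.get("static_resources", []):
        url = resource.get("src", "")
        if url.startswith(("http://", "https://")):
            # Replace the network path with a local path.
            resource["src"] = str(out_dir / Path(urlparse(url).path).name)
        # Add a unique identifier to the static resource.
        resource["uid"] = uuid.uuid4().hex

    for layer in composite_data.get("text_layers", []):
        # Add a unique identifier to the text layer.
        layer["uid"] = uuid.uuid4().hex

    # Store the composite data locally.
    with open(out_dir / "composite.json", "w", encoding="utf-8") as f:
        json.dump(composite_data, f, ensure_ascii=False, indent=2)
    return composite_data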
5. The method of claim 1, wherein generating the composite video based on the composite data comprises:
calling a rendering command of an executable file of video composition software through a third underlying computer instruction, and rendering the composite data into video data in a first format; and
calling a transcoding command of a video processing tool through a fourth underlying computer instruction, and transcoding the video data in the first format into video data in a second format to obtain the composite video.
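A hedged sketch of the rendering and transcoding calls in claim 5: the claim does not name the tools, so aerender (the After Effects command-line renderer) and ffmpeg are used here purely as stand-ins for the video composition software and the video processing tool, and both must be installed for the commands to succeed.

import subprocess

def render_and_transcode(project: str, comp: str, first_format_out: str, mp4_out: str) -> None:
    # Call the rendering command of the compositing software's executable
    # (aerender is only an example of such an executable).
    subprocess.run(
        ["aerender", "-project", project, "-comp", comp, "-output", first_format_out],
        check=True,
    )
    # Call the transcoding command of a video processing tool (ffmpeg as an
    # example) to convert the first-format output into a second format, e.g. H.264/MP4.
    subprocess.run(
        ["ffmpeg", "-y", "-i", first_format_out, "-c:v", "libx264", mp4_out],
        check=True,
    )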
6. A video synthesis apparatus, the apparatus comprising:
the conversion module is used for acquiring a preset video template and converting the video template into first data in a preset format;
the encapsulation module is used for acquiring data to be replaced and encapsulating the data to be replaced into second data in the preset format;
the merging module is used for merging the first data and the second data to obtain composite data in the preset format;
and the generation module is used for generating a composite video based on the composite data.
7. The apparatus of claim 6, wherein the conversion module is specifically configured to call a first preset code through a first underlying computer instruction to serialize the video template and convert the serialized video template into the first data in JSON format.
8. The apparatus of claim 6, wherein the encapsulation module is specifically configured to call a second preset code through a second underlying computer instruction to serialize the data to be replaced and convert the serialized data into the second data in JSON format.
9. An electronic device, comprising:
one or more processors; and
one or more computer-readable storage media having instructions stored thereon;
wherein the instructions, when executed by the one or more processors, cause the one or more processors to perform the video synthesis method of any one of claims 1-5.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the video synthesis method of any one of claims 1-5.
CN202211733268.7A 2022-12-30 2022-12-30 Video synthesis method, device, electronic equipment and storage medium Pending CN116112760A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211733268.7A CN116112760A (en) 2022-12-30 2022-12-30 Video synthesis method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211733268.7A CN116112760A (en) 2022-12-30 2022-12-30 Video synthesis method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116112760A true CN116112760A (en) 2023-05-12

Family

ID=86255523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211733268.7A Pending CN116112760A (en) 2022-12-30 2022-12-30 Video synthesis method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116112760A (en)

Similar Documents

Publication Publication Date Title
US11079912B2 (en) Method and apparatus for enhancing digital video effects (DVE)
CN106611435B (en) Animation processing method and device
JP6120169B2 (en) Image editing device
CN112804459A (en) Image display method and device based on virtual camera, storage medium and electronic equipment
CN111669646A (en) Method, device, equipment and medium for playing transparent video
CN112929627B (en) Virtual reality scene implementation method and device, storage medium and electronic equipment
CN102111631A (en) Image processing device, image processing method, and program
CN113660528A (en) Video synthesis method and device, electronic equipment and storage medium
CN102572219B (en) Mobile terminal and image processing method thereof
Jackson Digital video editing fundamentals
JP2001024610A (en) Automatic program producing device and recording medium with programs recorded therein
CN112804460A (en) Image processing method and device based on virtual camera, storage medium and electronic equipment
JP2005135415A (en) Graphic decoder including command based graphic output accelerating function, graphic output accelerating method therefor, and image reproducing apparatus
CN112153472A (en) Method and device for generating special picture effect, storage medium and electronic equipment
CN116112760A (en) Video synthesis method, device, electronic equipment and storage medium
JP4097736B2 (en) Method for producing comics using a computer and method for viewing a comic produced by the method on a monitor screen
Aguilar et al. ARStudio: A low-cost virtual studio based on Augmented Reality for video production
Jamil et al. Overview of JPEG Snack: A Novel International Standard for the Snack Culture
KR102335096B1 (en) System for providing video production service compositing figure video and ground video
WO2024051394A1 (en) Video processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN114286163B (en) Sequence chart recording method, device, equipment and storage medium
KR102026994B1 (en) Video motion object markup language
JP2007060329A (en) Slide show generating device and method and program for controlling the same
CN117827179A (en) Data processing method, device, electronic equipment and storage medium
KR20230163072A (en) Method for producing 3d video contents using web

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination