CN115250335A - Video processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN115250335A
CN115250335A (application CN202110470170.6A)
Authority
CN
China
Prior art keywords
video
file
special effect
rendering
edited
Prior art date
Legal status
Pending
Application number
CN202110470170.6A
Other languages
Chinese (zh)
Inventor
齐国鹏
陈仁健
Current Assignee
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd filed Critical Shenzhen Tencent Computer Systems Co Ltd
Priority to CN202110470170.6A priority Critical patent/CN115250335A/en
Publication of CN115250335A publication Critical patent/CN115250335A/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2621 Cameras specially adapted for the electronic generation of special effects during image pickup, e.g. digital cameras, camcorders, video cameras having integrated special effects capability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44012 Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Abstract

The application provides a video processing method, a video processing apparatus, an electronic device, and a computer-readable storage medium. The method comprises the following steps: acquiring a video file to be edited and at least one special effect file for adding a special effect to the video file to be edited; performing tree editing processing on the video file to be edited and the at least one special effect file to obtain a tree-shaped rendering structure that includes both the video file and the special effect file; rendering the tree-shaped rendering structure based on the video layers included in the video file to be edited and the at least one special effect file to obtain a rendered video file; and performing audio synthesis processing on the rendered video file to obtain a target video file with the special effects added. Through the method and the apparatus, both the flexibility and the efficiency of special effect addition can be improved.

Description

Video processing method, device, equipment and storage medium
Technical Field
The present application relates to computer application technologies, and in particular, to a video processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of computer technology, computer animation is widely applied in fields such as game production, cartoon production, and animation special effects. For example, after a user shoots a video, a processing device of the video social platform parses each video frame; the user can designate the video frames to which a special effect is added, and the frames are rendered after the effect is applied to obtain a video with the special effect.
However, video social platforms in the related art rely on platform-specific interfaces and require a large number of manual operations by the user, which increases the complexity of adding animation special effects.
Disclosure of Invention
The embodiment of the application provides a video processing method and device, electronic equipment and a computer readable storage medium, which can improve flexibility and efficiency of special effect addition.
The technical scheme of the embodiment of the application is realized as follows:
an embodiment of the present application provides a video processing method, including:
acquiring a video file to be edited and at least one special effect file for adding a special effect to the video file to be edited;
performing tree editing processing based on the video file to be edited and the at least one special effect file to obtain a tree rendering structure comprising the video file and the special effect file;
rendering the tree-shaped rendering structure based on the video file to be edited and the video layer included by the at least one special effect file to obtain a rendered video file;
and performing audio synthesis processing based on the rendered video file to obtain a target video file with special effects.
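The four claimed steps can be illustrated with a minimal sketch. Every name below (process_video, build_render_tree, and so on) is a hypothetical stand-in for illustration only; the patent does not prescribe a concrete API, and the bodies here are placeholders for the real tree editing, rendering, and audio synthesis stages.

```python
# Minimal, illustrative sketch of the claimed four-step pipeline.
# All names are assumptions for illustration, not the disclosed implementation.

def build_render_tree(video_file, effect_files):
    # Step 2: tree editing; root sized from the video, clips/effects as leaves.
    return {"root": video_file, "leaves": list(effect_files)}

def render_tree(tree):
    # Step 3: render every leaf, then composite onto the root.
    return {"video": tree["root"], "layers": tree["leaves"]}

def synthesize_audio(rendered, effect_files):
    # Step 4: merge the source video's audio with each effect's audio track.
    return {"target": rendered["video"], "audio_tracks": len(effect_files) + 1}

def process_video(video_file, effect_files):
    tree = build_render_tree(video_file, effect_files)   # tree editing
    rendered = render_tree(tree)                         # layer rendering
    return synthesize_audio(rendered, effect_files)      # audio synthesis
```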
In the above technical solution, the performing audio synthesis processing based on the rendered video file to obtain a target video file with special effects includes:
carrying out segmentation processing on the audio file corresponding to the special effect file and the audio file corresponding to the video file to be edited to obtain a plurality of audio clips;
dividing the plurality of audio clips into corresponding audio tracks to obtain a plurality of audio tracks added with the audio clips, wherein the audio clips among the audio tracks are not overlapped;
and merging the plurality of audio tracks added with the audio clips to obtain a target video file added with special effects.
In the above technical solution, before dividing the plurality of audio clips into corresponding audio tracks, the method further includes:
decoding the plurality of audio clips through a depacketizer to obtain decoded audio clips;
resampling the decoded audio segment to obtain the resampled audio segment;
and carrying out audio adjustment processing on the re-sampled audio segments to obtain the audio segments for dividing processing.
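The track-division step above (non-overlapping clips per track) can be sketched with a simple greedy assignment. The clip representation as (start, end) tuples and the greedy strategy are assumptions made for illustration; the claims only require that clips within one track do not overlap.

```python
# Hedged sketch: greedily assign audio clips to tracks so that clips on
# the same track never overlap in time. Clip format is an assumption.

def divide_into_tracks(clips):
    """clips: list of (start, end) tuples.
    Returns a list of tracks, each a list of non-overlapping clips."""
    tracks = []
    for clip in sorted(clips, key=lambda c: c[0]):
        for track in tracks:
            if track[-1][1] <= clip[0]:   # no overlap with the track's last clip
                track.append(clip)
                break
        else:
            tracks.append([clip])         # otherwise open a new track
    return tracks
```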
In the above technical solution, before the obtaining of the video file to be edited and the at least one special effect file for adding a special effect to the video file to be edited, the method further includes:
displaying a plurality of candidate video files in a human-computer interaction interface;
in response to the selection operation for the candidate video file, taking the selected part of the candidate video file as the video file to be edited;
and acquiring at least one special effect file for adding a special effect to the video file to be edited.
In the above technical solution, the obtaining at least one special effect file for adding a special effect to the video file to be edited includes:
displaying a plurality of candidate special effect files in the human-computer interaction interface;
and in response to the selection operation of the candidate special effect file, taking the selected part of the candidate special effect file as the at least one special effect file for adding the special effect to the video file to be edited.
In the above technical solution, before displaying a plurality of candidate special effect files in the human-computer interaction interface, the method further includes:
and matching the special effect files in the special effect file library based on the video content of the video file to be edited, and taking the matched special effect files as the candidate special effect files.
In the above technical solution, before displaying a plurality of candidate special effect files in the human-computer interaction interface, the method further includes:
acquiring the frequency of selecting special effect files in a special effect file library;
and when the selected frequency is greater than the selected threshold value, taking the special effect file as the candidate special effect file.
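The frequency-based candidate selection described above amounts to a threshold filter over the special effect file library. The data shape (a name-to-frequency mapping) is an assumption for illustration.

```python
# Illustrative sketch: keep only effect files whose selection frequency
# exceeds the threshold. The library representation is an assumption.

def pick_candidates(library, threshold):
    """library: dict mapping effect-file name -> times selected."""
    return [name for name, freq in library.items() if freq > threshold]
```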
An embodiment of the present application provides a video processing apparatus, including:
the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a video file to be edited and at least one special effect file for adding a special effect to the video file to be edited;
the editing module is used for performing tree editing processing on the basis of the video file to be edited and the at least one special effect file to obtain a tree rendering structure comprising the video file and the special effect file;
the rendering module is used for rendering the tree-shaped rendering structure based on the video file to be edited and the video layers included by the at least one special effect file to obtain a rendered video file;
and the synthesis module is used for carrying out audio synthesis processing on the basis of the rendered video file to obtain a target video file added with an animation special effect.
In the above technical solution, the editing module is further configured to divide the video file to be edited based on the at least one special-effect file to obtain a plurality of video segments;
and combining the plurality of video clips and the at least one special effect file to obtain a tree-shaped rendering structure comprising the video files and the special effect files.
In the above technical solution, the editing module is further configured to obtain time information of each special effect file, where the time information includes a start time and a duration;
and determining a video segment corresponding to each special effect file and a video segment corresponding to video filling in the video file to be edited based on the starting time and duration of each special effect file, wherein the video filling indicates that the special effect is not added in the video file to be edited.
In the above technical solution, the editing module is further configured to execute the following processing for any one of the at least one special effect file:
determining the starting time of any special effect file as the starting time of a video clip corresponding to any special effect file in the video files to be edited;
determining the duration of any special effect file as the duration of a video clip corresponding to any special effect file in the video files to be edited;
determining the end time of any special effect file based on the start time and the duration of the special effect file, and taking the end time of the special effect file as the start time of the video clip filled with the corresponding video in the video file to be edited;
and taking the time interval between the ending time of any one special effect file and the starting time of the adjacent special effect file as the duration of the video segment filled with the corresponding video in the video file to be edited.
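The timing logic above can be sketched as follows: each effect file maps to a clip with the same start and duration, and the gaps before, between, and after the effects become "fill" clips with no effect. The tuple representation is an assumption for illustration.

```python
# Sketch of the claimed segment timing: effects are (start, duration)
# pairs sorted by start time; gaps become fill segments. All field
# names and shapes here are illustrative assumptions.

def split_segments(effects, video_duration):
    """Returns a list of (kind, start, duration) where kind is
    'effect' or 'fill'."""
    segments, cursor = [], 0
    for start, duration in effects:
        if start > cursor:                        # gap before this effect
            segments.append(("fill", cursor, start - cursor))
        segments.append(("effect", start, duration))
        cursor = start + duration                 # effect end time
    if cursor < video_duration:                   # trailing gap
        segments.append(("fill", cursor, video_duration - cursor))
    return segments
```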
In the above technical solution, the editing module is further configured to determine a rendering size of the video file to be edited based on a leader file and a trailer file in the special effect file;
determining a rendering root node based on the rendering size of the video file to be edited;
taking each video clip and each special effect file as rendering leaf nodes;
determining a tree-like rendering structure including the video file and the special effect file based on the rendering root node and the rendering leaf node.
In the above technical solution, the editing module is further configured to create an added layer interface based on the rendering root node;
and sequentially adding the rendering leaf nodes on the rendering root node according to the time information of the rendering leaf nodes through the layer adding interface to obtain a tree-shaped rendering structure comprising the video file and the special effect file.
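The tree construction above (a root node carrying the rendering size, leaves attached in time order through an add-layer interface) can be sketched as follows; the class and method names are hypothetical stand-ins, not the actual SDK interface.

```python
# Illustrative sketch of building the tree-shaped rendering structure:
# a root node sized from the video, with clip/effect leaves added in
# time order via an add-layer interface. All names are assumptions.

class RenderNode:
    def __init__(self, name, start=0):
        self.name, self.start, self.children = name, start, []

    def add_layer(self, leaf):
        """The 'add layer' interface: attach a leaf under this node."""
        self.children.append(leaf)

def build_tree(width, height, leaves):
    root = RenderNode(f"root:{width}x{height}")
    for leaf in sorted(leaves, key=lambda l: l.start):  # time order
        root.add_layer(leaf)
    return root
```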
In the above technical solution, the rendering module is further configured to perform traversal processing on the tree-like rendering structure to obtain rendering leaf nodes of the tree-like rendering structure;
when the rendering leaf node represents the video clip of the video file, rendering the video clip of the video file to obtain a rendering video corresponding to the video file;
when the rendering leaf node represents the special effect file, rendering processing is carried out on a video layer included in the special effect file to obtain a rendering video corresponding to the special effect file;
and combining the rendering video corresponding to the video file and the rendering video corresponding to the special effect file to obtain a rendered video file.
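The traversal above dispatches on the leaf type: video-clip leaves are rendered as video, effect-file leaves are rendered through their video layers, and the results are merged. The sketch below stands in string placeholders for the actual rendering; the leaf shape is an assumption.

```python
# Hedged sketch of the claimed traversal-and-merge rendering. String
# results stand in for real rendered frames; names are assumptions.
from collections import namedtuple

Leaf = namedtuple("Leaf", "kind name")   # kind: 'clip' or 'effect'

def render_leaves(leaves):
    rendered = []
    for leaf in leaves:                   # traverse the leaf nodes
        if leaf.kind == "clip":
            rendered.append(f"video({leaf.name})")
        else:                             # effect-file leaf: render its layers
            rendered.append(f"effect({leaf.name})")
    return "+".join(rendered)             # merge into one rendered file
```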
In the above technical solution, the rendering module is further configured to perform format conversion processing on a video clip of the video file to obtain a video layer corresponding to the video clip;
and performing animation rendering processing based on the video layer corresponding to the video clip to obtain a rendered video corresponding to the video file.
In the above technical solution, the rendering module is further configured to obtain a target video clip corresponding to the special effect file in the video file to be edited;
filling the target video clip into a placeholder map of a video layer included in the special effect file to obtain the video layer filled with the target video clip;
and performing animation rendering processing on the video layer filled with the target video fragment to obtain a rendered video corresponding to the special effect file.
In the above technical solution, the rendering module is further configured to obtain a target video clip corresponding to the special effect file in the video file to be edited;
filling the target video clip into the corresponding image layer to obtain the image layer filled with the target video clip;
filling the video layers included in the special effect file to the layers filled with the target video fragments to obtain the layers filled with the special effect file;
and performing animation rendering processing on the layer filled with the special effect file to obtain a rendered video corresponding to the special effect file.
In the above technical solution, before the rendering processing is performed on the tree-like rendering structure based on the video file to be edited and the video layer included in the at least one special effect file to obtain a rendered video file, the apparatus further includes:
and the decoding module is used for calling a decoding interface to decode the at least one special effect file and the video file to be edited to obtain the special effect file used for rendering and the video file to be edited.
In the above technical solution, the decoding module is further configured to, when a system supports hardware decoding, invoke a system decoding interface to perform decoding processing on the at least one special effect file and the video file to be edited, so as to obtain the special effect file used for the rendering processing and the video file to be edited;
when the system does not support hardware decoding, a decoding interface built in a software development kit is called to decode the at least one special effect file and the video file to be edited, and the special effect file used for rendering processing and the video file to be edited are obtained.
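The hardware/software decode fallback described above reduces to choosing between two decoder interfaces at runtime. Both decoder callables and the capability flag below are assumed stand-ins for the system decoding interface and the SDK's built-in decoder.

```python
# Sketch of the claimed decode fallback: prefer the system (hardware)
# decoder when available, otherwise the SDK's software decoder.
# supports_hw_decoding, hw_decode, and sw_decode are assumptions.

def decode(files, supports_hw_decoding, hw_decode, sw_decode):
    decoder = hw_decode if supports_hw_decoding else sw_decode
    return [decoder(f) for f in files]
```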
In the above technical solution, the synthesis module is further configured to perform segmentation processing on an audio file corresponding to the special effect file and an audio file corresponding to the video file to be edited to obtain a plurality of audio segments;
dividing the plurality of audio clips into corresponding audio tracks to obtain a plurality of audio tracks added with the audio clips, wherein the audio clips among the audio tracks are not overlapped;
and merging the plurality of audio tracks added with the audio clips to obtain a target video file added with special effects.
In the above technical solution, the synthesis module is further configured to decode the plurality of audio segments through a depacketizer to obtain decoded audio segments;
resampling the decoded audio segment to obtain the resampled audio segment;
and carrying out audio adjustment processing on the re-sampled audio segments to obtain the audio segments for dividing processing.
In the technical scheme, the acquisition module is further used for displaying a plurality of candidate video files in a human-computer interaction interface;
in response to the selection operation for the candidate video file, taking the selected part of the candidate video file as the video file to be edited;
and acquiring at least one special effect file for adding a special effect to the video file to be edited.
In the above technical solution, the obtaining module is further configured to display a plurality of candidate special effect files in the human-computer interaction interface;
and in response to the selection operation of the candidate special effect file, taking the selected part of the candidate special effect file as the at least one special effect file for adding the special effect to the video file to be edited.
In the above technical solution, the obtaining module is further configured to perform matching processing on special effect files in a special effect file library based on the video content of the video file to be edited, and use the obtained special effect file as the candidate special effect file.
In the above technical solution, the obtaining module is further configured to obtain a frequency of selecting a special effect file in a special effect file library;
and when the selected frequency is greater than the selected threshold value, taking the special effect file as the candidate special effect file.
An embodiment of the present application provides an electronic device for video processing, where the electronic device includes:
a memory for storing executable instructions;
and the processor is used for realizing the video processing method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the present application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for processing video provided by the embodiment of the present application.
The embodiment of the application has the following beneficial effects:
the method comprises the steps that a tree-shaped rendering structure for rendering is obtained by performing tree-shaped editing processing on a video file to be edited and a special effect file, so that the special effect file is automatically organized through the tree-shaped rendering structure, and the efficiency of adding a special effect is improved; the tree-shaped rendering structure is rendered based on the video layer included by the special effect file, so that decoupling of a platform interface is achieved through the video layer, cross-platform application can be achieved, and flexibility of special effect addition is improved.
Drawings
Fig. 1 is an exploded view of a video post-editing template provided by the related art;
fig. 2A is a schematic diagram of a video picture adding special effects provided by the related art;
fig. 2B is a diagram illustrating a video frame with a special effect added according to the related art;
fig. 3 is a schematic application scenario of the distributed file system 10 provided in the embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device for video processing according to an embodiment of the present application;
fig. 5A-5C are schematic flow charts of a video processing method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of video partitioning provided by an embodiment of the present application;
FIG. 7 is a diagram of a video publisher post-edit template provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a video file with special effects applied provided by an embodiment of the present application;
FIG. 9 is a schematic flow chart of a PAG-based video post-editing template rendering scheme according to an embodiment of the present application;
FIG. 10 is a schematic diagram of material division provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of a tree rendering structure provided by an embodiment of the present application;
FIG. 12 is a schematic view of a large sticker provided by an embodiment of the present application;
FIG. 13 is a schematic view of a small sticker provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of audio track division provided by an embodiment of the present application;
fig. 15 is a schematic view of an audio mixing process flow provided in an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described below in further detail with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, the terms "first", "second", and the like are used only to distinguish similar objects and do not denote a particular order or importance; where permissible, their order may be interchanged so that the embodiments of the present application described herein can be practiced in orders other than those illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Client: an application program running on a terminal device to provide various services, such as a video playback client or a game client.
2) In response to: used to indicate the condition or state on which an executed operation depends; when the condition or state is satisfied, the executed operation or operations may be performed in real time or with a set delay. Unless otherwise specified, there is no restriction on the order in which the operations are executed.
3) Nonlinear special effects creation software (AE, Adobe After Effects): layer-based graphics and video post-production software that can efficiently and precisely create numerous eye-catching motion graphics and visual effects, covering text effects, particle effects, lighting effects, simulation effects, color grading techniques, and advanced effects in film and television production.
4) Special effect file: a file in an animation format exported by a plug-in after a designer creates an animation special effect in graphics/video processing software. For example, after designing an animation in AE, the designer exports the AE project file as a file in an animation format.
5) Portable Animated Graphics (PAG): the name of a sticker-animation implementation scheme. Animation files produced by the PAG scheme use the PAG format, which employs a highly compressed dynamic-bit storage technique and can integrate resources such as images, audio, and video directly into a single file. In the PAG-based sticker-animation scheme, a designer creates the animation special effect required by a product in AE, then uses the PAGExporter plug-in of the PAG scheme to read the animation feature data in the AE project file and export a PAG-format binary file in one of three modes chosen according to specific requirements: vector export, bitmap-sequence-frame export, or video-sequence-frame export. For display, the client decodes the PAG binary file through the PAG SDK, renders it through a rendering module, and presents the result on the Android platform, the iOS platform, or the web side.
Compared with Airbnb's open-source Lottie animation scheme, the PAG-based sticker-animation scheme greatly reduces development cost. For the same animation content, vector-mode files are smaller, richer animation effects are supported, multi-level render caching is supported, and text editing and image-content replacement are supported; the bitmap-sequence-frame and video-sequence-frame modes support unlimited AE effects and are more powerful, the bitmap mode being intended mainly for the web side and the video mode mainly for the mobile side. In the PAG data structure, one PAGFile contains multiple compositions (Composition), which fall into three types: VectorComposition, BitmapComposition, and VideoComposition, corresponding respectively to the three PAG export modes: vector export, bitmap-sequence-frame export, and video-sequence-frame export.
Vector export restores the AE animation layer structure. A VectorComposition contains composition attributes and layer attributes. The composition attributes describe the basic properties of the composition, such as ID, width, height, duration, frame rate, and background color. The layer information in the layer attributes corresponds one-to-one with the AE layer structure; within a composition, layer types such as null object (Null Object), solid layer (SolidLayer), text layer (TextLayer), shape layer (ShapeLayer), image layer (ImageLayer), and pre-compose layer (PreComposeLayer) are supported. A solid layer, for example, is further subdivided and may include the layer's width, height, color, and layer attributes, where the layer attributes include the layer's duration, start time, stretch parameter, mask (Mask), effects (Effects), and transform (Transform) information such as anchor point, position, scale, rotation, and opacity, corresponding to the AE layer structure.
Bitmap-sequence-frame export captures each frame of the AE animation as a picture. A BitmapComposition contains composition attributes and a bitmap-sequence array; the bitmap sequence includes the width, height, and frame rate of the exported bitmaps, whether each frame is a key frame (isKeyFrame), the position (x, y) of each bitmap, and the bitmap's binary data (ByteData).
Video-sequence-frame export compresses the captured pictures in a video format on top of the bitmap-sequence-frame mode. A VideoComposition contains composition attributes, a flag indicating whether the video is transparent (hasAlpha), and a video-sequence array; the video sequence includes the width, height, and frame rate of the bitmaps, whether each frame is a key frame, and the binary data of each bitmap.
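The three composition types described above can be summarized with an illustrative data model. The field lists are abridged and the class layout is an assumption for exposition, not the real PAG SDK data structures.

```python
# Illustrative (abridged) data model for the three PAG composition
# types: shared composition attributes, plus per-mode fields. This is
# an assumption for exposition, not the actual PAG SDK layout.
from dataclasses import dataclass, field

@dataclass
class Composition:
    width: int
    height: int
    frame_rate: float

@dataclass
class VectorComposition(Composition):      # vector export: AE layer tree
    layers: list = field(default_factory=list)

@dataclass
class BitmapComposition(Composition):      # bitmap-sequence-frame export
    bitmaps: list = field(default_factory=list)

@dataclass
class VideoComposition(Composition):       # video-sequence-frame export
    has_alpha: bool = False
    frames: list = field(default_factory=list)
```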
6) Video layer: unlike an ordinary image layer, which is replaced only once and remains in effect for the whole rendering process, a video layer must have its image content replaced frame by frame during rendering. Adding an animation special effect to a video can also be understood as adding video content to a sticker animation; when adding video content, it is necessary to know at which time point and at which position the content is inserted, and the video layer serves this purpose: a video layer is the layer in the animation file that carries the image content of the video file.
7) Video decapsulation: a video decapsulator is used to decapsulate video files, also known as demultiplexing (demux). Each video file actually contains multiple tracks: the image track is the picture that is viewed, the audio track is the sound that is heard, and the subtitle track is the subtitles that are displayed. For a video file to be transmitted to its destination over a network or by other means, the resources of the different tracks must be gathered together; this multiplexing (mux) process combines the resources of multiple tracks into one container. Consequently, after a video file is obtained, it must be demultiplexed, that is, the multiple tracks in the video file must be separated.
8) Decoder: in video transmission, video data is encoded to reduce its size; therefore, before the video can actually be used, the image-track data must be decoded by a video decoder.
9) FFmpeg (Fast Forward MPEG): a set of open-source computer programs that can be used to record and convert digital audio and video, and to turn them into streams. FFmpeg has a built-in video decapsulator and decoder, so both the video decapsulator and the video decoder can be implemented through FFmpeg.
In the related art, the video processing method is based on a video timeline and is driven by a video decoder at the platform end. As shown in fig. 1, a video post-editing template is composed of three sections of filling materials with transition effects added between the materials; besides a leader special effect and a trailer special effect, special effects such as variable speed and a display Look-Up Table (LUT) filter are also added, and background music needs to be incorporated when the video is finally synthesized. The specific rendering process is as follows:
1) The video decoding is driven by a video decoder at the platform end, which serves as the main timeline: the Android end implements video decoding through MediaCodec, and the iOS end implements video decoding through VideoToolbox.
2) After the video is decoded, special effects need to be added to the video picture, for which there are two ways. In mode 1, as shown in fig. 2A, if the special effect picture is small, it is rendered directly onto the Surface on which the video picture is drawn. In mode 2, as shown in fig. 2B, when the video picture is used as an input for adding a special effect, the video needs to be decoded at a specific time point and passed through special effect rendering; the content of the special effect picture is rendered after it is obtained, and a large amount of filling service logic exists in this process.
In view of the foregoing, embodiments of the present application provide a video processing method and apparatus, an electronic device, and a computer-readable storage medium, which can improve the flexibility and efficiency of adding special effects.
The video processing method provided by the embodiments of the present application can be implemented by a terminal or a server alone, or cooperatively by a terminal and a server. For example, the terminal alone executes the video processing method described below; or the terminal sends an addition request for a special effect file to the server, and the server executes the video processing method according to the received request and renders the tree-like rendering structure based on the video layer included in the special effect file and the video file to be edited, so as to implement the function of adding the special effect file to the video file to be edited.
The electronic device for video processing provided by the embodiment of the application can be various types of terminals or servers, wherein the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data, an artificial intelligence platform and the like; the terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart television, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
Referring to fig. 3, fig. 3 is a schematic diagram of an application scenario of the video processing system 10 provided in the embodiment of the present application, where the terminal 200 is connected to the server 100 through a network 300, and the network 300 may be a wide area network, a local area network, or a combination of the two.
The terminal (running a client, such as a video client or a short-video client) may be used to obtain an addition request for a special effect file. For example, when a user opens the short-video client running on the terminal, selects a video to be edited from an album, and selects at least one special effect template (special effect file) from a special effect template set, the terminal automatically obtains an addition request for the special effect file (including the video file to be edited and the at least one special effect file).
In some embodiments, a video processing plug-in may be embedded in the client running in the terminal 200 to implement the video processing method locally at the client. After acquiring an addition request for a special effect file, the terminal 200 calls the video processing plug-in to perform tree-like editing processing based on the video file to be edited and the at least one special effect file to obtain a tree-like rendering structure, and renders the tree-like rendering structure based on the video layer included in the special effect file and the video file to be edited, thereby implementing the function of adding the special effect file to the video file to be edited; the target video file with the special effect added can also be applied across platforms in response to the addition request.
In some embodiments, after the terminal 200 obtains an addition request for a special effect file, it calls the video processing interface of the server 100. The server 100 performs tree-like editing processing based on the video file to be edited and the at least one special effect file to obtain a tree-like rendering structure, renders the tree-like rendering structure based on the video layer included in the special effect file and the video file to be edited to obtain a target video file with the special effect added, and sends that target video file to the terminal 200, thereby implementing the function of adding the special effect file to the video file to be edited; the target video file with the special effect added can also be applied across platforms in response to the addition request.
In some embodiments, multiple servers may be grouped into a blockchain, with the server 100 being a node on the blockchain; there may be an information connection between the nodes of the blockchain, and information may be transmitted between the nodes through the information connection.
Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, and an application service layer. Data related to the video processing method provided by the embodiments of the present application may be stored in the blockchain.
The following describes the structure of the electronic device for video processing provided in the embodiments of the present application. Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device 500 for video processing provided in an embodiment of the present application, taking a terminal as an example. The electronic device 500 shown in fig. 4 includes: at least one processor 510, a memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together by a bus system 540. It can be understood that the bus system 540 is used to enable communication among these components; in addition to a data bus, the bus system 540 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 540 in fig. 4.
The Processor 510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in embodiments herein is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating with other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
in some embodiments, the video processing apparatus provided in the embodiments of the present application may be implemented in software, for example as various forms of applications, software modules, scripts, or code.
Fig. 4 shows a video processing apparatus 555 stored in the memory 550, which may be software in the form of a program, a plug-in, or the like, and includes the following modules: an acquisition module 5551, an editing module 5552, a rendering module 5553, a composition module 5554, and a decoding module 5555. These modules are logical, and thus may be arbitrarily combined or further divided according to the functions implemented; the functions of the respective modules will be described below.
As described above, the video processing method provided by the embodiment of the present application may be implemented by various types of electronic devices, such as a terminal. Referring to fig. 5A, fig. 5A is a schematic flowchart of a video processing method according to an embodiment of the present application, and is described with reference to the steps shown in fig. 5A.
In the following steps, the video file to be edited (i.e., the video file to be edited) may be a locally stored video (e.g., a video stored in an album) or may be a currently captured video (e.g., a video currently captured by a camera).
In step 101, a video file to be edited and at least one special effect file for adding a special effect to the video file to be edited are acquired.
As an example of obtaining the video file and the special effect file: after a user opens a video editing client running on a terminal, selects a video to be edited from an album, and selects at least one special effect file from a special effect template set, the terminal automatically obtains an addition request for the special effect file (including the video file to be edited and the at least one special effect file) and sends it to a server; the server parses the addition request to obtain the video file to be edited and the at least one special effect file.
In some embodiments, before obtaining the video file to be edited and the at least one special effect file for adding a special effect to the video file to be edited, the method further comprises: displaying a plurality of candidate video files in a human-computer interaction interface; in response to a selection operation on the candidate video files, taking the selected candidate video file as the video file to be edited; and acquiring at least one special effect file for adding a special effect to the video file to be edited.
For example, a plurality of candidate video files are displayed in the human-computer interaction interface; the candidate video files may be video files in an album (to which special effects are to be added). Through the user's selection operation on the candidate video files displayed in the interface, the selected candidate video file is determined as the video file to be edited, and at least one special effect file for adding a special effect to it is then determined based on the video file to be edited.
In some embodiments, obtaining at least one special effect file for adding a special effect to the video file to be edited includes: displaying a plurality of candidate special effect files in the human-computer interaction interface; and in response to a selection operation on the candidate special effect files, taking the selected candidate special effect files as the at least one special effect file for adding a special effect to the video file to be edited.
For example, after the video file to be edited is determined, special effect files in a special effect file library are matched by an artificial intelligence technology based on the video content of the video file to be edited, the matched special effect files are used as candidate special effect files, and the plurality of candidate special effect files are displayed in the human-computer interaction interface. Alternatively, the frequency with which special effect files in the library are selected by sample users (users of the client other than the current user) or by the current user is acquired; when the selection frequency is greater than a selection threshold (set according to actual requirements), the special effect file is taken as a candidate special effect file, and the plurality of candidate special effect files are displayed in the human-computer interaction interface.
As an implementation, after the plurality of candidate special effect files are determined through the artificial intelligence technology or the selection frequency, the styles of the candidate special effect files can be determined, so that a fixed number of special effect files of different styles are displayed in the human-computer interaction interface. For example, if the 10 determined candidate special effect files comprise 3 cartoon-style, 3 antique-style, and 4 modern-style files, then 1 cartoon-style, 1 antique-style, and 1 modern-style special effect file are displayed in the interface.
As an implementation, after the plurality of candidate special effect files are determined through the artificial intelligence technology or the selection frequency, the popularity of each candidate special effect file can be determined; the candidate special effect files are sorted in descending order of popularity, and the top-ranked part of the candidate special effect files is displayed in the human-computer interaction interface.
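The popularity-based descending sort and top-ranked selection described above can be sketched as follows; `top_candidates`, the `heat` field, and the sample data are hypothetical names invented for illustration:

```python
def top_candidates(effects, k):
    """Sort candidate special effect files by popularity (descending), keep top k."""
    return sorted(effects, key=lambda e: e["heat"], reverse=True)[:k]

candidates = [
    {"name": "cartoon", "heat": 52},
    {"name": "antique", "heat": 87},
    {"name": "modern", "heat": 31},
]
shown = top_candidates(candidates, 2)   # the part displayed in the interface
```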
In step 102, a tree-like editing process is performed based on the video file to be edited and at least one special effect file, so as to obtain a tree-like rendering structure including the video file and the special effect file.
For example, after the video file to be edited and the at least one special effect file are obtained, they are subjected to tree-like editing through the PAG tree-editing capability to form rendering leaf nodes corresponding to the video file and the special effect files, and a tree-like rendering structure is constructed based on the rendering leaf nodes. In this way the special effect files are organized automatically through the tree-like rendering structure to improve the efficiency of special effect addition, and animation rendering is subsequently performed based on the tree-like rendering structure to obtain the rendered video file.
Referring to fig. 5B, fig. 5B is an alternative flowchart of a video processing method according to an embodiment of the present application, and fig. 5B shows that step 102 of fig. 5A can be further implemented by steps 1021-1022: in step 1021, the video file to be edited is divided based on at least one special effect file to obtain a plurality of video segments; in step 1022, the multiple video clips and the at least one special effect file are combined to obtain a tree-like rendering structure including the video file and the special effect file.
For example, the specific division process is as follows: time information of each special effect file is acquired, the time information including a start time and a duration; based on the start time and duration of each special effect file, the video clip corresponding to each special effect file and the video clips corresponding to video filling are determined in the video file to be edited, where a video clip corresponding to video filling is a clip of the video file to be edited to which no special effect is added.
Following the above example, determining the video clip corresponding to each special effect file and the video clips corresponding to video filling in the video file to be edited based on the start time and duration of each special effect file includes performing the following processing for any one of the at least one special effect file: determining the start time of the special effect file as the start time of its corresponding video clip in the video file to be edited; determining the duration of the special effect file as the duration of that video clip; determining the end time of the special effect file based on its start time and duration, and taking that end time as the start time of the video clip corresponding to video filling in the video file to be edited; and taking the time interval between the end time of the special effect file and the start time of the adjacent special effect file as the duration of that video-filling clip.
For example, as shown in fig. 6, for special effect 3: the start time of special effect 3 is taken as the start time of video clip 1 (the video clip corresponding to special effect 3), and the duration of special effect 3 is taken as the duration of video clip 1; the end time of special effect 3 is determined from its start time and duration, and is taken as the start time of video clip 2 (the video clip corresponding to video filling in the video file to be edited, that is, a clip to which no special effect is added); and the time interval between the end time of special effect 3 and the start time of special effect 4 is taken as the duration of video clip 2.
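A minimal sketch of this segmentation logic, assuming each special effect is given as a `(start, duration)` pair sorted by start time; the function name and representation are illustrative, not the patent's actual implementation:

```python
def split_timeline(total_duration, effects):
    """Split a video timeline into effect-covered clips and video-fill clips.

    effects: list of (start, duration) pairs for each special effect file,
    sorted by start time.  Returns (kind, start, duration) tuples where kind
    is 'effect' for a clip corresponding to an effect and 'fill' for a clip
    with no effect added.
    """
    segments, cursor = [], 0.0
    for start, duration in effects:
        if start > cursor:                       # gap before this effect -> video fill
            segments.append(("fill", cursor, start - cursor))
        segments.append(("effect", start, duration))
        cursor = start + duration                # end time of this effect
    if cursor < total_duration:                  # trailing fill after the last effect
        segments.append(("fill", cursor, total_duration - cursor))
    return segments

# Effects at t=0..2 s and t=5..7 s on a 10 s video leave fill clips at 2..5 s and 7..10 s.
segs = split_timeline(10.0, [(0.0, 2.0), (5.0, 2.0)])
```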
For example, the specific combined process is as follows: determining the rendering size of a video file to be edited based on a leader file and a trailer file in the special effect file; determining a rendering root node based on the rendering size of the video file to be edited; taking each video clip and each special effect file as rendering leaf nodes; and determining a tree-shaped rendering structure comprising the video file and the special effect file based on the rendering root node and the rendering leaf node.
Following the above example, as shown in fig. 6, when special effect 1 is a leader special effect and special effect 4 is a trailer special effect, the rendering size of the video file to be edited is determined from the video fragments between special effects 1 and 4, and a rendering root node (Root PAGComposition) is determined based on that rendering size; an add-layer interface (addLayer interface) is created based on the rendering root node, and the rendering leaf nodes are sequentially added to the rendering root node through the add-layer interface according to their time information, so as to obtain the tree-like rendering structure including the video file and the special effect files.
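The construction of the tree-like rendering structure can be sketched as follows; `RenderNode` and `add_layer` are hypothetical stand-ins for the Root PAGComposition and the addLayer interface, not the real PAG API:

```python
class RenderNode:
    """A node of the tree-like rendering structure (root or leaf)."""
    def __init__(self, name, start=0.0, duration=0.0):
        self.name, self.start, self.duration = name, start, duration
        self.children = []

    def add_layer(self, node):
        """Mimics the addLayer interface: attach a leaf, kept in timeline order."""
        self.children.append(node)
        self.children.sort(key=lambda n: n.start)

root = RenderNode("RootPAGComposition")                    # rendering root node
root.add_layer(RenderNode("video_clip_1", start=2.0, duration=3.0))
root.add_layer(RenderNode("effect_leader", start=0.0, duration=2.0))
order = [n.name for n in root.children]                    # leaves sorted by start time
```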
In step 103, a tree-like rendering structure is rendered based on the video file to be edited and the video layers included in the at least one special effect file, so as to obtain a rendered video file.
For example, after the tree-like rendering structure is determined, it is rendered based on the PAG video layer included in the special effect file and the video file to be edited to obtain the rendered video file. Since the PAG video layer implements cross-platform decoding of video, the video file obtained after rendering the PAG video layer can be applied across platforms.
In some embodiments, rendering the tree-like rendering structure based on a video file to be edited and a video layer included in at least one special effect file to obtain a rendered video file, including: traversing the tree-shaped rendering structure to obtain rendering leaf nodes of the tree-shaped rendering structure; when the rendering leaf node represents the video clip of the video file, rendering the video clip of the video file to obtain a rendering video corresponding to the video file; when the rendering leaf node represents the special effect file, rendering processing is carried out on a video layer included in the special effect file to obtain a rendering video corresponding to the special effect file; and combining the rendering video corresponding to the video file and the rendering video corresponding to the special effect file to obtain a rendered video file.
For example, a rendering leaf node in the tree-like rendering structure may be either a special effect file or a video clip of the video file to be edited. When the rendering leaf node is a video clip of the video file, that clip is rendered to obtain the rendered video corresponding to the video file; when the rendering leaf node is a special effect file, the video layer included in the special effect file is rendered to obtain the rendered video corresponding to the special effect file. Finally, the two rendered videos are combined according to the time information of the rendering leaf nodes to obtain the rendered video file, on the basis of which an audio file is then synthesized.
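The traversal-and-dispatch step can be sketched as a depth-first walk over the tree; the dictionary-based tree and the render labels are illustrative assumptions:

```python
def render_tree(node, rendered=None):
    """Depth-first traversal: render each leaf according to its type, in order."""
    if rendered is None:
        rendered = []
    if not node.get("children"):                    # leaf node
        if node["type"] == "clip":                  # video clip of the file to edit
            rendered.append(("clip_render", node["name"]))
        else:                                       # special effect file
            rendered.append(("effect_render", node["name"]))
        return rendered
    for child in node["children"]:                  # internal node: recurse
        render_tree(child, rendered)
    return rendered

tree = {"children": [
    {"type": "clip", "name": "clip_1"},
    {"type": "effect", "name": "transition"},
]}
out = render_tree(tree)   # rendered videos in timeline order, ready to combine
```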
Following the above example, rendering a video clip of the video file to obtain the rendered video corresponding to the video file includes: performing format conversion processing on the video clip to obtain the video layer corresponding to the clip; and performing animation rendering processing based on that video layer to obtain the rendered video corresponding to the video file.
In connection with the above example, there are two ways to render the video layer included in the special effect file: the large-sticker way and the small-sticker way.
Wherein, the processing procedure of the large sticker mode is as follows: acquiring a target video clip corresponding to a special effect file in a video file to be edited; filling the target video clip into a placeholder map of a video layer included in the special effect file to obtain the video layer filled with the target video clip; and performing animation rendering processing on the video layer filled with the target video fragment to obtain a rendered video corresponding to the special effect file.
For example, a time slice (target video clip) is intercepted from the video to construct a PAGMovie, which is filled into the placeholder map of the video layer to obtain the video layer filled with the target video clip; animation rendering processing is then performed on that layer to obtain the rendered video corresponding to the special effect file.
Wherein, the processing procedure of the small paster mode is as follows: acquiring a target video clip corresponding to a special effect file in a video file to be edited; filling the target video clip into the corresponding image layer to obtain the image layer filled with the target video clip; filling a video layer included in the special effect file to a layer filled with the target video fragment to obtain a layer filled with the special effect file; and performing animation rendering processing on the layer filled with the special effect file to obtain a rendered video corresponding to the special effect file.
For example, a time slice (target video clip) is intercepted from the video to construct a PAGMovie, which is filled into the constructed corresponding layer (PAGImageLayer); the constructed PAGImageLayer is added to the Root PAGComposition, then the special effect file is added, and finally animation rendering processing is performed on the layer filled with the special effect file to obtain the rendered video corresponding to the special effect file.
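The two fill modes can be contrasted in a simplified sketch; the layer dictionaries and function names are hypothetical and only model where the clip content is placed relative to the effect's layers:

```python
def render_big_sticker(effect, movie):
    """Large-sticker way: fill the clip into the placeholder of the effect's video layer."""
    layer = dict(effect["video_layer"])      # copy so the effect template stays reusable
    layer["placeholder"] = movie
    return {"layers": [layer]}

def render_small_sticker(effect, movie):
    """Small-sticker way: put the clip on its own image layer, effect layer stacked on top."""
    image_layer = {"type": "PAGImageLayer", "content": movie}
    return {"layers": [image_layer, effect["video_layer"]]}

effect = {"video_layer": {"type": "video", "placeholder": None}}
big = render_big_sticker(effect, "movie_0s_2s")
small = render_small_sticker(effect, "movie_0s_2s")
```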
In some embodiments, before the tree-like rendering structure is rendered based on the video file to be edited and the video layer included in the at least one special effect file to obtain the rendered video file, a decoding interface is called to decode the at least one special effect file and the video file to be edited, so as to obtain the special effect file and the video file to be edited for rendering.
For example, when the system supports hardware decoding, a system decoding interface is called to decode at least one special effect file and a video file to be edited to obtain a special effect file for rendering and a video file to be edited; when the system does not support hardware decoding, a decoding interface built in the software development kit is called to decode at least one special effect file and the video file to be edited, and the special effect file for rendering and the video file to be edited are obtained.
In connection with the above example, some system platforms support hardware decoding and provide hardware decoding interfaces, such as the iOS/macOS interface VideoToolbox and the Android interface MediaCodec. Hardware decoding can make full use of the graphics processor, so its decoding efficiency is high and its time consumption is low; therefore, when the system platform supports hardware decoding, the decoding interface of the system platform is preferentially called. In practical applications, whether the current system supports hardware decoding can be determined first: when it does, the system decoding interface is called to decode the video in a hardware decoding mode; when it does not, for example on a Linux system, the decoding interface built into the SDK can be called to decode the video in a software decoding mode.
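The preferential hardware-decoder selection with software fallback can be sketched as follows; `pick_decoder` and its return labels are illustrative, not real interfaces:

```python
def pick_decoder(platform, hardware_ok=True):
    """Prefer the platform's hardware decoding interface; fall back to the
    decoder built into the SDK (software decoding) when unavailable."""
    hardware = {"ios": "VideoToolbox", "macos": "VideoToolbox", "android": "MediaCodec"}
    if hardware_ok and platform in hardware:
        return hardware[platform]
    return "sdk_software_decoder"   # e.g. FFmpeg-based software decoding

chosen = pick_decoder("android")    # hardware path available
fallback = pick_decoder("linux")    # no hardware interface on Linux
```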
In step 104, an audio synthesis process is performed based on the rendered video file to obtain a target video file with special effects added.
For example, when the special effect file includes an audio file (sound effect), audio synthesis processing is performed based on the special effect file including the audio file and the rendered video file, and a target video file with the sound effect incorporated therein is obtained.
Referring to fig. 5C, fig. 5C is an optional flowchart of a video processing method according to an embodiment of the present disclosure, and fig. 5C shows that step 104 of fig. 5A can be further implemented through steps 1041-1043: in step 1041, the audio file corresponding to the special effect file and the audio file corresponding to the video file to be edited are segmented to obtain a plurality of audio segments; in step 1042, the plurality of audio segments are divided into corresponding audio tracks to obtain a plurality of audio tracks with audio segments added, wherein the audio segments between the audio tracks do not overlap; in step 1043, the plurality of audio tracks with audio segments added are merged to obtain the target video file with the special effect added.
For example, the audio file corresponding to the video file to be edited includes the original sound of the video file and the background music added to it. As shown in fig. 14, the audio file corresponding to the special effect file and the audio file corresponding to the video file to be edited are segmented to obtain a plurality of audio segments, such as audio segment 1, audio segment 2, audio segment 3, and audio segment 4. Audio segments 1 and 2 are divided into the corresponding audio track 1, and audio segments 3 and 4 are divided into the corresponding audio track 2, so that audio tracks 1 and 2 with audio segments added are obtained; the audio segments on audio track 1 do not overlap those on audio track 2. Audio tracks 1 and 2 are then merged to obtain the target video file with the special effect added.
In some embodiments, before dividing the plurality of audio segments into corresponding audio tracks, the method further comprises: decoding the plurality of audio clips through the depacketizer to obtain decoded audio clips; resampling the decoded audio segment to obtain a resampled audio segment; and carrying out audio adjustment processing on the resampled audio segments to obtain the audio segments for division processing.
For example, since the formats of different audio files differ, after a single audio segment (Segment) is decoded by the decapsulator, the audio segment is resampled and converted into a uniform format, and audio speed change or volume adjustment is then performed on the audio segment before the segments are divided into audio tracks.
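The division of audio segments into tracks so that segments on a track never overlap can be sketched as a greedy assignment; the function and the `(start, end)` tuple representation are assumptions for illustration:

```python
def assign_tracks(segments):
    """Place each (start, end) audio segment on the first audio track whose
    last segment ends before this one starts; otherwise open a new track."""
    tracks = []
    for start, end in sorted(segments):
        for track in tracks:
            if track[-1][1] <= start:          # no overlap with this track's last segment
                track.append((start, end))
                break
        else:
            tracks.append([(start, end)])      # open a new audio track
    return tracks

# Four segments with two overlapping pairs end up on two audio tracks.
tracks = assign_tracks([(0, 4), (2, 6), (4, 8), (6, 10)])
```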
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
In the related art, the implementation scheme of video post-editing template rendering is driven by a video decoder at the platform end based on a video timeline. A video post-editing template is composed of three sections of filling materials with transition effects added between the materials; besides a leader special effect and a trailer special effect, special effects such as variable speed and LUT filters are also required, and background music needs to be incorporated when the video is finally synthesized.
Although the related art can satisfy the needs of products and users, the applicant has found the following problems:
1) The video coding and decoding module depends on platform-side interfaces, so it cannot be made cross-platform; template configuration can only be previewed on a mobile phone, desktop preview is not supported, and production efficiency is therefore low;
2) The rendering process involves a large amount of business logic such as video speed change, video special effect addition, and LUT filter effects, which each platform has to implement separately.
To solve the above problems, an embodiment of the present application provides a video processing method in which the whole rendering chain is controlled through PAG rather than a video decoder. This removes video decoding's dependence on the platform side, allows a cross-platform SDK and business logic such as video special effect addition to be implemented once and reused on every platform, enables desktop preview, and improves template production efficiency.
As shown in fig. 7, the embodiment of the present application may be applied to various video publisher post-editing templates, where 701 is a special effect template, and after applying the special effect template shown in fig. 7, a video file with a special effect as shown in fig. 8 may be obtained, where 801 is a text special effect.
The video processing method provided in the embodiment of the present application is described in detail below with reference to fig. 9; the implementation is as follows:
(1) PAG video layer and PAG tree editing
The video processing method provided by the embodiment of the application depends on PAG video image layers and PAG tree editing.
To address video decoding's dependence on the platform side, the PAG video layer in the embodiment of the present application decodes video across platforms: the video demuxer is implemented with FFmpeg, the iOS side decodes through VideoToolbox, and the Android side decodes through MediaCodec. Because Android devices are highly diverse, FFmpeg software decoding is started whenever hardware decoding fails; a time mapping is also added to support variable-speed control.
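As a rough illustration of the decoder-selection logic described above (all names here are hypothetical sketches; the real SDK wires up MediaCodec, VideoToolbox, and FFmpeg directly), the fallback can be modeled as probing the platform's hardware decoder and dropping to software decoding on failure:

```cpp
#include <functional>
#include <string>

// Hypothetical decoder-selection sketch: try the platform's hardware
// decoder first (MediaCodec on Android, VideoToolbox on iOS) and fall
// back to FFmpeg software decoding when hardware setup fails.
enum class Backend { MediaCodec, VideoToolbox, FFmpegSoftware };

Backend pickDecoder(const std::string& platform,
                    const std::function<bool(Backend)>& tryInit) {
    Backend hw = (platform == "android") ? Backend::MediaCodec
                                         : Backend::VideoToolbox;
    // Hardware decoders vary widely across Android devices; probe before use.
    if (tryInit(hw)) {
        return hw;
    }
    return Backend::FFmpegSoftware;  // always-available software fallback
}
```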
In terms of organizing the rendering chain, free combination of multiple PAG files is achieved through the PAG tree editing function: a PAG rendering tree can add PAG files as rendering leaf nodes.
(2) User material partitioning
As shown in fig. 10, the dotted lines divide the 3 segments of user material into multiple segments for use by the PAG SDK, forming the tree-like rendering structure shown in fig. 11.
As shown in fig. 10, a user may import multiple segments of video material, and information such as an opening special effect, transitions, speed changes, a closing special effect, and an LUT filter may be added to the material. In a specific implementation, the user material is divided to obtain the minimum rendering nodes; each video fragment is rendered cross-platform through a PAG video layer; and the organization of all rendering nodes relies on the PAG tree editing technique, forming a PAG rendering tree (tree-like rendering structure) for rendering.
The whole special effect template is composed of multiple videos and multiple PAG special effects. Dividing the user material yields the minimum rendering nodes, i.e. the leaf nodes of the tree-like rendering structure, which finally form the PAG rendering tree; the video fragments in the rendering leaf nodes are organized through PAG video layers.
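The division step can be sketched as a timeline walk: every special effect claims its [start, start + duration) window, and the gaps in between become plain video-fill clips. This is a minimal illustration under the assumption that effects are sorted and non-overlapping; the struct and function names are invented for the sketch.

```cpp
#include <vector>

// Hypothetical sketch of material division: effects carve the timeline
// into effect-backed clips and plain "video fill" clips (no effect).
struct Effect { double start; double duration; };
struct Clip { double start; double duration; bool hasEffect; };

std::vector<Clip> divideMaterial(const std::vector<Effect>& effects,
                                 double totalDuration) {
    std::vector<Clip> clips;
    double cursor = 0.0;  // effects assumed sorted and non-overlapping
    for (const auto& e : effects) {
        if (e.start > cursor) {
            clips.push_back({cursor, e.start - cursor, false});  // video fill
        }
        clips.push_back({e.start, e.duration, true});  // effect-backed clip
        cursor = e.start + e.duration;
    }
    if (cursor < totalDuration) {
        clips.push_back({cursor, totalDuration - cursor, false});
    }
    return clips;
}
```

Each resulting clip becomes one rendering leaf node of the PAG rendering tree.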
(3) Special effect addition
The specific rendering process for special effect addition is as follows:
1) According to the final rendering size, a Root PAGComposition is created using the interface: static std::shared_ptr&lt;PAGComposition&gt; Make(int width, int height).
2) The divided rendering nodes are added in sequence; PAGComposition provides an addLayer interface that supports adding rendering nodes and setting the rendering level (stacking order) of a rendering layer.
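A minimal stand-in for steps 1) and 2) might look as follows. Only `Make` and `addLayer` come from the interface quoted above; everything else (the class body, the string-named layers) is a simplification for illustration, not the PAG SDK itself.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for the Root composition described above.
class Composition {
 public:
    static std::shared_ptr<Composition> Make(int width, int height) {
        return std::shared_ptr<Composition>(new Composition(width, height));
    }
    // Later additions render above earlier ones unless an explicit
    // rendering level is given.
    void addLayer(const std::string& name) { layers_.push_back(name); }
    int width() const { return width_; }
    int height() const { return height_; }
    const std::vector<std::string>& layers() const { return layers_; }

 private:
    Composition(int w, int h) : width_(w), height_(h) {}
    int width_;
    int height_;
    std::vector<std::string> layers_;
};
```

A template would create the root at the final rendering size and then add each divided rendering node in timeline order.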
Openings, endings, and special effects are added to the video picture as either large stickers or small stickers. In the large-sticker case, the sticker is no smaller than the video and contains a placeholder map into which the video picture must be filled; as shown in fig. 12, a time slice is cut from the video to construct a PAGMovie, which is filled into the placeholder map of the sticker animation. The transition special effect is essentially a large-sticker scheme with two video filling layers. In the small-sticker case, the sticker is no larger than the video, so its position relative to the video picture must be provided before merging the sticker with the picture; as shown in fig. 13, a time slice is cut from the video to construct a PAGMovie, which fills the constructed PAGImageLayer; the PAGImageLayer is added to the Root PAGComposition, and then the small sticker is added.
Regarding the variable-speed special effect, PAG supports setting speed-change information. In actual use, the required filling duration of the placeholder is obtained, and a slice of the source video with the corresponding length is cut out to construct the PAGMovie (video layer) used for filling.
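The time mapping behind the variable-speed effect reduces to simple arithmetic: filling a placeholder of length `fillDuration` at playback speed `speed` consumes `fillDuration * speed` seconds of source video, and each output timestamp maps back into the source accordingly. A hedged sketch (function names are assumptions):

```cpp
// Hypothetical time-mapping sketch for the variable-speed effect.
// At 2x speed, a 2-second placeholder consumes 4 seconds of source video.
double sourceDurationNeeded(double fillDuration, double speed) {
    return fillDuration * speed;
}

// Map a timestamp on the output timeline back to the source timeline.
double mapToSourceTime(double outputTime, double clipStart, double speed) {
    return clipStart + outputTime * speed;
}
```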
(4) LUT filter special effect addition
After special effects are added, a video editing scene may also require an LUT filter effect. In a specific implementation, the LUT filter is implemented in C++ for cross-platform use. After the special effect video has been rendered, the resulting texture is used as the input of the LUT filter module, and the texture output by the module is finally rendered to the screen.
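To make the LUT step concrete, the sketch below applies a 3D lookup table on the CPU with nearest-neighbour sampling; a production C++ implementation would evaluate the same table per texel in a shader, typically with trilinear interpolation. All names here are illustrative assumptions.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical 3D LUT sketch: an N*N*N table maps each (r, g, b) input
// in [0, 1] to a colour-graded output value.
struct Lut3D {
    int size;                                  // N entries per axis
    std::vector<std::array<float, 3>> table;   // N*N*N RGB entries, r fastest

    std::array<float, 3> apply(float r, float g, float b) const {
        auto idx = [this](float v) {           // nearest-neighbour index
            int i = static_cast<int>(v * (size - 1) + 0.5f);
            return i < 0 ? 0 : (i >= size ? size - 1 : i);
        };
        std::size_t flat = (static_cast<std::size_t>(idx(b)) * size +
                            idx(g)) * size + idx(r);
        return table[flat];
    }
};
```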
(5) Audio effect synthesis
In terms of audio processing, the input sources of audio include three parts: sound effects (embedded in the special effects), background music, and the video's own audio. In terms of data organization, as shown in fig. 14, an audio rendering chain is constructed following the Track and Segment structure of AVFoundation, which ensures that the audio data on each track do not overlap.
After the tracks have been divided, the mixing flow is as shown in fig. 15. Because different audio files have different formats, each audio Segment is demuxed and decoded, then resampled into a uniform format, after which speed change or volume adjustment is applied. At the track level, besides format consistency during mixing, the audio length of each sampling interval must also be kept consistent, so audio data is buffered at the track level; finally the audio samples are superimposed and mixed to obtain the audio file of the final video, which is composited into the special-effect video.
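The final superposition step can be sketched as follows, assuming every track has already been resampled to a shared 16-bit PCM format (the names are invented for the illustration): samples at the same position are summed across tracks and clamped to the valid PCM range.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical mix-down sketch: sum aligned samples across tracks and
// clamp the result to the 16-bit signed PCM range.
std::vector<int> mixTracks(const std::vector<std::vector<int>>& tracks) {
    std::size_t length = 0;
    for (const auto& t : tracks) length = std::max(length, t.size());
    std::vector<int> mixed(length, 0);
    for (const auto& t : tracks) {
        for (std::size_t i = 0; i < t.size(); ++i) mixed[i] += t[i];
    }
    for (int& s : mixed) s = std::clamp(s, -32768, 32767);  // 16-bit PCM
    return mixed;
}
```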
In summary, the video processing method provided by the embodiment of the present application has the following beneficial effects: the whole rendering chain is controlled through PAG instead of a video decoding module, business logic such as video special effect addition can be reused on every platform, desktop preview becomes possible, and the development workload of templates is greatly reduced while production efficiency improves. For example, a template designed on the desktop can be rendered on Android, iOS, and Linux through the SDK without additional development.
The video processing method provided by the embodiment of the present application has been described so far with reference to the exemplary application and implementation of the electronic device provided by the embodiment of the present application, and the following continues to describe a scheme in which each module in the video processing apparatus 555 provided by the embodiment of the present application cooperates to implement video processing.
An obtaining module 5551, configured to obtain a video file to be edited and at least one special effect file for adding a special effect to the video file to be edited; the editing module 5552 is configured to perform tree editing processing based on the video file to be edited and the at least one special effect file, so as to obtain a tree rendering structure including the video file and the special effect file; a rendering module 5553, configured to perform rendering processing on the tree-like rendering structure based on the video file to be edited and the video layer included in the at least one special effect file, so as to obtain a rendered video file; a synthesizing module 5554, configured to perform audio synthesizing processing based on the rendered video file, to obtain a target video file added with an animation special effect.
In some embodiments, the editing module 5552 is further configured to divide the video file to be edited based on the at least one special effect file to obtain a plurality of video segments; and combining the plurality of video clips and the at least one special effect file to obtain a tree-shaped rendering structure comprising the video files and the special effect files.
In some embodiments, the editing module 5552 is further configured to obtain time information of each special effect file, where the time information includes a start time and a duration; and determining a video segment corresponding to each special effect file and a video segment corresponding to video filling in the video file to be edited based on the starting time and duration of each special effect file, wherein the video filling indicates that the special effect is not added in the video file to be edited.
In some embodiments, the editing module 5552 is further configured to perform the following for any of the at least one special effects file: determining the starting time of any special effect file as the starting time of a video clip corresponding to any special effect file in the video files to be edited; determining the duration of any special effect file as the duration of a video clip corresponding to any special effect file in the video files to be edited; determining the end time of any special effect file based on the start time and the duration of any special effect file, and taking the end time of any special effect file as the start duration of a video clip filled with a corresponding video in the video file to be edited; and taking the time interval between the ending time of any one special effect file and the starting time of the adjacent special effect file as the duration of the video segment filled with the corresponding video in the video file to be edited.
In some embodiments, the editing module 5552 is further configured to determine a rendering size of the video file to be edited based on a slice header file and a slice trailer file in the special effect file; determining a rendering root node based on the rendering size of the video file to be edited; taking each video clip and each special effect file as rendering leaf nodes; determining a tree-like rendering structure including the video file and the special effect file based on the rendering root node and the rendering leaf node.
In some embodiments, the editing module 5552 is further configured to create an add layer interface based on the render root node; and sequentially adding the rendering leaf nodes on the rendering root nodes according to the time information of the rendering leaf nodes through the adding layer interfaces to obtain a tree-shaped rendering structure comprising the video file and the special effect file.
In some embodiments, the rendering module 5553 is further configured to perform traversal processing on the tree-like rendering structure to obtain rendering leaf nodes of the tree-like rendering structure; when the rendering leaf node represents the video clip of the video file, rendering the video clip of the video file to obtain a rendering video corresponding to the video file; when the rendering leaf node represents the special effect file, rendering processing is carried out on a video layer included in the special effect file to obtain a rendering video corresponding to the special effect file; and combining the rendering video corresponding to the video file and the rendering video corresponding to the special effect file to obtain a rendered video file.
In some embodiments, the rendering module 5553 is further configured to perform format conversion processing on a video segment of the video file, so as to obtain a video layer corresponding to the video segment; and performing animation rendering processing based on the video layer corresponding to the video clip to obtain a rendered video corresponding to the video file.
In some embodiments, the rendering module 5553 is further configured to obtain a target video clip corresponding to the special effect file in the video file to be edited; filling the target video clip into a placeholder map of a video layer included in the special effect file to obtain the video layer filled with the target video clip; and performing animation rendering processing on the video layer filled with the target video fragment to obtain a rendered video corresponding to the special effect file.
In some embodiments, the rendering module 5553 is further configured to obtain a target video segment corresponding to the special effect file in the video file to be edited; filling the target video clip into the corresponding image layer to obtain the image layer filled with the target video clip; filling the video layers included in the special effect file to the layers filled with the target video fragments to obtain the layers filled with the special effect file; and performing animation rendering processing on the layer filled with the special effect file to obtain a rendered video corresponding to the special effect file.
In some embodiments, before the rendering processing is performed on the tree-like rendering structure based on the video layer included in the at least one special effect file and the video file to be edited, and a rendered video file is obtained, the apparatus further includes: a decoding module 5555, configured to invoke a decoding interface to perform decoding processing on the at least one special effect file and the video file to be edited, so as to obtain the special effect file used for the rendering processing and the video file to be edited.
In some embodiments, the decoding module 5555 is further configured to, when a system supports hardware decoding, invoke a system decoding interface to perform decoding processing on the at least one special effect file and the video file to be edited, so as to obtain the special effect file and the video file to be edited, which are used for the rendering processing; when the system does not support hardware decoding, a decoding interface built in a software development kit is called to decode the at least one special effect file and the video file to be edited, and the special effect file used for rendering processing and the video file to be edited are obtained.
In some embodiments, the synthesizing module 5554 is further configured to perform segmentation processing on an audio file corresponding to the special effect file and an audio file corresponding to the video file to be edited to obtain a plurality of audio segments; dividing the plurality of audio clips into corresponding audio tracks to obtain a plurality of audio tracks added with the audio clips, wherein the audio clips among the audio tracks are not overlapped; and merging the plurality of audio tracks added with the audio clips to obtain a target video file added with special effects.
In some embodiments, the synthesis module 5554 is further configured to perform decoding processing on the plurality of audio segments through a de-container to obtain the decoded audio segments; resampling the decoded audio segment to obtain the resampled audio segment; and carrying out audio adjustment processing on the re-sampled audio segments to obtain the audio segments for dividing processing.
In some embodiments, the obtaining module 5551 is further configured to display a plurality of candidate video files in a human-computer interaction interface; in response to the selection operation for the candidate video file, taking the selected part of the candidate video file as the video file to be edited; and acquiring at least one special effect file for adding a special effect to the video file to be edited.
In some embodiments, the obtaining module 5551 is further configured to display a plurality of candidate special effects files in the human-computer interaction interface; and in response to the selection operation of the candidate special effect file, taking the selected part of the candidate special effect file as the at least one special effect file for adding the special effect to the video file to be edited.
In some embodiments, the obtaining module 5551 is further configured to perform matching processing on special effect files in a special effect file library based on video content of the video file to be edited, and use a special effect file obtained through matching as the candidate special effect file.
In some embodiments, the obtaining module 5551 is further configured to obtain a frequency that a special effect file in the special effect file library is selected; and when the selected frequency is greater than the selected threshold value, taking the special effect file as the candidate special effect file.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the video processing method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a video processing method provided by embodiments of the present application, for example, a video processing method as shown in fig. 5A-5C.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. A method of video processing, the method comprising:
acquiring a video file to be edited and at least one special effect file for adding a special effect to the video file to be edited;
performing tree editing processing based on the video file to be edited and the at least one special effect file to obtain a tree rendering structure comprising the video file and the special effect file;
rendering the tree-shaped rendering structure based on the video file to be edited and the video layer included by the at least one special effect file to obtain a rendered video file;
and performing audio synthesis processing based on the rendered video file to obtain a target video file with special effects.
2. The method according to claim 1, wherein performing tree editing processing based on the video file to be edited and the at least one special effect file to obtain a tree rendering structure including the video file and the special effect file comprises:
dividing the video file to be edited based on the at least one special effect file to obtain a plurality of video segments;
and combining the plurality of video clips and the at least one special effect file to obtain a tree-shaped rendering structure comprising the video files and the special effect files.
3. The method according to claim 2, wherein the dividing the video file to be edited based on the at least one special effect file to obtain a plurality of video segments comprises:
acquiring time information of each special effect file, wherein the time information comprises a start time and a duration;
and determining a video segment corresponding to each special effect file and a video segment corresponding to video filling in the video file to be edited based on the starting time and duration of each special effect file, wherein the video filling indicates that the special effect is not added in the video file to be edited.
4. The method according to claim 3, wherein the determining the video clip corresponding to each of the special effects files and the video clip corresponding to the video stuffing in the video file to be edited based on the starting time and duration of each of the special effects files comprises:
performing the following for any of the at least one special effects file:
determining the starting time of any special effect file as the starting time of a video clip corresponding to any special effect file in the video files to be edited;
determining the duration of any special effect file as the duration of a video clip corresponding to any special effect file in the video files to be edited;
determining the end time of any special effect file based on the start time and the duration of any special effect file, and taking the end time of any special effect file as the start duration of a video clip filled with a corresponding video in the video file to be edited;
and taking the time interval between the ending time of any one special effect file and the starting time of the adjacent special effect file as the duration of the video segment filled with the corresponding video in the video file to be edited.
5. The method of claim 2, wherein the combining the plurality of video clips and the at least one special effect file to obtain a tree-like rendering structure including the video files and the special effect file comprises:
determining the rendering size of the video file to be edited based on a leader file and a trailer file in the special effect file;
determining a rendering root node based on the rendering size of the video file to be edited;
taking each video clip and each special effect file as rendering leaf nodes;
determining a tree-like rendering structure including the video file and the special effect file based on the rendering root node and the rendering leaf node.
6. The method of claim 5, wherein determining a tree-like rendering structure comprising the video file and the special effects file based on the rendering root node and the rendering leaf node comprises:
creating an adding layer interface based on the rendering root node;
and sequentially adding the rendering leaf nodes on the rendering root nodes according to the time information of the rendering leaf nodes through the adding layer interfaces to obtain a tree-shaped rendering structure comprising the video file and the special effect file.
7. The method according to claim 1, wherein the rendering the tree-like rendering structure based on the video file to be edited and the video layer included in the at least one special effect file to obtain a rendered video file comprises:
traversing the tree-shaped rendering structure to obtain rendering leaf nodes of the tree-shaped rendering structure;
when the rendering leaf node represents the video clip of the video file, rendering the video clip of the video file to obtain a rendering video corresponding to the video file;
when the rendering leaf node represents the special effect file, rendering processing is carried out on a video layer included in the special effect file to obtain a rendering video corresponding to the special effect file;
and combining the rendering video corresponding to the video file and the rendering video corresponding to the special effect file to obtain a rendered video file.
8. The method of claim 7, wherein the rendering the video clip of the video file to obtain a rendered video corresponding to the video file comprises:
carrying out format conversion processing on the video clips of the video files to obtain video layers corresponding to the video clips;
and performing animation rendering processing based on the video layer corresponding to the video clip to obtain a rendered video corresponding to the video file.
9. The method according to claim 7, wherein the rendering the video layer included in the special effect file to obtain a rendered video corresponding to the special effect file comprises:
acquiring a target video clip corresponding to the special effect file in the video file to be edited;
filling the target video clip into a placeholder map of a video layer included in the special effect file to obtain the video layer filled with the target video clip;
and performing animation rendering processing on the video layer filled with the target video fragment to obtain a rendered video corresponding to the special effect file.
10. The method according to claim 7, wherein the rendering the video layer included in the special effect file to obtain a rendered video corresponding to the special effect file comprises:
acquiring a target video clip corresponding to the special effect file in the video file to be edited;
filling the target video clip into a corresponding layer to obtain the layer filled with the target video clip;
filling the video layers included in the special effect file to the layers filled with the target video fragments to obtain the layers filled with the special effect file;
and performing animation rendering processing on the layer filled with the special effect file to obtain a rendered video corresponding to the special effect file.
11. The method according to claim 1, wherein before the rendering the tree-like rendering structure based on the video file to be edited and the video layer included in the at least one special effect file to obtain the rendered video file, the method further comprises:
and calling a decoding interface to decode the at least one special effect file and the video file to be edited to obtain the special effect file used for rendering and the video file to be edited.
12. The method according to claim 11, wherein the invoking a decoding interface to decode the at least one special effect file and the video file to be edited to obtain the special effect file for the rendering process and the video file to be edited includes:
when the system supports hardware decoding, a system decoding interface is called to decode the at least one special effect file and the video file to be edited to obtain the special effect file used for rendering and the video file to be edited;
when the system does not support hardware decoding, a decoding interface built in a software development kit is called to decode the at least one special effect file and the video file to be edited, and the special effect file used for rendering processing and the video file to be edited are obtained.
13. A video processing apparatus, characterized in that the apparatus comprises:
the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a video file to be edited and at least one special effect file for adding a special effect to the video file to be edited;
the editing module is used for performing tree-shaped editing processing on the video file to be edited and the at least one special effect file to obtain a tree-shaped rendering structure comprising the video file and the special effect file;
the rendering module is used for rendering the tree-shaped rendering structure based on the video file to be edited and the video layer included by the at least one special effect file to obtain a rendered video file;
and the synthesis module is used for carrying out audio synthesis processing on the basis of the rendered video file to obtain a target video file added with an animation special effect.
14. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the video processing method of any of claims 1 to 12 when executing executable instructions stored in the memory.
15. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the video processing method of any of claims 1 to 12.
CN202110470170.6A 2021-04-28 2021-04-28 Video processing method, device, equipment and storage medium Pending CN115250335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110470170.6A CN115250335A (en) 2021-04-28 2021-04-28 Video processing method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115250335A true CN115250335A (en) 2022-10-28

Family

ID=83695869



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116456165A (en) * 2023-06-20 2023-07-18 北京美摄网络科技有限公司 Scheduling method and device for AE engineering, electronic equipment and readable storage medium
CN116456165B (en) * 2023-06-20 2023-09-26 北京美摄网络科技有限公司 Scheduling method and device for AE engineering, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN111899322B (en) Video processing method, animation rendering SDK, equipment and computer storage medium
CN110351592B (en) Animation presentation method and device, computer equipment and storage medium
CN111193876B (en) Method and device for adding special effect in video
WO2020125567A1 (en) Automatic animation generation method, and automatic animation generation system
CN111899155A (en) Video processing method, video processing device, computer equipment and storage medium
CN115830190A (en) Animation processing method and device
US20100060652A1 (en) Graphics rendering system
CN106713988A (en) Beautifying method and system for virtual scene live
CN105354872A (en) Rendering engine, implementation method and producing tools for 3D web game
JP2023518388A (en) Video special effects processing method, apparatus, electronic equipment and computer program
CN112929627B (en) Virtual reality scene implementation method and device, storage medium and electronic equipment
CN108647313A (en) A kind of real-time method and system for generating performance video
CN114450966A (en) Data model for representation and streaming of heterogeneous immersive media
CN107767437B (en) Multilayer mixed asynchronous rendering method
CN115250335A (en) Video processing method, device, equipment and storage medium
WO2018049682A1 (en) Virtual 3d scene production method and related device
CN114866801B (en) Video data processing method, device, equipment and computer readable storage medium
CN117065357A (en) Media data processing method, device, computer equipment and storage medium
CN111935492A (en) Live gift display and construction method based on video file
CN113556578A (en) Video generation method, device, terminal and storage medium
WO2024051394A1 (en) Video processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN113949820A (en) Special effect processing method and device, electronic equipment and storage medium
CN108921920A (en) A kind of production method of hydroelectric project three-dimensional animation
US20240009560A1 (en) 3D Image Implementation
KR102335096B1 (en) System for providing video production service compositing figure video and ground video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination