CN111193876A - Method and device for adding special effect in video

Method and device for adding special effect in video

Info

Publication number
CN111193876A
Authority
CN
China
Prior art keywords
video
frame
animation
file
data
Legal status: Granted
Application number
CN202010019167.8A
Other languages
Chinese (zh)
Other versions
CN111193876B (en)
Inventor
齐国鹏
陈仁健
傅彬
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202010019167.8A
Publication of CN111193876A
Application granted
Publication of CN111193876B
Legal status: Active


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/2621: Cameras specially adapted for the electronic generation of special effects during image pickup, e.g. digital cameras, camcorders, video cameras having integrated special effects capability
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60: Network streaming of media packets
    • H04L 65/75: Media network packet handling
    • H04L 65/762: Media network packet handling at the source
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/80: Responding to QoS

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides a method and a device for adding special effects to a video; the method comprises the following steps: acquiring a video file and an animation file for adding at least one special effect to the video file; decoding the video file to obtain frame data of a plurality of video frames of the video file, and decoding the animation file to obtain frame data of a plurality of animation frames of the animation file, wherein the video frames and the animation frames are in a one-to-one correspondence; simulating a graphics processor and running the simulated graphics processor; performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame to obtain a plurality of target video frames; and performing video synthesis based on the plurality of target video frames to obtain a target video file to which the at least one special effect has been added. The method and the device can solve the problem of unsmooth animation rendering on low-end mobile phones and enable mass production of special-effect videos.

Description

Method and device for adding special effect in video
Technical Field
The invention relates to a video processing technology, in particular to a method and a device for adding special effects in a video.
Background
With the rapid development of the short video industry, the demand for large-scale mass production of special-effect videos is growing stronger.
In the related art, the open-source Lottie scheme from Airbnb implements the workflow from animation design to presentation on a terminal: a designer designs an animation through graphics and video processing software, for example AE (Adobe After Effects); the designed animation is exported through an export plug-in; and the animation is loaded and rendered on the terminal through a Software Development Kit (SDK). However, because the mobile phones used by some users have relatively weak performance, if the animation rendering of a video, that is, the operation of adding a special effect to the video, is executed on the terminal, problems such as stuttering arise.
Disclosure of Invention
The embodiment of the invention provides a method and a device for adding a special effect to a video, which can avoid the stuttering caused by performing animation rendering on a terminal.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a method for adding special effects in a video, which comprises the following steps:
acquiring a video file and an animation file for adding at least one special effect to the video file;
decoding the video file to obtain frame data of a plurality of video frames of the video file, and decoding the animation file to obtain frame data of a plurality of animation frames of the animation file, wherein the video frames and the animation frames have a one-to-one correspondence relationship;
simulating a graphics processor and running the simulated graphics processor;
performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively to obtain a plurality of target video frames;
and performing video synthesis based on the plurality of target video frames to obtain a target video file added with the at least one special effect.
The embodiment of the invention provides a device for adding special effects in a video, which comprises:
an acquisition module, configured to acquire a video file and an animation file for adding at least one special effect to the video file;
the decoding module is used for decoding the video file to obtain frame data of a plurality of video frames of the video file, and decoding the animation file to obtain frame data of a plurality of animation frames of the animation file, wherein the video frames and the animation frames have one-to-one correspondence;
the simulation module is used for simulating a graphics processor and operating the simulated graphics processor;
the rendering module is used for performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively to obtain a plurality of target video frames;
and the synthesis module is used for carrying out video synthesis on the basis of the plurality of target video frames to obtain a target video file added with the at least one special effect.
In the above scheme, the decoding module is further configured to decapsulate the video file to obtain video stream data in the video file;
decoding the video stream data to obtain frame data of a plurality of video frames;
and carrying out format conversion on the frame data of the plurality of video frames to obtain the frame data in a renderable data format.
In the foregoing solution, the decoding module is further configured to convert a data format of frame data of the plurality of video frames from a luminance-chrominance YUV data format to a red-green-blue-transparency RGBA data format.
In the above scheme, the decoding module is further configured to decode the animation file to obtain frame data of a key frame and frame data of a plurality of non-key frames of the animation file;
the frame data of the key frame is bitmap data, and the frame data of the non-key frame is difference bitmap data relative to the key frame.
In the above solution, the rendering module is further configured to simulate an environment of an open graphics library;
creating an environment of a two-dimensional graphic library based on the environment of the simulated open graphic library;
and calling a graphics drawing interface through the simulated graphics processor based on the environment of the two-dimensional graphics library, and respectively performing animation rendering according to the frame data of each video frame and the frame data of the corresponding animation frame to obtain a plurality of rendered target video frames.
In the foregoing solution, the rendering module is further configured to perform the following operations on frame data of each video frame and frame data of a corresponding animation frame:
drawing a first graph corresponding to the video frame in a blank canvas through the simulated graphics processor based on frame data of the video frame;
acquiring first position information of an animation frame corresponding to the video frame;
and drawing a second graph corresponding to the animation frame on the canvas on which the first graph is drawn at a position corresponding to the first position information through the simulated graph processor based on the frame data of the animation frame to obtain the canvas bearing the graph corresponding to the target video frame.
In the foregoing solution, the rendering module is further configured to perform the following operations on frame data of each video frame and frame data of a corresponding animation frame respectively:
drawing a third graph corresponding to the animation frame in a blank canvas through the simulated graphics processor based on the frame data of the animation frame;
acquiring second position information of a video frame corresponding to the animation frame;
and drawing a fourth graph corresponding to the video frame at a position corresponding to the second position information on the canvas on which the third graph is drawn by the simulated graph processor based on the frame data of the video frame to obtain the canvas bearing the graph corresponding to the target video frame.
In the above scheme, the apparatus further comprises:
the screen capture module is used for carrying out screen capture processing on a plurality of canvas obtained by rendering to obtain a plurality of target video frames;
and carrying out format conversion on the plurality of target video frames to obtain a plurality of target video frames in a target data format.
In the above scheme, the synthesizing module is further configured to encode and encapsulate the plurality of target video frames to obtain a target video file to which the at least one special effect is added.
An embodiment of the present invention provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the method for adding the special effect in the video provided by the embodiment of the invention when the executable instruction stored in the memory is executed.
The embodiment of the invention provides a storage medium, which stores executable instructions and is used for causing a processor to execute the executable instructions so as to realize the method for adding the special effect in the video provided by the embodiment of the invention.
The embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the server carries out animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively by simulating the graphics processor and operating the simulated graphics processor so as to obtain a plurality of rendered target video frames, so that the animation rendering aiming at the video can be completed at the server end, and a target video file added with special effects is obtained for the terminal to directly decode and play, thereby avoiding the pause caused by the animation rendering of the terminal; meanwhile, the server can directly acquire the animation file from the animation design end, the special effect addition aiming at the video is automatically realized, the processing efficiency of adding the special effect in the video is improved, and further the special effect video can be produced in batches.
Drawings
Fig. 1 is a schematic block diagram of a system 100 for adding special effects to video according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention;
fig. 3 is a flowchart illustrating a method for adding a special effect to a video according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating rendering results provided by an embodiment of the invention;
FIG. 5 is a diagram illustrating rendering results provided by an embodiment of the invention;
FIG. 6 is a schematic interface diagram for playing a target video file according to an embodiment of the present invention;
fig. 7 is a flowchart illustrating a method for adding a special effect to a video according to an embodiment of the present invention;
fig. 8 is a flowchart illustrating a method for adding special effects to a video according to an embodiment of the present invention;
FIG. 9 is a flow chart of a rendering process provided by an embodiment of the invention;
fig. 10 is a schematic structural diagram of a device for adding a special effect to a video according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" are only used to distinguish similar objects and do not denote a particular order; it should be understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the invention described herein can be implemented in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.
1) AE, short for Adobe After Effects, is a powerful graphics and video processing software application.
2) Animation file: after a designer designs an animation with graphics and video processing software, a file in an animation format is exported through an export plug-in; for example, after the designer designs an animation in AE, the AE project file is exported as a file in an animation format.
3) The graphics processor, also called a display core, a visual processor, and a display chip, is a microprocessor dedicated to image and graphics related operations on personal computers, workstations, game machines, and some mobile devices (e.g., tablet computers, smart phones, etc.).
4) YUV is a color encoding method in which "Y" represents luminance (luma, i.e., the grayscale value) and "U" and "V" represent chrominance (chroma), which describe the color and saturation of an image and specify the color of a pixel.
5) RGBA represents the color space of Red, Green, Blue and Alpha, i.e., transparency/opacity.
6) A bitmap, also called a dot matrix image or a raster image, is composed of individual dots called pixels (picture elements) which can be arranged and colored differently to constitute a pattern.
Referring to fig. 1, fig. 1 is an architectural diagram of a system 100 for adding special effects to a video according to an embodiment of the present invention, in order to support an exemplary application, a terminal 400 is connected to a server 200 (including a server 200-1 and a server 200-2) through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 400 is configured to send a request for adding a special effect to a video, where the request includes a video file to which the special effect is to be added;
the server 200-1 is used for acquiring the video file from the request and acquiring an animation file for adding at least one special effect to the video file from the server 200-2 based on the request; decoding the video file to obtain frame data of a plurality of video frames of the video file, and decoding the animation file to obtain frame data of a plurality of animation frames of the animation file, wherein the video frames and the animation frames have a one-to-one correspondence relationship; simulating a graphics processor and running the simulated graphics processor; animation rendering is carried out through a simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively to obtain a plurality of target video frames; performing video synthesis based on a plurality of target video frames to obtain a target video file added with at least one special effect; and transmits the target video file to the terminal 400;
the terminal 400 is further configured to play the target video file;
and a server 200-2 for storing the animation file.
In actual implementation, after the designer designs an animation including at least one special effect through the AE, the AE engineering file is exported to an animation file and stored in the server 200-2.
In practical application, the server may be a server configured independently to support various services, or may be configured as a server cluster; the terminal may be a smartphone, a tablet, a laptop, or any other type of user terminal, and may also be a wearable computing device, a Personal Digital Assistant (PDA), a desktop computer, a cellular phone, a media player, a navigation device, a game console, a television, or a combination of any two or more of these or other data processing devices.
Next, an electronic device implementing the method for adding a special effect to a video according to an embodiment of the present invention will be described. Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device shown in fig. 2 includes: a processor 210, a memory 250, a network interface 220, and a user interface 230. The various components in the electronic device are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 2.
The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments of the invention is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 252 for communicating to other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a display module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;
an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.
In some embodiments, the apparatus for adding special effects to video provided by the embodiments of the present invention may be implemented in software, and fig. 2 illustrates an apparatus 255 for adding special effects to video stored in a memory 250, which may be software in the form of programs and plug-ins, and includes the following software modules: the obtaining module 2551, the decoding module 2552, the simulation module 2553, the rendering module 2554 and the composition module 2555 are logical and thus can be arbitrarily combined or further split according to the implemented functions, which will be described below.
In other embodiments, the apparatus for adding special effects to video provided by the embodiments of the present invention may be implemented in hardware, and as an example, the apparatus for adding special effects to video provided by the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the method for adding special effects to video provided by the embodiments of the present invention, for example, the processor in the form of a hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
Based on the above description of the system and the electronic device for adding a special effect to a video according to the embodiments of the present invention, a method for adding a special effect to a video according to an embodiment of the present invention is described next, fig. 3 is a flowchart illustrating the method for adding a special effect to a video according to an embodiment of the present invention, and in some embodiments, the method for adding a special effect to a video is implemented by a server, for example, by the server 200-1 in fig. 1, and the method for adding a special effect to a video according to an embodiment of the present invention is described with reference to fig. 1 and fig. 3.
Step 301: the method comprises the steps of obtaining a video file and an animation file used for adding at least one special effect to the video file.
The video file is the presented main content, and the animation file is an animation-format file exported by a designer after designing an animation through graphics and video processing software; it is used to add at least one special effect to the main content. For example, the video file may be a video clip of a user's highlight moments in a game, and the animation file is used to add a text special effect or a sticker special effect to the highlight video clip; or the video file may be a video clip of the user's highlight moments in games over one week, and the animation file is likewise used to add a text special effect or a sticker special effect to the highlight video clip.
In practical application, the video file may be sent to the server by the terminal, that is, the terminal sends a request for adding a special effect to the video file to the server, the request carries the video file, and the server obtains the video file from the request. For example, the terminal records a video during a game, and after the game is finished, the terminal sends the recorded video file to the server so that the server adds a special effect to the video file. Alternatively, the video file may be pre-stored in a database of the server, and when receiving an instruction to add a special effect, the server obtains the video file from the database.
Correspondingly, the animation file may be stored in the server corresponding to the graphics and video processing software, that is, after the designer designs the animation through the graphics and video processing software, the exported animation-format file is stored in the server corresponding to that software. After receiving the instruction to add a special effect, the server for adding the special effect acquires, from the server corresponding to the graphics and video processing software, an animation file for adding at least one special effect to the video file. Alternatively, the animation file may be stored in the server for adding special effects, in which case, when that server receives an instruction to add a special effect, it directly obtains the corresponding animation file from its own database.
Step 302: and decoding the video file to obtain frame data of a plurality of video frames of the video file, and decoding the animation file to obtain frame data of a plurality of animation frames of the animation file.
Here, the video frame and the animation frame are in a one-to-one correspondence relationship.
In actual implementation, the server may implement decoding of the video file based on FFmpeg. Here, FFmpeg is a set of open-source computer programs for recording, converting and streaming digital audio and video, and can run on platforms such as Linux, Windows, Mac OS X, Android and iOS.
In some embodiments, the server may decode the video file by: decapsulating the video file to obtain video stream data in the video file; decoding video stream data to obtain frame data of a plurality of video frames; and carrying out format conversion on the frame data of the plurality of video frames to obtain the frame data in a renderable data format.
In practical implementation, the server may decapsulate the video file through FFmpeg, where the av_seek_frame interface provided by FFmpeg can accurately obtain the video data or the audio data at a specific time. The video data in the video file can be decapsulated through FFmpeg to obtain video data of the AVPacket data type.
In some embodiments, when the audio file is included in the video file, decapsulating the video file also results in audio stream data. Then, the audio data in the video file can be unpackaged through FFmpeg, and the audio data with the data type of AVPacket is obtained.
In other embodiments, the server may additionally obtain an audio file, and then decapsulate the obtained audio file to obtain audio stream data. In practical implementation, the audio data in the audio file may also be decapsulated by FFmpeg to obtain the audio data with the data type of AVPacket.
It should be noted that, either the audio stream data in the audio file or the audio stream data in the video file may be added to the target video file when the target video file is generated.
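For reference, the following is a minimal sketch of the decapsulation and decoding flow described above, written against the FFmpeg C API; it is an illustrative sketch rather than the implementation of the invention, error handling is abbreviated, and the function name demux_and_decode is assumed for illustration only:
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}
// Sketch: decapsulate the container into AVPacket units and decode the video
// stream into YUV AVFrame objects.
void demux_and_decode(const char* path) {
    AVFormatContext* fmt = nullptr;
    avformat_open_input(&fmt, path, nullptr, nullptr);      // decapsulate the video file
    avformat_find_stream_info(fmt, nullptr);
    int vidx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);
    const AVCodec* codec = avcodec_find_decoder(fmt->streams[vidx]->codecpar->codec_id);
    AVCodecContext* dec = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(dec, fmt->streams[vidx]->codecpar);
    avcodec_open2(dec, codec, nullptr);
    AVPacket* pkt = av_packet_alloc();
    AVFrame* frame = av_frame_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {                  // stream data arrives as AVPacket
        if (pkt->stream_index == vidx) {
            avcodec_send_packet(dec, pkt);
            while (avcodec_receive_frame(dec, frame) == 0) {
                // frame now holds the frame data of one decoded video frame in YUV format
            }
        }
        av_packet_unref(pkt);
    }
    av_frame_free(&frame);
    av_packet_free(&pkt);
    avcodec_free_context(&dec);
    avformat_close_input(&fmt);
}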
In some embodiments, the server may format frame data of the plurality of video frames by: and converting the data format of the frame data of the plurality of video frames from the YUV data format to the RGBA data format.
Here, the frame data of the decoded video frames is in YUV format; since YUV-format frame data is the raw, uncompressed data of the video, it cannot be rendered directly, and in actual implementation the YUV-format data needs to be converted into RGBA-format data.
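A sketch of this format conversion using the libswscale module of FFmpeg is given below; it is illustrative only (the helper name yuv_to_rgba is assumed) and assumes the source frame was produced by the decoder:
extern "C" {
#include <libswscale/swscale.h>
#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>
}
// Sketch: convert one decoded frame from YUV to RGBA so it can be rendered.
AVFrame* yuv_to_rgba(const AVFrame* src) {
    AVFrame* dst = av_frame_alloc();
    dst->format = AV_PIX_FMT_RGBA;
    dst->width = src->width;
    dst->height = src->height;
    av_frame_get_buffer(dst, 0);
    SwsContext* sws = sws_getContext(src->width, src->height, (AVPixelFormat)src->format,
                                     dst->width, dst->height, AV_PIX_FMT_RGBA,
                                     SWS_BILINEAR, nullptr, nullptr, nullptr);
    sws_scale(sws, src->data, src->linesize, 0, src->height, dst->data, dst->linesize);
    sws_freeContext(sws);
    return dst;  // RGBA pixels in dst->data[0] with stride dst->linesize[0]
}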
In some embodiments, the animation file may be a PAG animation file, which is an animation file in a self-developed binary file format. In actual implementation, the PAG binary file is deserialized into data objects that the server can manipulate, and the decoded data structure follows the PAG data structure.
In practical application, after the designer designs the animation through the graphics and video processing software, the animation project file is exported as a PAG animation file, where the PAG animation file has three export modes: vector export, bitmap sequence frame export, and video sequence frame export; the data formats of files exported in different export modes are different.
It should be noted that vector export is a restoration of the animation layer structure and includes the information of each layer; bitmap sequence frame export captures each frame of the animation as a picture and stores the bitmap data corresponding to each picture; and video sequence frame export is an optimization of bitmap sequence frame export, namely, the captured pictures are compressed in a video format.
In some embodiments, when the PAG animation file is exported in a bitmap sequence frame export manner, the server may decode the animation file by: decoding the animation file to obtain frame data of a key frame and frame data of a plurality of non-key frames of the animation file; the frame data of the key frame is bitmap data, and the frame data of the plurality of non-key frames is difference bitmap data relative to the key frame.
Here, because most animations are continuous and have small inter-frame differences, a certain frame is selected as a key frame, and the data of each other frame is compared with the key frame to obtain the position information and the width and height of a difference bitmap; only the difference bitmap information is extracted and stored, so that the file size can be reduced. When animation rendering is performed, the graphic corresponding to a non-key frame can be obtained through region rendering based on the rendered graphic corresponding to the key frame.
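The reconstruction of a non-key frame from the key frame and its difference bitmap can be sketched as follows; the structures shown are illustrative assumptions used for explanation and are not the actual PAG file format:
#include <cstdint>
#include <cstring>
#include <vector>
// Illustrative RGBA pixel buffers (not part of the PAG format itself).
struct Rgba8Image {
    int width = 0, height = 0;
    std::vector<uint8_t> pixels;   // width * height * 4 bytes, row-major RGBA
};
struct DiffBitmap {
    int x = 0, y = 0;              // position of the changed region within the full frame
    Rgba8Image region;             // bitmap data of the changed region only
};
// Rebuild a non-key frame: start from the key frame, then overwrite the region
// covered by the difference bitmap at its recorded position.
Rgba8Image apply_diff(const Rgba8Image& keyFrame, const DiffBitmap& diff) {
    Rgba8Image out = keyFrame;
    for (int row = 0; row < diff.region.height; ++row) {
        uint8_t* dst = out.pixels.data() + ((diff.y + row) * out.width + diff.x) * 4;
        const uint8_t* src = diff.region.pixels.data() + row * diff.region.width * 4;
        std::memcpy(dst, src, diff.region.width * 4);
    }
    return out;
}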
Step 303: simulating the graphics processor, and running the simulated graphics processor.
Here, since the animation rendering part involves rendering by a Graphics Processing Unit (GPU), and most servers currently do not contain a GPU, a software-simulated GPU environment needs to be built on the Central Processing Unit (CPU).
In practical implementation, the Mesa scheme or the SwiftShader scheme proposed by Google may be used, where the GPU-simulation performance of SwiftShader is superior to that of Mesa.
Step 304: and animation rendering is carried out through a simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively so as to obtain a plurality of target video frames.
Because the rendering of the Lottie scheme in the related art is based on platform rendering APIs, for example, the Android platform is based on Canvas and the iOS platform is based on CoreGraphics, the Lottie scheme does not support server-side rendering. In actual implementation, animation rendering is implemented based on the interface provided by the SkSurface class of Skia. Skia is a 2D vector graphics processing library that contains efficient and concise implementations of fonts, coordinate transformations and bitmaps, and can realize platform-independent drawing.
In some embodiments, the server simulates the environment of an open graphics library; creating an environment of a two-dimensional graphic library based on the environment of the simulated open graphic library; based on the environment of the two-dimensional graphics library, a graphics drawing interface is called through a simulated graphics processor, animation rendering is carried out according to frame data of each video frame and frame data of corresponding animation frames respectively, and a plurality of rendered target video frames are obtained.
In practical implementation, the open graphics library may be OpenGL, which is a cross-language, cross-platform application programming interface for rendering 2D, 3D vector graphics; the two-dimensional graphics library is Skia.
In actual implementation, the server simulates an OpenGL environment through SwiftShader: two dynamic libraries, libEGL.so and libGLESv2.so, are compiled from the SwiftShader source code; their API is similar to the standard OpenGL interface, so an OpenGL environment for off-screen rendering can be built by following the usual OpenGL initialization procedure.
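A minimal sketch of building such an off-screen environment with the standard EGL calls is shown below; when the process links against the libEGL.so/libGLESv2.so built from SwiftShader, these calls run entirely on the CPU. The pbuffer size and the helper name are illustrative assumptions:
#include <EGL/egl.h>
// Sketch: create an off-screen (pbuffer) OpenGL ES 2.0 context for rendering.
EGLContext create_offscreen_context() {
    EGLDisplay display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(display, nullptr, nullptr);
    const EGLint configAttribs[] = {
        EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8, EGL_ALPHA_SIZE, 8,
        EGL_NONE };
    EGLConfig config;
    EGLint numConfigs = 0;
    eglChooseConfig(display, configAttribs, &config, 1, &numConfigs);
    const EGLint pbufferAttribs[] = { EGL_WIDTH, 1280, EGL_HEIGHT, 720, EGL_NONE };  // illustrative size
    EGLSurface surface = eglCreatePbufferSurface(display, config, pbufferAttribs);
    const EGLint contextAttribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
    EGLContext context = eglCreateContext(display, config, EGL_NO_CONTEXT, contextAttribs);
    eglMakeCurrent(display, surface, surface, context);   // make the off-screen context current
    return context;
}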
Then, a GPU context for Skia is created. According to the API provided by Skia, an sk_sp<const GrGLInterface> glInterface is created, and then the Skia GPU context is created: sk_sp<GrContext> grContext = GrContext::MakeGL(glInterface).
Since Skia rendering is performed in an SkSurface, the SkSurface needs to be created, according to the interface provided by Skia, after the Skia GPU context has been created. Then, the corresponding SkCanvas is obtained through the SkSurface, and animation rendering is performed through the SkCanvas according to the frame data of the current video frame and the frame data of the corresponding animation frame.
And performing animation rendering according to the frame data of each video frame and the frame data of the corresponding animation frame in sequence until all data are rendered.
In actual implementation, the SkSurface may be created by:
static sk_sp<SkSurface> MakeFromBackendTexture(GrContext* context,
                                               const GrBackendTexture& backendTexture,
                                               GrSurfaceOrigin origin, int sampleCnt,
                                               SkColorType colorType,
                                               sk_sp<SkColorSpace> colorSpace,
                                               const SkSurfaceProps* surfaceProps);
in some embodiments, the following is performed for the frame data of each video frame and the frame data of the corresponding animation frame: drawing a first graph corresponding to the video frame in the blank canvas through a simulated graph processor based on frame data of the video frame; acquiring first position information of an animation frame corresponding to a video frame; and drawing a second graph corresponding to the animation frame at a position corresponding to the first position information on the canvas on which the first graph is drawn by a simulated graph processor based on the frame data of the animation frame to obtain the canvas bearing the graph corresponding to the target video frame.
In practical implementation, the frame data of the video frame may be rendered first, and then the frame data of the animation frame may be rendered, so as to overlay the graphic corresponding to the animation frame on the graphic corresponding to the video frame. For example, fig. 4 is a schematic diagram of a rendering result provided by an embodiment of the present invention; referring to fig. 4, the graphic corresponding to the animation frame is a cartoon character, and it is overlaid on the graphic corresponding to the video frame.
In some embodiments, the following is performed for the frame data of each video frame and the frame data of the corresponding animation frame: drawing a third graph corresponding to the animation frame in the blank canvas through a simulated graph processor based on the frame data of the animation frame; acquiring second position information of a video frame corresponding to the animation frame; and based on frame data of the video frame, drawing a fourth graph corresponding to the video frame at a position corresponding to the second position information on the canvas on which the third graph is drawn by the simulated graphics processor to obtain the canvas bearing the graph corresponding to the target video frame.
In practical implementation, the frame data of the animation frame may be rendered first, and then the frame data of the video frame may be rendered, so as to overlay the graphic corresponding to the video frame on the graphic corresponding to the animation frame. For example, referring to fig. 5, the graphic corresponding to the animation frame is a border frame, and the graphic corresponding to the video frame is overlaid on the graphic corresponding to the animation frame.
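Both drawing orders reduce to two drawImage calls on the canvas obtained from the SkSurface. The following sketch, assuming the video frame and the animation frame have already been wrapped as SkImage objects and that (animX, animY) is the acquired position information, shows the first order (video frame first, animation frame on top); swapping the two drawImage calls yields the second order:
#include "include/core/SkCanvas.h"
#include "include/core/SkImage.h"
#include "include/core/SkSurface.h"
// Sketch: composite one target video frame on the canvas of the SkSurface.
void compose_frame(SkSurface* surface,
                   const sk_sp<SkImage>& videoFrame,
                   const sk_sp<SkImage>& animFrame,
                   float animX, float animY) {
    SkCanvas* canvas = surface->getCanvas();    // SkCanvas obtained from the SkSurface
    canvas->clear(SK_ColorTRANSPARENT);         // start from a blank canvas
    canvas->drawImage(videoFrame, 0, 0);        // first graphic: the video frame
    canvas->drawImage(animFrame, animX, animY); // second graphic: the animation frame at its position
    surface->flush();                           // submit the (simulated) GPU work
}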
In some embodiments, when the rendered result is a canvas carrying a graphic corresponding to the target video frame, the server may further perform screen capturing on a plurality of canvases obtained by rendering to obtain a plurality of target video frames; and carrying out format conversion on the plurality of target video frames to obtain a plurality of target video frames in a target data format.
After rendering, screen capture processing is carried out on rendered content to obtain a plurality of target video frames; and then converting the data format of the target video frame into a bitmap, and obtaining the target video frame in the RGBA format according to the bitmap.
In practical implementation, the screen capture process can be implemented through the following interface in Skia: sk_sp<SkImage> makeImageSnapshot();
then, the data format of the target video frame is converted into a bitmap (SkBitmap) by:
SkBitmap bitmap;
// allocate an N32 (32-bit RGBA) pixel buffer matching the snapshot size
bitmap.allocN32Pixels(skImage->width(), skImage->height(), false);
// copy the snapshot's pixels into the bitmap
skImage->readPixels(bitmap.info(), bitmap.getPixels(), bitmap.rowBytes(), 0, 0);
// mark the bitmap as immutable so Skia can cache it safely
bitmap.setImmutable();
step 305: and performing video synthesis based on the plurality of target video frames to obtain a target video file added with at least one special effect.
In some embodiments, the server may perform video compositing by: and coding and packaging the plurality of target video frames to obtain a target video file added with at least one special effect.
In some embodiments, the server may encode and encapsulate the plurality of target video frames by: converting a plurality of target video frames from an RGBA format to a YUV format; coding a plurality of target video frames in YUV format to obtain video stream data; and encapsulating the video stream data to obtain a target video file.
In practical implementation, the server converts the target video frames from the RGBA format to the YUV format through the libswscale module provided by FFmpeg, and the converted data structure may be the AVFrame class of FFmpeg; the libavcodec module of FFmpeg then encodes the target video frames in YUV format into video stream data such as H.264 (x264) or MPEG-4; finally, the video stream data is packaged into a container to obtain a video file in a format such as MP4, MOV or AVI.
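A minimal sketch of this encoding-and-encapsulation stage with the FFmpeg API follows; the codec choice (H.264), timestamp handling and output path are illustrative assumptions, and error handling, encoder flushing and audio are omitted for brevity:
#include <vector>
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}
// Sketch: encode YUV target video frames and package them into an MP4 container.
void encode_and_mux(const std::vector<AVFrame*>& yuvFrames,
                    int width, int height, int fps, const char* outPath) {
    AVFormatContext* fmt = nullptr;
    avformat_alloc_output_context2(&fmt, nullptr, nullptr, outPath);   // e.g. "out.mp4"
    const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
    AVCodecContext* enc = avcodec_alloc_context3(codec);
    enc->width = width;
    enc->height = height;
    enc->pix_fmt = AV_PIX_FMT_YUV420P;
    enc->time_base = {1, fps};
    avcodec_open2(enc, codec, nullptr);
    AVStream* stream = avformat_new_stream(fmt, nullptr);
    avcodec_parameters_from_context(stream->codecpar, enc);
    stream->time_base = enc->time_base;
    avio_open(&fmt->pb, outPath, AVIO_FLAG_WRITE);
    avformat_write_header(fmt, nullptr);
    AVPacket* pkt = av_packet_alloc();
    int64_t pts = 0;
    for (AVFrame* frame : yuvFrames) {
        frame->pts = pts++;
        avcodec_send_frame(enc, frame);                                // encode one YUV frame
        while (avcodec_receive_packet(enc, pkt) == 0) {
            av_packet_rescale_ts(pkt, enc->time_base, stream->time_base);
            pkt->stream_index = stream->index;
            av_interleaved_write_frame(fmt, pkt);                      // write into the container
        }
    }
    av_write_trailer(fmt);
    avio_closep(&fmt->pb);
    av_packet_free(&pkt);
    avcodec_free_context(&enc);
    avformat_free_context(fmt);
}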
It should be noted that, if the video file includes audio stream data or otherwise acquires audio stream data, the audio stream data is unpacked from the video file or the audio file, and when the video stream data and the audio stream data are packaged, the video stream data and the audio stream data need to be packaged together in a container.
In some embodiments, after obtaining the target video file, the server may output the target video to the terminal, so that the terminal plays the target video file.
For example, fig. 6 is a schematic interface diagram of playing a target video file according to an embodiment of the present invention, and referring to fig. 6, a character map special effect and a character special effect are added to a highlight video of a game, and audio stream data in an audio file "XXX sound" is added to the target video file when synthesizing the video.
According to the embodiment of the invention, by running the simulated graphics processor and performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame to obtain a plurality of rendered target video frames, the rendering of the animation can be completed at the server side; on one hand, problems such as stuttering when animation rendering is performed on the terminal can be avoided; on the other hand, mass production of special-effect videos can be achieved.
The following describes a method for adding a special effect to a video in an embodiment of the present invention, taking an example of adding a relevant special effect to a game video of a user. Fig. 7 is a schematic flowchart of a method for adding a special effect to a video according to an embodiment of the present invention, and referring to fig. 7, the method for adding a special effect to a video according to an embodiment of the present invention includes:
step 401: the client records the game video to generate a game video file.
In actual implementation, the game client receives the recording operation of the user, records the game video in the game process, and generates a game video file.
Step 402: the client sends a special effect adding request carrying the game video file to the first server.
Step 403: the first server acquires the game video file based on the special effect adding request and sends the acquisition request of the corresponding animation file to the first server.
Here, the animation file is used to add at least one special effect to the subject content.
Step 404: the second server sends the animation file to the first server.
Here, the animation file is a PAG animation file that is exported in a bitmap sequence frame export manner after the designer designs an animation by AE.
Step 405: the first server unpacks the game video file through the Ffmpeg to obtain video stream data and audio stream data.
In practical implementation, the av_seek_frame interface provided by FFmpeg can accurately acquire the video data and the audio data at a specific time, and the video data and the audio data in the video file can be decapsulated through FFmpeg to obtain video stream data and audio stream data of the AVPacket data type.
Step 406: the first server decodes the video stream data to obtain frame data of a plurality of video frames in a YUV data format.
Step 407: the first server performs format conversion on the frame data of the plurality of video frames to obtain the frame data of the plurality of video frames in the RGBA data format.
Here, the frame data of the decoded video frames is in YUV format; since YUV-format frame data is the raw, uncompressed data of the video, it cannot be rendered directly, and in actual implementation the YUV-format data needs to be converted into RGBA-format data.
Step 408: the first server decodes the animation file to obtain frame data corresponding to the plurality of animation frames.
In actual implementation, decoding the animation file to obtain frame data of a key frame and frame data of a plurality of non-key frames of the animation file; the frame data of the key frame is bitmap data, and the frame data of the plurality of non-key frames is difference bitmap data relative to the key frame.
Step 409: the first server simulates a graphics processor and runs the simulated graphics processor.
Step 410: the first server draws a first graph corresponding to each video frame in the blank canvas through the simulated graphics processor based on the frame data of the video frame aiming at the frame data of each video frame.
In actual implementation, animation rendering may be implemented through an interface provided by the SkSurface class provided by Skia.
Step 411: the first server acquires the position information of the animation frame corresponding to the video frame.
Step 412: and the first server draws a second graph of the animation frame at a position corresponding to the position information on the canvas on which the first graph is drawn through the simulated graph processor based on the frame data of the animation frame to obtain the canvas bearing the graph corresponding to the target video frame.
Step 413: the first server conducts screen capture processing on the plurality of canvas obtained through rendering, and a plurality of target video frames are obtained.
Step 414: and the first server performs format conversion on the plurality of target video frames to obtain a plurality of target video frames in YUV format.
Step 415: the first server encodes a plurality of target video frames into target video stream data.
Step 416: the first server encapsulates the target video stream data and the audio stream data into a target video file in MP4 format.
Step 417: the first server outputs the target video file to the game client.
Step 418: and the client plays the target video file.
In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described. Fig. 8 is a schematic flowchart of a method for adding a special effect to a video according to an embodiment of the present invention, and referring to fig. 8, the method for adding a special effect to a video according to an embodiment of the present invention includes:
step 501: the server obtains the video file and the PAG animation file.
It should be noted that the video file is the main content of the presentation, and the PAG animation file is used to add at least one special effect to the video file, and the PAG animation file is a file that is designed by an author through AE and then exported to an animation format. In some embodiments, the PAG animation file may also be exported by a designer after designing an animation through other graphics video processing software.
For example, the video file may be a video clip of a user's highlight moments in a game, and the PAG animation file is used to add a text special effect or a sticker special effect to the highlight video clip; or the video file may be a video clip of the user's highlight moments in games over one week, and the PAG animation file is likewise used to add a text special effect or a sticker special effect to the highlight video clip.
In some embodiments, the server may also retrieve an audio file that is used to add audio to the video file.
Step 502: and decoding the video file to obtain frame data of a plurality of video frames.
In actual implementation, the server may implement decoding of the video file based on FFmpeg. Here, FFmpeg is a set of open-source computer programs for recording, converting and streaming digital audio and video, and can run on platforms such as Linux, Windows, Mac OS X, Android and iOS.
Step 502 can be realized by step 5021 to step 5023, and will be described with reference to each step.
Step 5021: and decapsulating the video file to obtain video stream data.
It should be noted that, when the video file does not include the audio file, the video file is decapsulated to obtain only video stream data; and when the video file comprises the audio file, decapsulating the video file to obtain audio stream data and video stream data.
In some embodiments, if the server additionally obtains the audio file, the audio file needs to be decapsulated to obtain audio stream data.
In practical implementation, the video file or the audio file can be decapsulated through FFmpeg, and the av_seek_frame interface provided by FFmpeg can accurately acquire the video data or the audio data at a specific time. The audio data in the video file or the audio file can be decapsulated through FFmpeg to obtain audio data of the AVPacket data type, and the video data in the video file can be decapsulated through FFmpeg to obtain video data of the AVPacket data type.
Step 5022: and decoding the video data to obtain frame data of a plurality of video frames, wherein the format of the frame data is YUV format.
Here, the frame data in the YUV format is the raw data of the video that is not compression-encoded.
Step 5023: the data format of frame data of a plurality of video frames is converted from YUV format to RGBA format.
Here, since the data in the YUV format cannot be directly rendered, the data needs to be converted into the data in the RGBA format by the conversion module.
Step 503: and decoding the PAG animation file to obtain frame data of a plurality of animation frames.
Here, the PAG animation file is decoded, and the decoded data structure may be a vector, a bitmap sequence frame, or a video sequence frame. Taking the bitmap sequence frame as an example, decoding the PAG animation file to obtain the bitmap data of the key frame and a plurality of non-key frames of the animation.
Step 504: and performing animation rendering through PAG SDK according to the frame data of the plurality of video frames and the frame data of the corresponding animation frames respectively to obtain a plurality of target video frames.
Step 504 may be implemented by steps 5041 to 5042, which will be described in conjunction with the above steps.
Step 5041: and constructing a software simulation GPU environment on the CPU.
Here, since the animation rendering part involves GPU rendering, and most servers currently do not contain a GPU, a software-simulated GPU environment needs to be built on the CPU. There are two common schemes currently used in the industry: Mesa, and the SwiftShader scheme introduced by Google.
Table 1 compares the performance of SwiftShader and Mesa; referring to Table 1, the average time taken by the SwiftShader scheme to render a single frame is significantly lower than that of Mesa, which shows that the GPU-simulation performance of SwiftShader is much better than that of Mesa.
TABLE 1
[Table 1 is provided as an image in the original publication and is not reproduced here.]
Step 5042: animation rendering is performed based on frame data of a plurality of video frames and frame data of an animation frame corresponding thereto through an interface of Skia.
Because the rendering of the Lottie scheme in the related art is based on platform rendering APIs, for example, the Android platform is based on Canvas and the iOS platform is based on CoreGraphics, the Lottie scheme does not support server-side rendering. The PAG SDK rendering scheme provided by the invention is based on Skia, a 2D vector graphics processing library that contains efficient and concise implementations of fonts, coordinate transformations and bitmaps, and can realize platform-independent drawing.
In actual implementation, the server may perform graphics rendering from frame data of a plurality of video frames and frame data of corresponding animation frames through an interface provided by a SkSurface class provided by Skia.
In some embodiments, the following operations are performed on the frame data of each video frame and the frame data of the corresponding animation frame respectively: drawing a graph corresponding to the video frame in the blank canvas based on frame data of the video frame, and then drawing the graph corresponding to the animation frame at a preset position on the canvas on which the graph corresponding to the video frame is drawn based on frame data of the animation frame. In this manner, the image corresponding to the animation is overlaid on the graphics corresponding to the video frame.
In other embodiments, the following operations are performed on the frame data of each video frame and the frame data of the corresponding animation frame respectively: drawing a graph corresponding to the animation frame in the blank canvas based on the frame data of the animation frame; and then drawing the graph corresponding to the video frame at a preset position on the canvas on which the graph corresponding to the animation frame is drawn based on the frame data of the video frame. In this manner, the graphics corresponding to the video frame are overlaid on the graphics corresponding to the animation.
Step 505: and coding the plurality of target video frames to obtain a target video file.
Here, after rendering, screen capture processing is performed on the rendered content to obtain a plurality of target video frames (SkImage), then the data format of the target video frames is converted into a bitmap (SkBitmap), and the target video frames in the RGBA format are obtained according to the bitmap.
In practical implementation, the screen capture process can be implemented through the following interface: sk_sp<SkImage> makeImageSnapshot();
the data format of the target video frame may be converted into a bitmap (SkBitmap) by:
SkBitmap bitmap;
// allocate an N32 (32-bit RGBA) pixel buffer matching the snapshot size
bitmap.allocN32Pixels(skImage->width(), skImage->height(), false);
// copy the snapshot's pixels into the bitmap
skImage->readPixels(bitmap.info(), bitmap.getPixels(), bitmap.rowBytes(), 0, 0);
// mark the bitmap as immutable so Skia can cache it safely
bitmap.setImmutable();
in actual implementation, step 505 can be realized through step 5051 to step 5053, and the description will be made in conjunction with the steps.
Step 5051: and converting the target video frame from the RGBA format to the YUV format.
Here, the target video frame is converted from the RGBA format to the YUV format by a libswscale module provided by FFmpeg, and the converted data structure may be an AVFrame class of FFmpeg.
Step 5052: and coding a plurality of target video frames in YUV format to obtain video stream data.
Here, the libavcodec module of FFmpeg encodes the plurality of target video frames in YUV format into video stream data such as H.264 (x264) or MPEG-4.
Step 5053: and encapsulating the video stream data to obtain a target video file.
Here, the video stream data is container-packaged into a video file in a video format such as MP4, MOV, AVI, or the like.
It should be noted that, if the video file contains audio stream data, or audio stream data is acquired in another manner, the audio stream data is decapsulated from the video file or the audio file, and the video stream data and the audio stream data are then packaged together into the same container.
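The encapsulation of step 5053 may be sketched with libavformat as follows; the output file name target.mp4 is assumed, encCtx denotes the encoder context of step 5052, and an audio stream, if present, would be added as an additional stream in the same way:
extern "C" {
#include <libavformat/avformat.h>
}
AVFormatContext* fmtCtx = nullptr;
avformat_alloc_output_context2(&fmtCtx, nullptr, nullptr, "target.mp4");   // MP4 container inferred from the file name
AVStream* videoStream = avformat_new_stream(fmtCtx, nullptr);
avcodec_parameters_from_context(videoStream->codecpar, encCtx);            // copy the encoder parameters into the stream
avio_open(&fmtCtx->pb, "target.mp4", AVIO_FLAG_WRITE);
avformat_write_header(fmtCtx, nullptr);
// for every encoded packet pkt of the video stream (and of the audio stream, if any):
//     av_interleaved_write_frame(fmtCtx, pkt);
av_write_trailer(fmtCtx);
avio_closep(&fmtCtx->pb);
avformat_free_context(fmtCtx);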
Step 506: output the target video file.
Here, the target video file is output to the terminal so that the terminal plays the target video file.
The rendering process in the embodiment of the present invention is further explained below. Fig. 9 is a schematic flowchart of a rendering process according to an embodiment of the present invention, and referring to fig. 9, the rendering process includes:
Step 601: simulate the OpenGL environment through SwiftShader.
Here, two dynamic libraries, libEGL.so and libGLESv2.so, are compiled from the source code of SwiftShader. Their APIs are consistent with the standard EGL/OpenGL ES interfaces, so an OpenGL environment for off-screen rendering can be constructed in the same way as a regular OpenGL initialization, as sketched below.
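As an illustrative sketch only, an off-screen rendering environment may be initialized against the compiled libEGL.so and libGLESv2.so through the standard EGL interface as follows; the pbuffer size and configuration attributes are example values:
#include <EGL/egl.h>
EGLDisplay display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
EGLint major = 0, minor = 0;
eglInitialize(display, &major, &minor);
const EGLint configAttribs[] = { EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
                                 EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
                                 EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8,
                                 EGL_BLUE_SIZE, 8, EGL_ALPHA_SIZE, 8, EGL_NONE };
EGLConfig config;
EGLint numConfigs = 0;
eglChooseConfig(display, configAttribs, &config, 1, &numConfigs);
const EGLint pbufferAttribs[] = { EGL_WIDTH, 1280, EGL_HEIGHT, 720, EGL_NONE };   // example output size
EGLSurface pbuffer = eglCreatePbufferSurface(display, config, pbufferAttribs);
const EGLint contextAttribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
EGLContext context = eglCreateContext(display, config, EGL_NO_CONTEXT, contextAttribs);
eglMakeCurrent(display, pbuffer, pbuffer, context);   // the simulated graphics processor is now current for off-screen rendering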
Step 602: a GPU context for Skia is created.
Here, a sk_sp<const GrGLInterface> glInterface of Skia is created according to the API provided by Skia, and then the GPU context of Skia is created: sk_sp<GrContext> grContext = GrContext::MakeGL(glInterface);
Step 603: the SkSurface is created from the interface provided by Skia.
Here, since the Skia-related rendering is performed in an SkSurface, the SkSurface needs to be created after the GPU context of Skia has been created. In actual implementation, the SkSurface may be created by:
static sk_sp<SkSurface> MakeFromBackendTexture(GrContext* context,
                                               const GrBackendTexture& backendTexture,
                                               GrSurfaceOrigin origin, int sampleCnt,
                                               SkColorType colorType,
                                               sk_sp<SkColorSpace> colorSpace,
                                               const SkSurfaceProps* surfaceProps);
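As an illustrative usage sketch of the above interface, the SkSurface and the corresponding SkCanvas of step 604 may be obtained as follows; backendTexture is an assumed GrBackendTexture that wraps an OpenGL texture of the output size:
sk_sp<SkSurface> surface = SkSurface::MakeFromBackendTexture(
    grContext.get(),                 // the GPU context of Skia created in step 602
    backendTexture,                  // assumed GrBackendTexture wrapping a GL texture of the output size
    kBottomLeft_GrSurfaceOrigin,     // origin convention of OpenGL textures
    1,                               // sample count (no multisampling)
    kRGBA_8888_SkColorType,          // pixel format of the canvas
    nullptr,                         // default color space
    nullptr);                        // default surface properties
SkCanvas* canvas = surface->getCanvas();   // the SkCanvas used for drawing in step 605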
step 604: the corresponding SkCanvas is obtained through SkSurface.
Step 605: perform animation rendering according to the frame data of the current video frame and the frame data of the corresponding animation frame through the SkCanvas.
Step 606: perform screen capture processing through the API (application programming interface) provided by the SkSurface to obtain a target video frame.
Step 607: determine whether the frame data of all the video frames and the corresponding animation frames has been rendered; if so, the process ends; otherwise, return to step 605.
Here, one frame of data is rendered at a time until rendering of all data is completed.
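Steps 604 to 607 may be summarized by the following per-frame loop sketch, in which decodedVideoFrames and decodedAnimFrames are assumed containers of sk_sp<SkImage> frame data and (posX, posY) is the preset position of the animation frame:
for (size_t i = 0; i < decodedVideoFrames.size(); ++i) {
    SkCanvas* canvas = surface->getCanvas();                    // step 604: obtain the SkCanvas
    canvas->clear(SK_ColorTRANSPARENT);
    canvas->drawImage(decodedVideoFrames[i], 0, 0);             // step 605: draw the current video frame
    canvas->drawImage(decodedAnimFrames[i], posX, posY);        // step 605: draw the corresponding animation frame
    surface->flush();                                           // submit the drawing commands to the simulated GPU
    sk_sp<SkImage> targetFrame = surface->makeImageSnapshot();  // step 606: screen capture of the rendered content
    // targetFrame is converted to SkBitmap / YUV and fed to the encoder as described in step 505
}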
According to the invention, an end-to-end workflow from AE animation design to server-side rendering is established, so that special-effect animation videos can be produced rapidly and in large quantities, and the problem of weak animation rendering capability on low-end devices is effectively solved.
The following continues to describe an exemplary structure, implemented as software modules, of the device 255 for adding special effects to video provided by the embodiment of the present invention. Fig. 10 is a schematic structural diagram of the device for adding special effects to video provided by the embodiment of the present invention. Referring to fig. 10, in some embodiments, the device for adding special effects to video provided by the embodiment of the present invention includes:
an obtaining module 2551, configured to obtain a video file and an animation file for adding at least one special effect to the video file;
a decoding module 2552, configured to decode the video file to obtain frame data of multiple video frames of the video file, and decode the animation file to obtain frame data of multiple animation frames of the animation file, where the video frames and the animation frames have a one-to-one correspondence relationship;
a simulation module 2553 for simulating a graphics processor and running the simulated graphics processor;
a rendering module 2554, configured to perform animation rendering through the simulated graphics processor according to frame data of each video frame and frame data of a corresponding animation frame, respectively, so as to obtain multiple target video frames;
a synthesizing module 2555, configured to perform video synthesis based on the plurality of target video frames, so as to obtain a target video file to which the at least one special effect is added.
In some embodiments, the decoding module 2552 is further configured to decapsulate the video file to obtain video stream data in the video file;
decoding the video stream data to obtain frame data of a plurality of video frames;
and carrying out format conversion on the frame data of the plurality of video frames to obtain the frame data in a renderable data format.
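As an illustrative sketch only, the decapsulation and decoding performed by the decoding module 2552 may be implemented with the libavformat and libavcodec modules of FFmpeg as follows; the input file name input.mp4 and the variable names are assumed:
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}
AVFormatContext* inCtx = nullptr;
avformat_open_input(&inCtx, "input.mp4", nullptr, nullptr);     // decapsulate the video file
avformat_find_stream_info(inCtx, nullptr);
int videoIndex = av_find_best_stream(inCtx, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);
const AVCodec* dec = avcodec_find_decoder(inCtx->streams[videoIndex]->codecpar->codec_id);
AVCodecContext* decCtx = avcodec_alloc_context3(dec);
avcodec_parameters_to_context(decCtx, inCtx->streams[videoIndex]->codecpar);
avcodec_open2(decCtx, dec, nullptr);
AVPacket* packet = av_packet_alloc();
AVFrame* frame = av_frame_alloc();
while (av_read_frame(inCtx, packet) >= 0) {                     // read the video stream data packet by packet
    if (packet->stream_index == videoIndex && avcodec_send_packet(decCtx, packet) == 0) {
        while (avcodec_receive_frame(decCtx, frame) == 0) {
            // frame holds YUV frame data of one video frame; it is converted to RGBA before rendering
        }
    }
    av_packet_unref(packet);
}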
In some embodiments, the decoding module 2552 is further configured to convert the data format of the frame data of the plurality of video frames from the YUV data format to the RGBA data format.
In some embodiments, the decoding module 2552 is further configured to decode the animation file to obtain frame data of a key frame and frame data of a plurality of non-key frames of the animation file;
the frame data of the key frame is bitmap data, and the frame data of the plurality of non-key frames is difference bitmap data relative to the key frame.
In some embodiments, the rendering module 2554 is further configured to simulate an environment of an open graphics library;
creating an environment of a two-dimensional graphic library based on the environment of the simulated open graphic library;
and calling a graphics drawing interface through the simulated graphics processor based on the environment of the two-dimensional graphics library, and respectively performing animation rendering according to the frame data of each video frame and the frame data of the corresponding animation frame to obtain a plurality of rendered target video frames.
In some embodiments, the rendering module 2554 is further configured to perform the following operations on the frame data of each video frame and the frame data of the corresponding animation frame:
drawing a first graph corresponding to the video frame in a blank canvas through the simulated graphics processor based on frame data of the video frame;
acquiring first position information of an animation frame corresponding to the video frame;
and drawing a second graph corresponding to the animation frame on the canvas on which the first graph is drawn at a position corresponding to the first position information through the simulated graph processor based on the frame data of the animation frame to obtain the canvas bearing the graph corresponding to the target video frame.
In some embodiments, the rendering module 2554 is further configured to perform the following operations on the frame data of each video frame and the frame data of the corresponding animation frame respectively:
drawing a third graph corresponding to the animation frame in a blank canvas through the simulated graphics processor based on the frame data of the animation frame;
acquiring second position information of a video frame corresponding to the animation frame;
and drawing a fourth graph corresponding to the video frame at a position corresponding to the second position information on the canvas on which the third graph is drawn by the simulated graph processor based on the frame data of the video frame to obtain the canvas bearing the graph corresponding to the target video frame.
In some embodiments, the apparatus further comprises:
the screen capture module is used for carrying out screen capture processing on a plurality of canvas obtained by rendering to obtain a plurality of target video frames;
and carrying out format conversion on the plurality of target video frames to obtain a plurality of target video frames in a target data format.
In some embodiments, the composition module 2555 is further configured to encode and encapsulate the plurality of target video frames, resulting in a target video file to which the at least one special effect is added.
Embodiments of the present invention provide a storage medium storing executable instructions which, when executed by a processor, cause the processor to execute the method for adding special effects to video provided by the embodiments of the present invention, for example, the method shown in fig. 3.
In some embodiments, the storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A method for adding special effects to video, the method comprising:
acquiring a video file and an animation file for adding at least one special effect to the video file;
decoding the video file to obtain frame data of a plurality of video frames of the video file, and decoding the animation file to obtain frame data of a plurality of animation frames of the animation file, wherein the video frames and the animation frames have a one-to-one correspondence relationship;
simulating a graphics processor and running the simulated graphics processor;
performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively to obtain a plurality of target video frames;
and performing video synthesis based on the plurality of target video frames to obtain a target video file added with the at least one special effect.
2. The method of claim 1, wherein said decoding the video file to obtain frame data for a plurality of video frames of the video file comprises:
decapsulating the video file to obtain video stream data in the video file;
decoding the video stream data to obtain frame data of a plurality of video frames;
and carrying out format conversion on the frame data of the plurality of video frames to obtain the frame data in a renderable data format.
3. The method of claim 2, wherein the format converting frame data of the plurality of video frames to frame data in a renderable data format comprises:
and converting the data format of the frame data of the plurality of video frames from a brightness-chrominance YUV data format into a red-green-blue-transparency RGBA data format.
4. The method of claim 1, wherein said decoding the animation file to obtain frame data for a plurality of animation frames of the animation file comprises:
decoding the animation file to obtain frame data of a key frame and frame data of a plurality of non-key frames of the animation file;
the frame data of the key frame is bitmap data, and the frame data of the non-key frame is difference bitmap data relative to the key frame.
5. The method of claim 1, wherein the performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively to obtain a plurality of rendered target video frames comprises:
simulating an environment of an open graphic library;
creating an environment of a two-dimensional graphic library based on the environment of the simulated open graphic library;
and calling a graphics drawing interface through the simulated graphics processor based on the environment of the two-dimensional graphics library, and respectively performing animation rendering according to the frame data of each video frame and the frame data of the corresponding animation frame to obtain a plurality of rendered target video frames.
6. The method of claim 1, wherein the performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively comprises:
performing the following operations on the frame data of each video frame and the frame data of the corresponding animation frame:
drawing a first graph corresponding to the video frame in a blank canvas through the simulated graphics processor based on frame data of the video frame;
acquiring first position information of an animation frame corresponding to the video frame;
and drawing a second graph corresponding to the animation frame on the canvas on which the first graph is drawn at a position corresponding to the first position information through the simulated graph processor based on the frame data of the animation frame to obtain the canvas bearing the graph corresponding to the target video frame.
7. The method of claim 1, wherein the performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively comprises:
respectively executing the following operations on the frame data of each video frame and the frame data of the corresponding animation frame:
drawing a third graph corresponding to the animation frame in a blank canvas through the simulated graphics processor based on the frame data of the animation frame;
acquiring second position information of a video frame corresponding to the animation frame;
and drawing a fourth graph corresponding to the video frame at a position corresponding to the second position information on the canvas on which the third graph is drawn by the simulated graph processor based on the frame data of the video frame to obtain the canvas bearing the graph corresponding to the target video frame.
8. The method of claim 1, wherein the result of the animation rendering is a canvas carrying graphics corresponding to the target video frame, the method further comprising:
performing screen capture processing on a plurality of canvas obtained by rendering to obtain a plurality of target video frames;
and carrying out format conversion on the plurality of target video frames to obtain a plurality of target video frames in a target data format.
9. The method of claim 1, wherein the video compositing based on the plurality of target video frames to obtain a target video file with the at least one special effect added comprises:
and carrying out video coding and packaging on the plurality of target video frames to obtain a target video file added with the at least one special effect.
10. An apparatus for adding special effects to video, the apparatus comprising:
an obtaining module, configured to acquire a video file and an animation file for adding at least one special effect to the video file;
the decoding module is used for decoding the video file to obtain frame data of a plurality of video frames of the video file, and decoding the animation file to obtain frame data of a plurality of animation frames of the animation file, wherein the video frames and the animation frames have one-to-one correspondence;
the simulation module is used for simulating a graphics processor and operating the simulated graphics processor;
the rendering module is used for performing animation rendering through the simulated graphics processor according to the frame data of each video frame and the frame data of the corresponding animation frame respectively to obtain a plurality of target video frames;
and the synthesis module is used for carrying out video synthesis on the basis of the plurality of target video frames to obtain a target video file added with the at least one special effect.
CN202010019167.8A 2020-01-08 2020-01-08 Method and device for adding special effect in video Active CN111193876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010019167.8A CN111193876B (en) 2020-01-08 2020-01-08 Method and device for adding special effect in video

Publications (2)

Publication Number Publication Date
CN111193876A true CN111193876A (en) 2020-05-22
CN111193876B CN111193876B (en) 2021-09-07

Family

ID=70709939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010019167.8A Active CN111193876B (en) 2020-01-08 2020-01-08 Method and device for adding special effect in video

Country Status (1)

Country Link
CN (1) CN111193876B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7100192B1 (en) * 1997-09-05 2006-08-29 Hitachi, Ltd. Method of and an apparatus for controlling a web server, a web server control program, and a storage medium on which the web server control program is stored
CN102447873A (en) * 2010-10-13 2012-05-09 张明 Ha-ha video network video chat entertainment auxiliary system
US10133705B1 (en) * 2015-01-19 2018-11-20 Snap Inc. Multichannel system
CN104954848A (en) * 2015-05-12 2015-09-30 乐视致新电子科技(天津)有限公司 Intelligent terminal display graphic user interface control method and device
CN105933724A (en) * 2016-05-23 2016-09-07 福建星网视易信息系统有限公司 Video producing method, device and system
CN108495174A (en) * 2018-04-09 2018-09-04 深圳格莱珉文化传播有限公司 A kind of H5 pages effect generates the method and system of video file
CN110266971A (en) * 2019-05-31 2019-09-20 上海萌鱼网络科技有限公司 A kind of short video creating method and system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111726545A (en) * 2020-06-22 2020-09-29 广州市百果园信息技术有限公司 Material file processing method and device, computer equipment and storage medium
CN111669623A (en) * 2020-06-28 2020-09-15 腾讯科技(深圳)有限公司 Video special effect processing method and device and electronic equipment
CN111899322A (en) * 2020-06-29 2020-11-06 腾讯科技(深圳)有限公司 Video processing method, animation rendering SDK, device and computer storage medium
CN111899322B (en) * 2020-06-29 2023-12-12 腾讯科技(深圳)有限公司 Video processing method, animation rendering SDK, equipment and computer storage medium
CN111957039A (en) * 2020-09-04 2020-11-20 Oppo(重庆)智能科技有限公司 Game special effect realization method and device and computer readable storage medium
WO2022088783A1 (en) * 2020-10-28 2022-05-05 北京达佳互联信息技术有限公司 Video production method and apparatus
CN112738624A (en) * 2020-12-23 2021-04-30 北京达佳互联信息技术有限公司 Method and device for special effect rendering of video
CN112738624B (en) * 2020-12-23 2022-10-25 北京达佳互联信息技术有限公司 Method and device for special effect rendering of video
CN113111035A (en) * 2021-04-09 2021-07-13 上海掌门科技有限公司 Special effect video generation method and equipment
CN113422912A (en) * 2021-05-25 2021-09-21 深圳市大头兄弟科技有限公司 Short video interaction generation method, device, equipment and storage medium
CN114173192A (en) * 2021-12-09 2022-03-11 广州阿凡提电子科技有限公司 Method and system for adding dynamic special effect based on exported video
CN114501079A (en) * 2022-01-29 2022-05-13 京东方科技集团股份有限公司 Method for processing multimedia data and related device

Also Published As

Publication number Publication date
CN111193876B (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN111193876B (en) Method and device for adding special effect in video
CN106611435B (en) Animation processing method and device
CN109600666B (en) Video playing method, device, medium and electronic equipment in game scene
CN111899322B (en) Video processing method, animation rendering SDK, equipment and computer storage medium
CN105354872B (en) A kind of rendering engine based on 3D web games, implementation method and tools
CN111899155B (en) Video processing method, device, computer equipment and storage medium
CN105468353B (en) Method and device for realizing interface animation, mobile terminal and computer terminal
US9077970B2 (en) Independent layered content for hardware-accelerated media playback
US11882297B2 (en) Image rendering and coding method and related apparatus
CN110782387B (en) Image processing method and device, image processor and electronic equipment
CN109327698A (en) Dynamic previewing map generalization method, system, medium and electronic equipment
KR102292789B1 (en) Display apparatus and control method thereof
CN110853121B (en) Cross-platform data processing method and device based on AE
CN110213640B (en) Virtual article generation method, device and equipment
CN109993817A (en) A kind of implementation method and terminal of animation
CN114598937A (en) Animation video generation and playing method and device
CN112700519A (en) Animation display method and device, electronic equipment and computer readable storage medium
CN104008565A (en) System and method for playing Flash bitmap animation by using cocos2d-x and HE engines
CN114222185B (en) Video playing method, terminal equipment and storage medium
CN117065357A (en) Media data processing method, device, computer equipment and storage medium
WO2014024255A1 (en) Terminal and video playback program
CN114307143A (en) Image processing method and device, storage medium and computer equipment
WO2024051394A1 (en) Video processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN115250335A (en) Video processing method, device, equipment and storage medium
CN106331834B (en) Multimedia data processing method and equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant