CN110213504B - Video processing method, information sending method and related equipment - Google Patents


Info

Publication number
CN110213504B
Authority
CN
China
Prior art keywords
video
frame data
time axis
animation
preset
Prior art date
Legal status
Active
Application number
CN201810330586.6A
Other languages
Chinese (zh)
Other versions
CN110213504A (en)
Inventor
湛家伟
华有为
李宏伟
陈波
张琦
朱辉颖
李诗
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810330586.6A
Publication of CN110213504A
Application granted
Publication of CN110213504B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the invention discloses a video processing method, an information sending method and related equipment, which are used for carrying out real-time special effect processing on a video and improving the diversity of the video. The method provided by the embodiment of the invention comprises the following steps: acquiring input content, wherein the input content comprises text information; acquiring preset video frame data and animation frame data according to the text information; and mixing the animation frame data and the video frame data to obtain a first synthetic video.

Description

Video processing method, information sending method and related equipment
Technical Field
The present invention relates to the field of streaming media technologies, and in particular, to a video processing method, an information sending method, and a related device.
Background
With the rapid development of mobile terminals, video playing through mobile terminals is becoming more and more popular, and various video playing software is also widely used in people's daily life. In the process of playing video on the mobile terminal, the video needs to be processed by hardware such as an encoder and a decoder or coding and decoding software.
The software and hardware installed on different mobile terminals are not completely the same, and even mobile terminals produced by the same manufacturer are customized to different degrees, so device-model fragmentation is serious and various compatibility problems also arise in the system encoder. In the existing scheme, a mobile terminal uses the system's own player to play video.
In a specific environment, in order to increase the enjoyment and diversity of videos and meet the personalized requirements of users, the mobile terminal needs to synthesize videos to obtain the videos required by the users. However, the system player built into the mobile terminal only has a playing function, does not have a synthesizing function, and cannot perform real-time special effect processing on a video.
Disclosure of Invention
The embodiment of the invention provides a video processing method, an information sending method and related equipment, which are used for carrying out real-time special effect processing on a video and improving the diversity of the video.
A first aspect of the present invention provides a video processing method, including:
acquiring input content, wherein the input content comprises text information;
acquiring preset video frame data and animation frame data according to the text information;
and mixing the animation frame data and the video frame data to obtain a first synthetic video.
A second aspect of the present invention provides an information sending method, including:
acquiring input content on a terminal and sending the input content to a server, wherein the input content comprises text information;
and receiving a composite video sent by the server and sharing the composite video on an application display interface, wherein the composite video is a first composite video or a second composite video generated according to the input content.
A third aspect of the present invention provides a terminal for information transmission, including:
the terminal comprises an acquisition unit, a display unit and a display unit, wherein the acquisition unit is used for acquiring input content on the terminal, and the input content comprises text information;
a transmission unit for transmitting the input content to a server;
a receiving unit, configured to receive a composite video sent by the server, where the composite video is a first composite video or a second composite video generated according to the input content;
and the display unit is used for sharing the synthesized video on an application display interface.
A fourth aspect of the present invention provides a server comprising:
a first acquisition unit configured to acquire input content, the input content including text information;
the second acquisition unit is used for acquiring preset video frame data and animation frame data according to the text information;
and the synthesizing unit is used for mixing the animation frame data and the video frame data to obtain a first synthesized video.
A fifth aspect of the present invention provides a server, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory and comprises the following steps:
acquiring input content, wherein the input content comprises text information;
acquiring preset video frame data and animation frame data according to the text information;
mixing the animation frame data and the video frame data to obtain a first synthetic video;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
A sixth aspect of the present invention provides a computer storage medium comprising instructions that, when executed on a computer, cause the computer to perform the operations of the aspects described above.
According to the technical scheme, the embodiment of the invention has the following advantages:
in an embodiment of the present invention, a video processing method is provided, including: acquiring input content, wherein the input content comprises text information; acquiring preset video frame data and animation frame data according to the text information; and mixing the animation frame data and the video frame data to obtain a first synthetic video. In the embodiment of the invention, the required special effect content is generated according to the input content, and the special effect content and the video are synthesized, so that the real-time special effect processing of the video is realized, and the diversity of the video is improved.
Drawings
FIG. 1 is a diagram of a network framework for implementing an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a video processing method according to an embodiment of the present invention;
FIG. 3A is a diagram illustrating an exemplary application scenario according to an embodiment of the present invention;
FIG. 3B is a schematic diagram illustrating another exemplary application scenario of the present invention;
FIG. 3C is a schematic diagram of another exemplary application scenario according to an embodiment of the present invention;
FIG. 4A is a diagram illustrating a scene of a change in location of a text message according to an embodiment of the invention;
FIG. 4B is a diagram illustrating a scene with background changes according to an embodiment of the present invention;
FIG. 4C is a schematic diagram of a scene of background region division according to an embodiment of the present invention;
FIG. 5A is a diagram illustrating successful publishing of a composite video according to an embodiment of the present invention;
FIG. 5B is a diagram illustrating an application interface in accordance with an embodiment of the present invention;
FIG. 6 is a diagram of one embodiment of a server in an embodiment of the invention;
FIG. 7 is a diagram of another embodiment of a server in an embodiment of the invention;
FIG. 8 is a diagram of another embodiment of a server in an embodiment of the invention;
fig. 9 is a schematic diagram of an embodiment of a terminal in the embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a video processing method, an information sending method and related equipment, which are used for carrying out real-time special effect processing on a video and improving the diversity of the video.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It can be understood that the embodiment of the present invention is mainly applied to the field of video synthesis. As shown in fig. 1, a user equipment (i.e., a terminal) first obtains the video to be used from a server, decodes the video through the system decoder provided in the user equipment and extracts the data of each frame, then performs special effect processing on the video frames based on the Open Graphics Library (OpenGL), finally synthesizes the video, and sends the synthesized video to the server through the network. For the video data of each frame, OpenGL can not only perform special effect processing (filtering, zooming, etc.) on the original frame, but can also render new animation effects to be superimposed, thereby enriching the user experience. For example, it may be applied to the "magic utterance" feature of the QQ space. The user inputs text in the form of a written post and enters the "magic utterance" page, and the background server determines the corresponding audio and time axis according to the song selected by the user and the input text. The QQ space client randomly extracts local video clips according to the audio and the time axis, and adds the corresponding text animations for synthesis, display, and playback. It should be understood that the embodiment of the present invention can also be applied to the user equipment alone: the required songs are selected from a local database, and the special effect synthesis of the local video is then realized using the hardware and software of the user equipment. The embodiment of the present invention can also be applied to scenes related to user generated content (UGC), for example, scenes such as a video generated according to UGC and audio generated according to UGC.
For convenience of explanation, a specific flow of an embodiment of the present invention is described below, and referring to fig. 2, when the video processing method provided by the present invention is applied to a "magic speech" item in a QQ space, an embodiment of the video processing method in the embodiment of the present invention includes:
201. the terminal acquires input content on the terminal, wherein the input content comprises text information.
The terminal acquires the input content of the user on the terminal, and specifically, the input content may include text information.
Specifically, a text input box is provided on the terminal and receives the input content entered by a user, where the input content may include text information, and the text information may include one or more of numbers, symbols, and characters. For example, the text information acquired in the text input box may be "old driver takes me", or may be other text information, such as "the weather is good today, suitable for going out to play" or "persistence is victory", and the details are not limited herein.
It should be noted that the input content may also include other information, for example, audio information, such as a song, a recording, and other audio content, wherein the song may be all or a partial segment of the song.
For example, after the user chooses to write a post, the user enters the 'magic sound saying' page; in the 'magic sound saying' interface, the text information entered by the user in the text input box is 'old driver takes me', as shown in fig. 3A. It can be understood that the user can also input audio content on the 'magic sound saying' interface, such as selecting a song online: after entering 'old driver takes me', the user selects a song in the song selection bar, as shown in fig. 3B, choosing 'gecko walk' as the background song; if the user is not satisfied with the selected song, the song can be switched. Specifically, as shown in fig. 3C, a favorite song is selected in the song selection list, and one of the songs 'how do you want to me', 'just meet you', and 'think like rain' can be chosen as a replacement, which is not limited herein. Similarly, if the user is not satisfied with the entered text information, the interface for entering the text information is returned to and the text information is re-entered, for example by modifying the text 'old driver takes me'.
In a possible implementation, operations such as adding a tag, selecting a place, and setting permissions may also be performed, where the added tag is used to categorize the video and may include, for example, fun, entertainment, and the like; the place can be selected through a positioning function; and the permission setting can restrict who may watch the video, for example it can be set to 'all people visible', so as to meet the personalized requirements of different users. Corresponding operation buttons are provided on the interface, as shown in fig. 3B and 3C.
202. The terminal transmits the input content to the server.
And the terminal sends the acquired input content to the server.
203. The server determines a time axis according to the text information.
The server determines a time axis according to the text information. Specifically, the server generates a corresponding time axis according to the text information, and the time axis is used for synthesizing the text information into a text animation in a certain order.
It should be noted that the time axes obtained from different text messages may be the same or different, and the time axes obtained from the same text message may also differ; specifically, for example, the time at which each character in the text message appears may be set according to a preset time interval.
In one possible implementation, the text information is segmented, and the semantics and rhythm of each piece of character information are determined, where the rhythm of each piece of character information is embodied as a starting time and a duration on the time axis. For example, the text message "old driver takes my free flight" is segmented, the semantics of each character are determined, and the parameters of the text message on the time axis are generated; the resulting time axis is as follows:
"[ 595, 5080] old (595, 480) driver (1075, 485) airplane (1560, 600) band (2160, 345) band (2505, 5375) i (2880, 465) fly (5215, 460) from (3345, 345) to (3690, 495) (4665, 550) to fly (5975, 880)", wherein [595, 5080] indicates the time length of the text message on the time axis, "595" indicates the start time of the text message, "5080" indicates the end time of the text message, "595" in old (595, 480) indicates the start time of the word "old", and "480" indicates the duration of the word "old".
204. And the server acquires preset video frame data and animation frame data according to the text information and the time axis.
The server acquires preset video frame data and animation frame data according to the text information. The animation frame data, which are the added special effect content, may be obtained according to the text information, and the preset video frame data may then be obtained according to the text information; alternatively, the preset video frame data may be obtained first and the animation frame data afterwards, or the video frame data and the animation frame data may be obtained simultaneously, which is not limited herein.
Specifically, the system codec decodes a preset video to obtain frame-by-frame video data; the time at which each character in the text information appears is determined through the text information and the time axis, animation frame data are synthesized, and animation sequence parameters are set and determined, where the animation sequence parameters may include one or more of a starting time, a duration, a scaling ratio, a position, a rotation angle, a text bitmap, and the like, which is not limited herein. The characters can be set to bounce up and down elastically as required, so that the positions of the characters in the animation frame data differ at different times. For example, when the text message is "we eat the hotpot and eat the hotpot condiment", the text is located above the animation image at the first moment, moves below the animation image at the second moment, moves back above the animation image at the third moment, and moves below the animation image at the fourth moment, as shown in fig. 4A. For another example, a rotating background frame may be provided, as shown at a certain moment in fig. 4B. For another example, the background may be divided into regions to obtain different animation effects; as shown in fig. 4C, different regions may be filled with different colors, and the details are not limited herein.
And finally, obtaining animation frame data according to the animation sequence parameters. It can be understood that there is no specific sequence between the video frame data acquisition and the animation frame data acquisition, and the video frame data can be acquired first and then the animation frame data can be acquired; or acquiring animation frame data first and then acquiring video frame data; the video frame data and the animation frame data may also be obtained simultaneously, and the details are not limited herein.
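As a rough sketch of the animation sequence parameters mentioned above (starting time, duration, scaling, position, rotation angle, text bitmap), one possible per-character parameter record is outlined below; the field names and the up-and-down "bounce" formula are illustrative assumptions rather than details specified by this embodiment.

```java
import android.graphics.Bitmap;

// Hypothetical animation sequence parameters for one character, as described in step 204.
final class CharAnimationParams {
    int startMs;          // starting time on the time axis
    int durationMs;       // duration of the character animation
    float scale;          // scaling ratio
    float x, y;           // base position on the canvas
    float rotationDeg;    // rotation angle
    Bitmap textBitmap;    // rendered bitmap of the character

    // Example of a per-frame position: the character bounces up and down over time,
    // so its vertical position differs at the first, second, third ... moments (cf. fig. 4A).
    float yAt(long frameTimeMs, float amplitudePx) {
        float t = (frameTimeMs - startMs) / (float) durationMs;       // 0..1 within the animation
        return y + (float) (amplitudePx * Math.sin(2 * Math.PI * t)); // assumed bounce curve
    }
}
```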
205. And the server mixes the animation frame data and the video frame data to obtain a synthetic video.
The server mixes the animation frame data and the video frame data to obtain a first composite video. The animation frame data and the video frame data are mixed to obtain mixed frame data, and then the mixed frame data are synthesized through a system codec or software to obtain a first synthesized video.
OpenGL can be used to mix the video frame data and the animation frame data to obtain mixed data frames: the video frame is drawn on a canvas using OpenGL, and the text animation is drawn at the same time according to the text data. For example, in this embodiment, video frame data and animation frame data are synthesized through OpenGL; after the system codec is initialized, the video frame data and the animation frame data buffered in a frame buffer object (FBO) area are mixed, and specifically, the first composite video may be obtained through Surface and EGLSurface in OpenGL.
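The blending step can be outlined as follows with OpenGL ES alpha blending: the video frame texture is drawn first, and the text-animation texture is drawn over it into the FBO. This is only a sketch under assumptions; drawTexturedQuad is a hypothetical helper, and the Surface/EGLSurface and system-codec wiring is omitted.

```java
import android.opengl.GLES20;

/** Sketch of the OpenGL blending step; drawTexturedQuad is an assumed helper
 *  (full-screen textured-quad draw) whose shader setup is omitted here. */
abstract class FrameBlender {
    abstract void drawTexturedQuad(int textureId);

    // Assumed: fboId is a frame buffer object with a color attachment,
    // videoTexId holds the decoded video frame, animTexId holds the rendered text animation.
    void blendFrame(int fboId, int videoTexId, int animTexId) {
        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, fboId);

        // Draw the video frame as the opaque background.
        GLES20.glDisable(GLES20.GL_BLEND);
        drawTexturedQuad(videoTexId);

        // Blend the animation frame on top using its alpha channel.
        GLES20.glEnable(GLES20.GL_BLEND);
        GLES20.glBlendFunc(GLES20.GL_SRC_ALPHA, GLES20.GL_ONE_MINUS_SRC_ALPHA);
        drawTexturedQuad(animTexId);

        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);
    }
}
```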
It is understood that if the audio data is also included, the synthesized first composite video and the audio data need to be further synthesized to obtain a final second composite video.
When video composition is performed, encoding is first performed using the system encoder to generate the animation-frame video. Hardware synthesis is then performed using the system codec according to a preset white list, and if the hardware synthesis fails, software synthesis is performed through FFmpeg, which improves the synthesis success rate. For example, after the video composition succeeds, the video is displayed on the "magic saying" interface; the publish button on the interface is triggered, and the first composite video is published successfully, as shown in fig. 5A.
206. And the server sends the composite video to the terminal.
The server sends the composite video to the terminal, wherein the composite video is a first composite video or a second composite video, the second composite video contains audio data, and the first composite video does not contain the audio data.
207. And the terminal receives the synthesized video sent by the server and shares the synthesized video on the application display interface.
The terminal receives a composite video sent by the server and shares the composite video on an application display interface, wherein the composite video is a first composite video or a second composite video generated according to input content. The second synthesized video comprises audio data, and the audio data is obtained according to the input audio information.
In the embodiment of the invention, the required special effect content is generated according to the input content, and the special effect content and the video are synthesized, so that the real-time special effect processing of the video is realized, and the diversity of the video is improved.
The following describes the usage process of "magic utterance" of a user in a QQ space scene. When a user opens a QQ space client on a terminal and enters a 'magic sound saying' interface, as shown in FIG. 3B, the user inputs a text content 'old driver takes me' in a content input box according to a prompt and selects a corresponding song 'gecko walk' as a synthesized material, the terminal sends the input text content and the selected song to a server, the server generates a corresponding synthesized video according to the text of 'old driver takes me' and the song 'gecko walk', and the synthesized video is displayed in a display interface of the QQ client, wherein the text content input by the user is displayed in a home page of the synthesized video, and the synthesized video is shown in FIG. 3B; the user can preview by clicking the composite video, and if the video effect is not satisfactory, the user can modify the characters, songs and the like of the input content, as shown in fig. 3C, until the composite video which is satisfactory to the user is obtained, click the publishing option on the 'magic sound saying' interface, and share the composite video on the personal homepage of the QQ space. As shown in fig. 5A, a composite video shared by the user is displayed on the homepage of the person, so that the composite video authored by the user is spread among friends.
It should be noted that, as shown in fig. 5B, before entering the "magic speech" interface, photos or videos required by the user are uploaded as materials of the composite video, so that the server generates specific video content according to the photos or videos uploaded by the user, and the personalized requirements of the user are met. For example, if the user needs to generate a birthday blessing video, the user may upload a birthday cake picture as the background of the composite video and upload a song "happy birthday blessing" as the background music of the video in the input content interface to obtain the composite video meeting the user's needs, or may upload other content, which is not limited herein.
It can be understood that the server can provide a large amount of materials for the user, can provide optional materials including sound, music, movies, cartoons, games and the like, and increases the diversity and the playability of the composite video synthesized by the user.
The embodiment of the invention can also be applied to other scenes in which the composite video is displayed in the application interface, for example, the scheme in the embodiment of the invention can be adopted in a magic sound saying applet of WeChat, and the creation and the transmission can be carried out in the WeChat scene, so that a content community is formed, and the user experience is improved.
Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video processing method according to the embodiment of the present invention, the mixing the animation frame data and the video frame data to obtain a first composite video includes:
acquiring the model of the system codec, wherein the system codec is used for performing hardware synthesis on frame data;
judging whether the model of the system codec is within a preset white list range;
if the model of the system codec is within the preset white list range, performing hardware synthesis on the animation frame data and the video frame data through the system codec to obtain the synthesized video;
if the model of the system codec is not within the preset white list range, performing software synthesis on the animation frame data and the video frame data through preset software to obtain the synthesized video.
In the embodiment of the invention, the video is subjected to special effect processing, the video subjected to the special effect processing is preferentially subjected to hardware synthesis, and after the hardware synthesis fails, the software synthesis is carried out, so that the influence of low compatibility of user equipment is reduced, and the success rate of video synthesis is improved.
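A minimal sketch of the selection logic of this embodiment is given below; the helper method names and the form of the whitelist are assumptions made for illustration, not details fixed by the embodiment.

```java
import java.util.Set;

abstract class VideoComposer {
    // Assumed helpers: hardware path via the system codec, software path via FFmpeg.
    abstract boolean synthesizeWithSystemCodec(byte[] mixedFrameData);  // returns false on failure
    abstract boolean synthesizeWithFfmpeg(byte[] mixedFrameData);

    /** Prefer hardware synthesis when the system codec model is whitelisted; otherwise,
     *  or when hardware synthesis fails, fall back to software synthesis. */
    boolean compose(byte[] mixedFrameData, String codecModel, Set<String> whitelist) {
        if (whitelist.contains(codecModel)) {
            if (synthesizeWithSystemCodec(mixedFrameData)) {
                return true;                          // hardware synthesis succeeded
            }
            // hardware synthesis failed: continue with software synthesis (cf. the next embodiment)
        }
        return synthesizeWithFfmpeg(mixedFrameData);  // software synthesis
    }
}
```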
Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video processing method provided in the embodiment of the present invention, the method further includes:
if the system coder-decoder fails to synthesize the animation frame data and the video frame data, the preset software is used for carrying out software synthesis on the animation frame data and the video frame data to obtain a synthesized video.
In the embodiment, the process of continuing software synthesis after hardware synthesis failure is added, and the success rate of video synthesis is further improved.
Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video processing method according to an embodiment of the present invention, the acquiring preset video frame data and preset animation frame data according to the text information includes:
determining a time axis according to the text information;
generating animation frame data according to the text information and the time axis;
segmenting the preset video according to the time axis to obtain a plurality of video clips;
randomly determining a target video clip among the plurality of video clips, wherein the duration of the target video clip corresponds to the duration of the time axis;
and decoding the target video clip to obtain the preset video frame data.
In this embodiment, a process of acquiring video frame data is provided, and a preset video is segmented to obtain usable video frame data for subsequent video synthesis, so that implementation manners of the embodiment are increased.
It should be noted that there is no specific sequence between the step of generating animation frame data and the step of obtaining video frame data, and the steps may be executed sequentially or simultaneously.
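The segmentation and random selection of the target video clip can be sketched as follows; the class and method names are assumptions, and the only constraint taken from this embodiment is that the duration of the target clip corresponds to the duration of the time axis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class ClipSelector {
    /** A video clip expressed as [startMs, startMs + durationMs) within the preset video. */
    static final class Clip {
        final long startMs, durationMs;
        Clip(long startMs, long durationMs) { this.startMs = startMs; this.durationMs = durationMs; }
    }

    /** Split the preset video into consecutive clips whose length equals the time-axis duration,
     *  then pick one of them at random as the target clip to be decoded into video frame data. */
    static Clip pickTargetClip(long presetVideoDurationMs, long timeAxisDurationMs, Random random) {
        List<Clip> clips = new ArrayList<>();
        for (long start = 0; start + timeAxisDurationMs <= presetVideoDurationMs; start += timeAxisDurationMs) {
            clips.add(new Clip(start, timeAxisDurationMs));
        }
        if (clips.isEmpty()) {
            // assumes the preset video is at least as long as the time axis
            throw new IllegalArgumentException("preset video shorter than the time axis");
        }
        return clips.get(random.nextInt(clips.size()));
    }
}
```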
Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video processing method according to an embodiment of the present invention, the generating animation frame data according to the text information and the time axis includes:
determining each character contained in the text information;
determining a character image corresponding to each character from a preset image library;
and generating text animation frame data according to the character image and the time axis of each character.
In the embodiment, a process of acquiring text animation frame data is provided, the text information is determined into character images which need to be used, and then the character images are drawn into animation according to a time axis, so that usable animation frame data are obtained for subsequent video synthesis, and the implementation manner of the embodiment is increased.
Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video processing method according to an embodiment of the present invention, the generating text animation frame data according to the character image and the time axis of each character includes:
determining animation sequence parameters according to the character image of each character, wherein the animation sequence parameters comprise the starting time and the duration of the character image;
and generating animation frame data according to the starting time and the duration.
It should be noted that the animation sequence parameters may also include parameters such as a scaling ratio, a display position, and the like, which are not limited herein.
In the embodiment, the animation sequence parameters are determined according to the character images, and then the text animation is generated by using the animation sequence parameters, so that the steps of the embodiment are more complete.
Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video processing method provided in the embodiment of the present invention, the input content further includes audio information, and the method further includes:
acquiring audio data corresponding to the audio information;
synchronizing the audio data with the first composite video; or
mixing the audio data and the video frame data to obtain audio and video frame data, and generating a second composite video according to the audio and video frame data.
It should be noted that the mixed frame data may also be obtained first, the audio data and the mixed frame data may then be further mixed to obtain audio/video frame data, and the audio/video frame data may then be synthesized to obtain a second composite video that includes the audio data.
In the embodiment, audio data is added, the implementation manner of the embodiment is increased, and the diversity of the synthesized video is improved.
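As one possible illustration of combining the audio data with the synthesized video on an Android device, the outline below writes an already-encoded video track and audio track into one MP4 file using MediaMuxer; the surrounding encoder loops and the sample buffers fed to it are assumed and not specified by this embodiment.

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import java.io.IOException;
import java.nio.ByteBuffer;

final class AvMuxSketch {
    /** Write one encoded video sample and one encoded audio sample into an MP4 container.
     *  In a real pipeline this would run inside the encoder drain loops. */
    static void muxOnce(String outputPath,
                        MediaFormat videoFormat, ByteBuffer videoSample, MediaCodec.BufferInfo videoInfo,
                        MediaFormat audioFormat, ByteBuffer audioSample, MediaCodec.BufferInfo audioInfo)
            throws IOException {
        MediaMuxer muxer = new MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        int videoTrack = muxer.addTrack(videoFormat);
        int audioTrack = muxer.addTrack(audioFormat);
        muxer.start();
        muxer.writeSampleData(videoTrack, videoSample, videoInfo);  // synthesized video frames
        muxer.writeSampleData(audioTrack, audioSample, audioInfo);  // audio data from the input audio information
        muxer.stop();
        muxer.release();
    }
}
```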
Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video processing method according to the embodiment of the present invention, the acquiring the audio data corresponding to the audio information includes:
identifying the audio information through a preset model to obtain melody and lyrics;
audio data is generated based on the melody and the lyrics.
It should be noted that the embodiment of the present invention can also be applied to a scene in which audio information is recognized, for example, in a magic utterance applet of WeChat, text information is input to generate corresponding speech content, and text input by a user is synthesized into speech content that approximates a human effect. The user can perform character creation based on songs or Background music (BGM) to generate videos quickly. For example, when a user inputs a recording which contains a song sung by the user, the terminal sends the obtained recording to the server, the server identifies the recording through a preset model, determines the contents of the melody, the lyric and the like in the recording, generates audio data which can be synthesized according to the contents of the melody, the lyric and the like, further processes the synthesized audio data, and returns the processed audio data to the terminal.
According to the embodiment of the invention, the preset training model is used for adapting the characters input by the user to the melody and the lyrics of the song, so that the characteristics of the song are considered while the audio data are obtained.
Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video processing method provided in the embodiment of the present invention, the method further includes:
and sending the composite video to the terminal, wherein the composite video is the first composite video or the second composite video.
In the above description, a video processing method according to the present invention is described, and a server for executing the video processing method is described below.
Referring to fig. 6, an embodiment of a video processing apparatus according to the present invention includes:
a first acquisition unit 601 configured to acquire input content, the input content including text information;
a second obtaining unit 602, configured to obtain preset video frame data and animation frame data according to the text information;
a synthesizing unit 603, configured to mix the animation frame data and the video frame data to obtain a first synthesized video.
In the embodiment of the invention, the required special effect content is generated according to the input content, and the special effect content and the video are synthesized, so that the real-time special effect processing of the video is realized, and the diversity of the video is improved.
Referring to fig. 7, another embodiment of a video processing apparatus according to the present invention includes:
a first acquisition unit 701 configured to acquire input content, the input content including text information;
a second obtaining unit 702, configured to obtain preset video frame data and animation frame data according to the text information;
a synthesizing unit 703, configured to mix the animation frame data and the video frame data to obtain a first synthesized video.
Optionally, in a possible implementation, the synthesizing unit 703 includes:
a first obtaining module 7031, configured to obtain a model of a system codec, where the system codec is configured to perform hardware synthesis on frame data;
a judging module 7032, configured to judge whether the model of the system codec is within a preset white list range;
a first synthesis module 7033, configured to perform hardware synthesis on the animation frame data and the video frame data through the system codec to obtain the synthesized video if the model of the system codec is within the preset white list range;
and a second synthesizing module 7034, configured to perform software synthesis on the animation frame data and the video frame data through preset software to obtain the synthesized video if the model of the system codec is not within the preset white list range.
Optionally, in a possible implementation, the synthesizing unit 703 further includes:
a third synthesis module 7035, configured to perform software synthesis on the animation frame data and the video frame data through the preset software to obtain the synthesized video if the system codec fails to synthesize the animation frame data and the video frame data.
Optionally, in a possible implementation manner, the second obtaining unit 702 includes:
a first determining module 7021, configured to determine a time axis according to the text information;
a generating module 7022, configured to generate animation frame data according to the text information and the time axis;
the arrangement module 7023 is configured to perform segmented arrangement on a preset video according to the time axis to obtain a plurality of video clips;
a second determining module 7024, configured to randomly determine a target video clip in the plurality of video clips, where a duration of the target video clip corresponds to a duration of the time axis;
a decoding module 7025, configured to decode the target video slice to obtain the preset video frame data.
Optionally, in a possible implementation, the generating module 7022 is specifically configured to:
determining each character contained in the text information;
determining a character image corresponding to each character from a preset image library;
and generating text animation frame data according to the character image of each character and the time axis.
Optionally, in a possible implementation, the generating module 7022 is specifically configured to:
determining animation sequence parameters according to the character image of each character, wherein the animation sequence parameters comprise the starting time and the duration of the character image;
and generating animation frame data according to the starting time and the duration.
Optionally, in a possible implementation, the input content further includes audio information, and the server further includes:
a third obtaining unit 704, configured to obtain audio data corresponding to the audio information;
a processing unit 705 for synchronizing the audio data with the first composite video, or
And mixing the audio data and the video frame data to obtain audio and video frame data, and generating a second synthetic video according to the audio and video frame data.
Optionally, in a possible implementation, the server further includes:
a sending unit 706, configured to send a composite video to a terminal, where the composite video is the first composite video or the second composite video.
The server in the embodiment of the present invention is described above from the perspective of the modular functional entity, and the server in the embodiment of the present invention is described below from the perspective of hardware processing. Referring to fig. 8, another embodiment of the server according to the embodiment of the present invention includes:
as shown in fig. 8, the server 800 includes: memory 801, processor 802, transceiver 803. Optionally, server 800 may also include a bus 804. The transceiver 803, the processor 802 and the memory 801 may be connected to each other via a bus 804; the bus 804 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 804 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
In the embodiment of the present invention, a terminal for sending information is further provided, as shown in fig. 9, for convenience of description, only a part related to the embodiment of the present invention is shown, and details of the specific technology are not disclosed, please refer to the method part in the embodiment of the present invention. The terminal can be a terminal device with a video playing function, such as a mobile phone, a tablet computer, and the like, taking the terminal as the mobile phone as an example:
fig. 9 is a block diagram showing a partial structure of a mobile phone related to a terminal provided by an embodiment of the present invention. Referring to fig. 9, the handset includes: radio Frequency (RF) circuitry 910, memory 920, input unit 930, display unit 940, sensor 950, audio circuitry 960, wireless fidelity (WiFi) module 970, processor 980, and power supply 990. Those skilled in the art will appreciate that the handset configuration shown in fig. 9 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 9:
the RF circuit 910 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, for receiving downlink information of a base station and then processing the received downlink information to the processor 980; in addition, the data for designing uplink is transmitted to the base station. In general, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 910 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), etc.
The memory 920 may be used to store software programs and modules, and the processor 980 may execute various functional applications and data processing of the mobile phone by operating the software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as text information) created according to the use of the cellular phone, and the like. Further, the memory 920 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 930 may be used to receive input text information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 930 may include a touch panel 931 and other input devices 932. The touch panel 931, also referred to as a touch screen, may collect a touch operation performed by a user on or near the touch panel 931 (e.g., a user's operation on or near the touch panel 931 using a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a preset program. Alternatively, the touch panel 931 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 980, and can receive and execute commands sent by the processor 980. In addition, the touch panel 931 may be implemented by various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 930 may include other input devices 932 in addition to the touch panel 931. In particular, other input devices 932 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 940 may be used to display information input by a user or a composite video provided to the user and various menus of the cellular phone. The display unit 940 may include a display panel 941, and optionally, the display panel 941 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 931 may cover the display panel 941, and when the touch panel 931 detects a touch operation on or near the touch panel 931, the touch panel transmits the touch operation to the processor 980 to determine the type of the touch event, and then the processor 980 provides a corresponding visual output on the display panel 941 according to the type of the touch event. Although in fig. 9, the touch panel 931 and the display panel 941 are two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 931 and the display panel 941 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 950, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 941 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 941 and/or backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
Audio circuitry 960, speaker 961, microphone 962 may provide an audio interface between a user and a cell phone. The audio circuit 960 may transmit the electrical signal converted from the received audio data to the speaker 961, and convert the electrical signal into a sound signal for output by the speaker 961; on the other hand, the microphone 962 converts the collected sound signal into an electrical signal, converts the electrical signal into audio data after being received by the audio circuit 960, and outputs the audio data to the processor 980 for processing, and then transmits the audio data to, for example, another mobile phone through the RF circuit 910, or outputs the audio data to the memory 920 for further processing.
WiFi belongs to short-distance wireless transmission technology, and the mobile phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 970, and provides wireless broadband Internet access for the user. Although fig. 9 shows the WiFi module 970, it is understood that it does not belong to the essential constitution of the handset, and can be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 980 is a control center of the mobile phone, connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 920 and calling data stored in the memory 920, thereby integrally monitoring the mobile phone. Alternatively, processor 980 may include one or more processing units; preferably, the processor 980 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 980.
The handset also includes a power supply 990 (e.g., a battery) for supplying power to the various components, which may preferably be logically connected to the processor 980 via a power management system, thereby providing management of charging, discharging, and power consumption via the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
Among others, RF circuit 910 is specifically configured to:
the input content is sent to the server.
RF circuit 910 is further specifically configured to:
and receiving a composite video sent by the server, wherein the composite video is a first composite video or a second composite video generated according to the input content.
The input unit 930 is specifically configured to:
input content on the terminal is acquired, the input content including text information.
The display unit 940 is specifically configured to:
and sharing the synthesized video on the application display interface.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that a computer can store or a data storage device, such as a server, a data center, etc., that is integrated with one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The technical solutions provided by the present invention are introduced in detail, and the present invention applies specific examples to explain the principle and the implementation manner of the present invention, and the descriptions of the above examples are only used to help understanding the method and the core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (12)

1. A video processing method, comprising:
acquiring input content, wherein the input content comprises text information;
acquiring preset video frame data and animation frame data according to the text information, wherein the acquiring comprises: determining a time axis according to the text information, which specifically comprises: segmenting the text information, determining the semantics of each piece of character information, and generating parameters of the text information on the time axis to obtain the time axis, wherein the parameters on the time axis comprise a starting time and a duration;
generating animation frame data according to the text information and the time axis;
segmenting the preset video according to the time axis to obtain a plurality of video clips;
randomly determining a target video clip in the plurality of video clips, wherein the duration of the target video clip corresponds to the duration of the time axis;
decoding the target video clip to obtain the preset video frame data;
and mixing the animation frame data and the video frame data to obtain a first composite video.
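For readers tracing the flow of claim 1, the following minimal Python sketch illustrates how a time axis could be built from segmented text and how a target clip of matching duration could be picked at random from a preset video. Every name and heuristic here (build_timeline, pick_target_clip, the per-character display time) is hypothetical and chosen only for illustration; the claim does not prescribe any of them.

import random
from dataclasses import dataclass

@dataclass
class TimelineEntry:
    text: str         # one segment of the input text information
    start_ms: int     # starting time of the segment on the time axis
    duration_ms: int  # duration of the segment on the time axis

def build_timeline(segments, per_char_ms=300):
    # Hypothetical heuristic: each segment stays on screen for a time
    # proportional to its length, with a floor of 800 ms.
    timeline, cursor = [], 0
    for seg in segments:
        duration = max(800, per_char_ms * len(seg))
        timeline.append(TimelineEntry(seg, cursor, duration))
        cursor += duration
    return timeline

def pick_target_clip(preset_video_ms, timeline):
    # Split the preset video into clips whose length equals the time axis
    # duration and randomly pick one of them as the target clip.
    needed = timeline[-1].start_ms + timeline[-1].duration_ms
    starts = list(range(0, max(1, preset_video_ms - needed + 1), needed))
    clip_start = random.choice(starts)
    return clip_start, clip_start + needed

timeline = build_timeline(["spring outing", "by the lake"])
print(timeline)
print(pick_target_clip(60_000, timeline))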
2. The method of claim 1, wherein the mixing the animation frame data and the video frame data to obtain the first composite video comprises:
obtaining the model of a system codec, wherein the system codec is used for performing hardware synthesis on frame data;
judging whether the model of the system codec is within a preset whitelist;
if the model is within the preset whitelist, performing hardware synthesis on the animation frame data and the video frame data through the system codec to obtain the first composite video;
and if the model is not within the preset whitelist, performing software synthesis on the animation frame data and the video frame data through preset software to obtain the first composite video.
3. The method of claim 2, further comprising:
if the system codec fails to synthesize the animation frame data and the video frame data, performing software synthesis on the animation frame data and the video frame data through the preset software to obtain the first composite video.
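A small sketch of the selection logic in claims 2 and 3, assuming placeholder codec model names and caller-supplied encoder callables; none of these identifiers come from the patent itself:

HARDWARE_WHITELIST = {"codec-model-a", "codec-model-b"}  # placeholder model names

def synthesize(animation_frames, video_frames, codec_model, hw_encode, sw_encode):
    # Use hardware synthesis only when the system codec model is whitelisted
    # (claim 2); if hardware synthesis fails at runtime, fall back to software
    # synthesis through the preset software (claim 3).
    if codec_model in HARDWARE_WHITELIST:
        try:
            return hw_encode(animation_frames, video_frames)
        except RuntimeError:
            pass  # hardware path failed, fall through to the software path
    return sw_encode(animation_frames, video_frames)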
4. The method of claim 1, wherein generating animation frame data according to the text information and the time axis comprises:
determining each character contained in the text information;
determining a character image corresponding to each character from a preset image library;
and generating animation frame data according to the character image of each character and the time axis.
5. The method of claim 4, wherein the generating animation frame data according to the character image of each character and the time axis comprises:
determining animation sequence parameters according to the character image of each character, wherein the animation sequence parameters comprise the starting time and the duration of the character image;
and generating animation frame data according to the starting time and the duration.
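Claims 4 and 5 amount to mapping each character onto a character image and giving that image its own starting time and duration on the time axis. A minimal sketch follows, reusing the TimelineEntry objects from the sketch after claim 1; the even split of each segment's slot and the fallback image name are assumptions, not part of the claims:

def build_animation_sequence(timeline, image_library):
    # For each character of each timeline entry, look up its character image in a
    # preset image library (a plain dict here) and assign it animation sequence
    # parameters: a starting time and a duration within the entry's slot.
    sequence = []
    for entry in timeline:
        chars = list(entry.text)
        slot = entry.duration_ms // max(1, len(chars))
        for i, ch in enumerate(chars):
            sequence.append({
                "image": image_library.get(ch, "fallback.png"),
                "start_ms": entry.start_ms + i * slot,
                "duration_ms": slot,
            })
    return sequence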
6. The method of any of claims 1-5, wherein the input content further comprises audio information, and the method further comprises:
acquiring audio data corresponding to the audio information;
synchronizing the audio data with the first composite video; or
mixing the audio data and the video frame data to obtain audio-video frame data, and generating a second composite video according to the audio-video frame data.
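Both branches of claim 6 can be pictured at the frame-data level as pairing each video frame with the audio samples covering its display interval. The sketch below is only illustrative; the frame rate, sample rate, and data layout are assumptions:

def synchronize_audio(frames, audio_samples, fps=25, sample_rate=44100):
    # Pair each video frame with the audio samples that fall inside its display
    # interval; this can stand for synchronizing audio with the first composite
    # video, or for building audio-video frame data for the second composite video.
    spf = sample_rate // fps  # audio samples per video frame
    return [{"video": f, "audio": audio_samples[i * spf:(i + 1) * spf]}
            for i, f in enumerate(frames)]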
7. The method of claim 6, wherein the acquiring audio data corresponding to the audio information comprises:
identifying the audio information through a preset model to obtain melody and lyrics;
and generating audio data according to the melody and the lyrics.
8. The method of claim 7, further comprising:
and sending the composite video to a terminal, wherein the composite video is the first composite video or the second composite video.
9. An information transmission method, comprising:
acquiring input content on a terminal and sending the input content to a server, wherein the input content comprises text information;
receiving a composite video sent by the server and sharing the composite video on an application display interface, wherein the composite video is a first composite video or a second composite video generated according to the input content;
wherein the composite video is obtained by the server by mixing animation frame data and preset video frame data; generating the animation frame data comprises: the server segments the text information in the received input content, determines the semantics of each piece of character information, and generates parameters of the text information on a time axis to obtain the time axis, wherein the parameters on the time axis comprise a starting time and a duration, and the server generates the animation frame data according to the text information and the time axis; and the video frame data are obtained by the server by segmenting a preset video according to the time axis determined from the text information to obtain a plurality of video clips, randomly determining a target video clip among the plurality of video clips, and decoding the target video clip, wherein the duration of the target video clip corresponds to the duration of the time axis.
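On the terminal side, claim 9 reduces to sending the input content to the server and receiving the composite video back. The sketch below uses the requests library; the endpoint URL, the JSON payload shape, and the output file name are assumptions, since the claim does not fix any transport format:

import requests

def send_and_receive(text, server="https://example.com/compose"):
    # Send the input content (text information) to the server and save the
    # composite video returned in the response body.
    resp = requests.post(server, json={"text": text}, timeout=60)
    resp.raise_for_status()
    with open("composite_video.mp4", "wb") as f:
        f.write(resp.content)
    return "composite_video.mp4"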
10. The method of claim 9,
the input content further comprises audio information.
11. The method of claim 10,
the second composite video comprises audio data, and the audio data is obtained according to the audio information.
12. A terminal for information transmission, comprising:
an acquisition unit, configured to acquire input content on the terminal, wherein the input content comprises text information;
a sending unit, configured to send the input content to a server;
a receiving unit, configured to receive a composite video sent by the server, wherein the composite video is a first composite video or a second composite video generated according to the input content; the composite video is obtained by the server by mixing animation frame data and preset video frame data; generating the animation frame data comprises: the server segments the text information in the received input content, determines the semantics of each piece of character information, and generates parameters of the text information on a time axis to obtain the time axis, wherein the parameters on the time axis comprise a starting time and a duration, and the server generates the animation frame data according to the text information and the time axis; and the video frame data are obtained by the server by segmenting a preset video according to the time axis determined from the text information to obtain a plurality of video clips, randomly determining a target video clip among the plurality of video clips, and decoding the target video clip, wherein the duration of the target video clip corresponds to the duration of the time axis;
and a display unit, configured to share the composite video on an application display interface.
CN201810330586.6A 2018-04-12 2018-04-12 Video processing method, information sending method and related equipment Active CN110213504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810330586.6A CN110213504B (en) 2018-04-12 2018-04-12 Video processing method, information sending method and related equipment


Publications (2)

Publication Number Publication Date
CN110213504A CN110213504A (en) 2019-09-06
CN110213504B true CN110213504B (en) 2021-10-08

Family

ID=67779045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810330586.6A Active CN110213504B (en) 2018-04-12 2018-04-12 Video processing method, information sending method and related equipment

Country Status (1)

Country Link
CN (1) CN110213504B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111541914B (en) * 2020-05-14 2021-10-15 腾讯科技(深圳)有限公司 Video processing method and storage medium
CN111669623B (en) * 2020-06-28 2023-10-13 腾讯科技(深圳)有限公司 Video special effect processing method and device and electronic equipment
CN111970571B (en) * 2020-08-24 2022-07-26 北京字节跳动网络技术有限公司 Video production method, device, equipment and storage medium
CN112069360A (en) * 2020-09-15 2020-12-11 北京字跳网络技术有限公司 Music poster generation method and device, electronic equipment and medium
CN112258611B (en) * 2020-10-23 2024-05-31 北京字节跳动网络技术有限公司 Image processing method and device
CN112380379B (en) * 2020-11-18 2023-05-02 抖音视界有限公司 Lyric special effect display method and device, electronic equipment and computer readable medium
CN112492355B (en) 2020-11-25 2022-07-08 北京字跳网络技术有限公司 Method, device and equipment for publishing and replying multimedia content
CN112738624B (en) * 2020-12-23 2022-10-25 北京达佳互联信息技术有限公司 Method and device for special effect rendering of video
CN115550682A (en) * 2021-06-29 2022-12-30 上海数字电视国家工程研究中心有限公司 Method and system for synthesizing image-text video

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231836A (en) * 2011-06-27 2011-11-02 深圳市茁壮网络股份有限公司 Graphics interchange format (GIF) file processing method and device for digital television system
CN102595055A (en) * 2012-02-29 2012-07-18 北京汉邦高科数字技术股份有限公司 Method for superposing characters on YUV image
CN103248951A (en) * 2013-04-28 2013-08-14 天脉聚源(北京)传媒科技有限公司 System and method for adding scrolling information into video
CN106559679A (en) * 2015-09-28 2017-04-05 腾讯科技(深圳)有限公司 Method, server and mobile terminal that video is decoded

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8286218B2 (en) * 2006-06-08 2012-10-09 Ajp Enterprises, Llc Systems and methods of customized television programming over the internet
US20180101524A1 (en) * 2016-10-12 2018-04-12 Pragma Design, Inc. Systems and methods for creating and dynamically rendering user-adjustable content


Also Published As

Publication number Publication date
CN110213504A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110213504B (en) Video processing method, information sending method and related equipment
US10904482B2 (en) Method and apparatus for generating video file, and storage medium
WO2020187086A1 (en) Video editing method and apparatus, device, and storage medium
US11635873B2 (en) Information display method, graphical user interface, and terminal for displaying media interface information in a floating window
TWI592021B (en) Method, device, and terminal for generating video
US9418464B2 (en) Control of timing for animations in dynamic icons
CN106803993B (en) Method and device for realizing video branch selection playing
CN107707828B (en) A kind of method for processing video frequency and mobile terminal
JP2021525430A (en) Display control method and terminal
CN110933511B (en) Video sharing method, electronic device and medium
WO2017181796A1 (en) Program interaction system, method, client and back-end server
CN105187930A (en) Video live broadcasting-based interaction method and device
CN111294638A (en) Method, device, terminal and storage medium for realizing video interaction
CN106531149A (en) Information processing method and device
CN110662090B (en) Video processing method and system
CN104796743A (en) Content item display system, method and device
CN105828145A (en) Interaction method and interaction device
CN106875460A (en) A kind of picture countenance synthesis method and terminal
CN107908765B (en) Game resource processing method, mobile terminal and server
CN108600079B (en) Chat record display method and mobile terminal
CN108744495A (en) A kind of control method of virtual key, terminal and computer storage media
CN110908638A (en) Operation flow creating method and electronic equipment
WO2021248988A1 (en) Cross-terminal screen recording method, terminal device, and storage medium
CN106407359A (en) Image playing method and mobile terminal
CN111210496B (en) Picture decoding method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant