CN108924622B - Video processing method and device, storage medium and electronic device

Info

Publication number: CN108924622B
Authority: CN (China)
Prior art keywords: text, video, content, text content, area
Legal status: Active
Application number: CN201810817813.8A
Other languages: Chinese (zh)
Other versions: CN108924622A (en)
Inventor: 司宇星
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810817813.8A
Publication of CN108924622A
Application granted; publication of CN108924622B

Classifications

    • H04N 21/4312: Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; cameras specially adapted for the electronic generation of special effects
    • H04N 5/265: Mixing
    • H04N 5/278: Subtitling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the invention disclose a video processing method and device, a storage medium, and an electronic device. The method includes: acquiring a text addition signal input for a source video played on a current interface, and creating a text editing component in a first set area within the video playing area of the source video based on the text addition signal; acquiring text content input in the text editing component, and acquiring content parameters set for the text content; and synthesizing the text content with the source video based on the content parameters to obtain a target video corresponding to the source video. With the method and device, the process of adding subtitles to a video is completed directly on the current interface, the added subtitles are displayed on the video in real time, and both the real-time responsiveness of the interaction and the convenience of subtitle addition are improved.

Description

Video processing method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a video processing method and device, a storage medium, and an electronic device.
Background
With the rapid development of the mobile Internet, the number of applications on user terminals keeps growing, and a video application is by now practically standard on every user terminal, letting users watch a rich variety of video files. While watching, a user sometimes needs to edit the video, for example beautifying the video file or adding a filter, and sometimes needs to add text content (subtitles) to the video, so that what the user wants to express is displayed in the video in subtitle form. This is highly entertaining and interactive, and serves the purpose of secondary creation.
In most video editing apps on the market today, subtitles are added to a video as follows: the user taps a subtitle-adding control on the current interface, a subtitle editing page is pulled up, the user inputs text content on that page and, after confirming, jumps back to the previous page, where the subtitle is displayed. The interaction thus comprises three steps: pulling up the editing interface, editing the subtitle content, and returning to the video interface to display the subtitle on the video. These steps are cumbersome, take a long time, and cannot display the added subtitle on the video in real time.
Disclosure of Invention
Embodiments of the present invention provide a video processing method and device, a storage medium, and an electronic device, with which the process of adding subtitles to a video can be completed directly on the current interface, the added subtitles can be displayed on the video in real time, and both the real-time responsiveness of the interaction and the convenience of subtitle addition can be improved.
An embodiment of the present invention provides a video processing method, which may include:
acquiring a text addition signal input for a source video played on a current interface, and creating a text editing component in a first set area within the video playing area of the source video based on the text addition signal;
acquiring text content input in the text editing component, and acquiring content parameters set for the text content;
and synthesizing the text content and the source video based on the content parameters to obtain a target video corresponding to the source video.
Optionally, the obtaining the text content input in the text editing component and the content parameter set for the text content includes:
acquiring a text editing component from the text editing component set displayed in the first set area;
acquiring the text content input in the text editing component, and acquiring the content parameters set for the text content.
Optionally, the obtaining content parameters set for the text content includes:
acquiring a text animation format selected, for the text content, from a text animation format set, where the text animation format set is located in a second set area of the current interface outside the video playing area; and/or
acquiring a text style selected, for the text content, from a text style set, where the text style set is located in a third set area of the current interface outside the video playing area; and/or
acquiring time adjustment information set for text playing time information corresponding to the text content, where the text playing time information is located in a fourth set area of the current interface outside the video playing area.
Optionally, the obtaining time adjustment information set for the text playing time information corresponding to the text content includes:
when time adjustment information set for the text start time is acquired, adjusting the text start time based on the time adjustment information, and replacing the currently displayed image of the video playing area with the image indicated by the adjusted text start time;
and when time adjustment information set for the text end time is acquired, adjusting the text end time based on the time adjustment information, and replacing the currently displayed image of the video playing area with the image indicated by the adjusted text end time.
Optionally, the synthesizing the text content and the source video based on the content parameter to obtain a target video corresponding to the source video includes:
synthesizing the content parameters and the text content to obtain special effect text content corresponding to the text content;
acquiring text alignment information of the text content under the current video resolution of the source video;
and synthesizing the special effect text content and the source video based on the text alignment information to obtain a target video corresponding to the source video.
Optionally, the obtaining text alignment information of the text content at the current video resolution of the source video includes:
acquiring a first center point coordinate and a first font size of the text content under a reference video resolution;
and obtaining a scaling ratio of the current video resolution and the reference video resolution, and respectively adjusting the first center point coordinate and the first font size based on the scaling ratio to obtain a second center point coordinate and a second font size of the text content under the current video resolution, wherein the text alignment information comprises the second center point coordinate and the second font size.
Optionally, the method further includes:
acquiring a text editing signal input aiming at the text content, and editing the text content based on the text editing signal, wherein the text editing signal comprises at least one of the following: a text modification signal, a text deletion signal, a text query signal, and a text scaling signal.
An aspect of an embodiment of the present invention provides a video processing apparatus, which may include:
a component creation unit, configured to acquire a text addition signal input for a source video played on a current interface, and to create a text editing component in a first set area within the video playing area of the source video based on the text addition signal;
a parameter acquisition unit configured to acquire text content input in the text editing component, and acquire a content parameter set for the text content;
and the video generation unit is used for synthesizing the text content and the source video based on the content parameters to obtain a target video corresponding to the source video.
Optionally, the parameter obtaining unit includes:
the component acquiring subunit is used for acquiring a text editing component from the text editing component set displayed in the first setting area;
and the parameter acquisition subunit is used for acquiring the text content input in the text editing component and acquiring the content parameters set for the text content.
Optionally, the parameter obtaining unit is specifically configured to:
acquiring a text animation format selected, for the text content, from a text animation format set, where the text animation format set is located in a second set area of the current interface outside the video playing area; and/or
acquiring a text style selected, for the text content, from a text style set, where the text style set is located in a third set area of the current interface outside the video playing area; and/or
acquiring time adjustment information set for text playing time information corresponding to the text content, where the text playing time information is located in a fourth set area of the current interface outside the video playing area.
Optionally, the parameter obtaining unit is specifically configured to:
when time adjustment information set for the text start time is acquired, adjusting the text start time based on the time adjustment information, and replacing the currently displayed image of the video playing area with the image indicated by the adjusted text start time;
and when time adjustment information set for the text end time is acquired, adjusting the text end time based on the time adjustment information, and replacing the currently displayed image of the video playing area with the image indicated by the adjusted text end time.
Optionally, the video generating unit includes:
the special effect text generation subunit is configured to perform synthesis processing on the content parameter and the text content to obtain special effect text content corresponding to the text content;
an alignment information obtaining subunit, configured to obtain text alignment information of the text content at a current video resolution of the source video;
and the target video generation subunit is configured to perform synthesis processing on the special effect text content and the source video based on the text alignment information to obtain a target video corresponding to the source video.
Optionally, the alignment information obtaining subunit is specifically configured to:
acquiring a first center point coordinate and a first font size of the text content under a reference video resolution;
and obtaining a scaling ratio of the current video resolution and the reference video resolution, and respectively adjusting the first center point coordinate and the first font size based on the scaling ratio to obtain a second center point coordinate and a second font size of the text content under the current video resolution, wherein the text alignment information comprises the second center point coordinate and the second font size.
Optionally, the method further includes:
a text editing unit, configured to acquire a text editing signal input for the text content, and edit the text content based on the text editing signal, where the text editing signal includes at least one of: a text modification signal, a text deletion signal, a text query signal, and a text scaling signal.
An aspect of the embodiments of the present invention provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
An aspect of an embodiment of the present invention provides an electronic device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
In the embodiment of the invention, text content (subtitles) is added to a video by acquiring a text addition signal input for the source video played on the current interface, creating a text editing component in a first set area within the video playing area of the source video based on that signal, acquiring the text content input in the text editing component together with the content parameters set for it, and synthesizing the text content with the source video based on those parameters. The process of adding subtitles to the video is completed directly on the current interface, the added subtitles are displayed on the video in real time, and both the real-time responsiveness of the interaction and the convenience of subtitle addition are improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a video processing method according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present invention;
FIG. 4 is an interface diagram of a current interface provided by an embodiment of the invention;
FIG. 5 is a schematic interface diagram of a text content input method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a display effect of a text animation format according to an embodiment of the present invention;
fig. 7 is a flowchart illustrating a video processing method according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating a display effect of text content at a reference video resolution according to an embodiment of the present invention;
fig. 9 is a schematic diagram illustrating a display effect of text content at a current video resolution according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a video processing device according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of a video processing device according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a parameter obtaining unit according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of a video generating unit according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present invention. As shown in fig. 1, the network architecture may include a server 2000 and a user terminal cluster. The user terminal cluster may include a plurality of user terminals, specifically user terminal 3000a, user terminal 3000b, …, and user terminal 3000n. For ease of understanding, in the embodiment of the present invention, one of the user terminals shown in fig. 1 may be selected as the execution subject of the present solution, or the server may be selected as the execution subject.
The user terminal may be a tablet computer, a personal computer (PC), a smartphone, a palmtop computer, a mobile Internet device (MID), or another terminal device with a video processing function, and may further run an application program with a video processing function (such as an applet or a client APP).
The server may be an application server with video processing functionality.
When the user terminal is taken as an execution subject, for convenience of understanding, the embodiment of the present invention may be described by taking the user terminal 3000a in fig. 1 as an example.
As shown in fig. 1, the user terminal 3000a is configured to acquire a text addition signal input for a source video played on the current interface, and to create a text editing component in a first set area within the video playing area of the source video based on the text addition signal;
the current interface may include a plurality of areas, and one of the areas may be used as a video playing area for playing a video. The source video refers to an initial video to which no text content is added, and can be understood as a video to which a part of text content is added and new text content needs to be added.
The text addition signal is triggered by the user, and received by the user terminal 3000a is a text addition signal. The user may trigger by touching a virtual control of the current interface, or by touching a physical key on the user terminal 3000a, or by performing voice input through a microphone on the user terminal 3000a, or by using a camera on the user terminal 3000a to perform gesture actions input by the user.
Of course, the video playing area further includes an area to which a text editing component is added, and the area is used as a first setting area and can be any position in the video playing area, and a plurality of text editing components can be simultaneously displayed in the area.
The text editing component is a text box component for editing text content, and a user can input text content in any form, such as characters, pictures, emoticons and the like, in the added text box component.
Further, the user terminal 3000a is further configured to obtain text content input in the text editing component, and obtain content parameters set for the text content;
the content parameters may include text animation formats, such as voice-over, bubble, bullet screen, title, judder, and ticker formats; text style such as font form, font size, font color, stroking width, etc. may also be included, and text display time information corresponding to text content such as text start time, text end time, text duration may also be included.
Of course, if a plurality of text components are displayed in the first setting area, the user may select one as a component currently used for inputting text contents.
The user terminal 3000a is further configured to perform synthesis processing on the text content and the source video based on the content parameter, so as to obtain a target video corresponding to the source video.
The synthesizing process adds the text content to the source video to obtain the target video, i.e., the video with the text content added.
It can also be understood that after the target video is obtained, the user may edit the added text content, for example modifying it, deleting it, or scaling its font size.
When the server is the execution subject, for ease of understanding, the embodiment of the present invention may select one user terminal (e.g., 3000a) from the plurality of user terminals shown in fig. 1 as the target user terminal.
The server 2000 is configured to acquire a text addition signal input for a source video played on a current interface, and create a text editing component in a first setting area in a video playing area of the source video based on the text addition signal;
the server 2000 is further configured to obtain text content input in the text editing component, and obtain content parameters set for the text content;
the server 2000 is further configured to perform synthesis processing on the text content and the source video based on the content parameter, so as to obtain a target video corresponding to the source video.
Of course, the user terminal 3000a may also be configured to acquire a text addition signal input for a source video played on a current interface, and create a text editing component in a first setting area in a video playing area of the source video based on the text addition signal;
the user terminal 3000a is further configured to obtain text content input in the text editing component, and obtain content parameters set for the text content;
the user terminal 3000a is further configured to send the content parameter to the server 2000;
the server 2000 is configured to synthesize the text content and the source video based on the content parameter to obtain a target video corresponding to the source video;
the server 2000 is further configured to transmit the target video to the user terminal 3000 a.
The user terminal 3000a, the user terminals 3000b and …, and the user terminal 3000n may be respectively connected to the server 2000 via a network, so as to upload the content parameters to the server, and receive the target video delivered by the server 2000.
The following describes the video processing method according to the embodiment of the present invention in detail with reference to fig. 2 to 9. The video processing method in the embodiment of the present invention is executed by a video processing device, which may be any one of the user terminals 3000a, 3000b, …, and 3000n shown in fig. 1, or may be the server 2000 shown in fig. 1.
Referring to fig. 2, which is a schematic flow chart of a video processing method according to an embodiment of the invention. As shown in fig. 2, the method of the embodiment of the present invention may include the following steps S101 to S103.
S101, the video processing device acquires a text addition signal input for a source video played on the current interface, and creates a text editing component in a first set area within the video playing area of the source video based on the text addition signal;
It is understood that the source video refers to a video file that is playing, or about to play, on the current interface. The format of the source video may be AVI, QuickTime, RealVideo, NAVI, DivX, MPEG, or the like.
The source video file can be acquired through a video input unit of the video processing device after the user inputs an operation signal for acquiring a video file on the device; for example, the source video file may be selected from a local video library (such as an album), currently captured by a camera, or currently downloaded over the network.
The source video is displayed in the video playing area of the current interface; the video playing area may be the whole current interface, such as a full-screen display area, or a part of the current interface.
Specifically, the user performs a text addition operation on the source video displayed on the current interface, for example by tapping an "add subtitle" virtual control on the interface, inputting the voice command "add subtitle" through a voice receiver, or typing "add subtitle" on a virtual keyboard. When the video processing device detects the text addition signal, it creates a text editing component on the current interface, preferably in the video playing area. The text editing component, i.e., the text box component, is a movable and resizable object or graphic container. A text box component has a number of attributes, for example: Text, which gets or sets the current text of the text box in the single-line case; Multiline, which indicates whether the control is a multi-line text box; Lines, which gets or sets the lines of text in the text box control in the multi-line case; and WordWrap, which controls whether a line of text in a multi-line text box wraps automatically when its width exceeds the width of the control. The display format of the text content shown in the text box component can thus be controlled by setting the attributes of the component. In Android, the text box component is represented by a TextView, which is used for displaying text on the screen. A text box in Android may display a single line of text (android:singleLine="true") or multiple lines of text (android:singleLine="false"), as well as text with a picture (android:drawableTop="@drawable/ic_launcher"), and so on. Of course, the text editing component itself may also be configured, for example by setting the border shape, color, line width, background color, and the like of the text box.
It should be noted that the first set area is any area within the video playing area and may contain a plurality of text editing components, i.e., a plurality of texts may be displayed in the video playing area at the same time. When the video processing device detects a text addition signal, it may create the text editing component directly at the default position of the default first set area without checking whether a component already exists there. It can also be understood that the component display positions in the first set area are arranged in sequence: when the device detects a text addition signal, it traverses the positions in the first set area in order to check whether a text editing component already exists at each, and if so, places the component to be created at the next position. Of course, a text editing component may also be created at a random position in the video playing area when the text addition signal is detected. A minimal sketch of the sequential-position variant is given below.
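The following Objective-C fragment sketches the sequential-position placement described above; the position list and the helper methods hasEditingComponentAtPoint: and createTextEditingComponentAtPoint: are hypothetical names used only for illustration, not part of the patent:

// Sketch: place a new text editing component at the first free preset position.
NSArray<NSValue *> *positions = @[[NSValue valueWithCGPoint:CGPointMake(80, 120)],   // position No. 1
                                  [NSValue valueWithCGPoint:CGPointMake(80, 200)],   // position No. 2
                                  [NSValue valueWithCGPoint:CGPointMake(80, 280)]];  // position No. 3
for (NSValue *value in positions) {
    CGPoint p = value.CGPointValue;
    if (![self hasEditingComponentAtPoint:p]) {      // traverse until a free position is found
        [self createTextEditingComponentAtPoint:p];  // create the component there
        break;
    }
}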
Optionally, before the user inputs the text addition operation, playback of the source video may be paused so that the text editing is performed on the current frame image. The pause may be triggered by touching the play key, performed synchronously when the user inputs the text addition operation, or controlled through an input voice signal.
S102, the video processing device acquires the text content input in the text editing component, and acquires the content parameters set for the text content;
It is understood that the text content may include words, emoticons, symbols, pictures, and the like, in any form or combination, and the words may be in different languages such as Chinese, English, or French. The user can input the text content through a keyboard (physical keys or a virtual keyboard control) or a handwriting area of the video processing device, or input a voice signal through a voice receiver of the video processing device, which is then parsed to generate the text content.
The content parameters of the text content may include a text animation format, such as voice-over, bubble, bullet-screen, title, shake, or marquee; if a new text animation format needs to be supported later, it can be added through a subtitle style base class (e.g., QHOStickerStyleConfig). The content parameters may further include a text style, such as font form, font size, font color, and stroke width; any text animation format may correspond to all available text styles, or each text animation format may correspond only to a fixed group of text styles. The content parameters may further include text display time information corresponding to the text content, such as a text start time, a text end time, and a text duration. Any one of these content parameters may be set alone, or two or more may be set at the same time. Of course, if the user sets no parameters for the text content, the default (preset) text parameters are used; these may be preconfigured by the user or built into the system.
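Purely for illustration, the content parameters described above could be grouped into a single model object. The class and property names below are a hypothetical sketch, not an API defined by the patent:

#import <UIKit/UIKit.h>
#import <CoreMedia/CoreMedia.h>

// Hypothetical grouping of the content parameters of one piece of text content.
@interface VPTextContentParams : NSObject
@property (nonatomic, copy)   NSString *animationFormat; // e.g. @"bubble", @"marquee"
@property (nonatomic, strong) UIFont   *font;            // font form and font size
@property (nonatomic, strong) UIColor  *textColor;       // font color
@property (nonatomic, assign) CGFloat   strokeWidth;     // stroking width
@property (nonatomic, assign) CMTime    startTime;       // text start time
@property (nonatomic, assign) CMTime    endTime;         // text end time
@end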
Optionally, when the text display time is adjusted, the source video may also be adjusted synchronously. When the text start time is adjusted, the video processing device replaces the currently displayed image of the video playing area with the image indicated by the adjusted text start time; when the text end time is adjusted, the video processing device replaces the currently displayed image of the video playing area with the image indicated by the adjusted text end time.
Specifically, if the video playing area includes a plurality of text editing components, the user selects one of them as the component for inputting text content; if it includes only one text editing component, the user inputs the text content in that component. After the input, the user can tap a completion button on the current interface to finish the editing process of the text content; a subtitle container view (e.g., QHOStickerContainedView) carries the subtitle content, so that zooming, moving, and deleting of the whole text content can be implemented. Meanwhile, the text content is displayed in real time through rendering. While the input text content is displayed in the first set area in real time, the user can set the content parameters of the input text content; the parameter setting function is implemented by a layer animation utility (e.g., QHOLayerAnimationUtil). If the user selects an entry from the text animation format set displayed in a second set area of the current interface outside the video playing area, the video playing device acquires the text animation format selected for the text content to complete the setting of the text animation format; the text animation format can likewise be switched through a switching operation input on the text animation format set. If the user selects an entry from the text style set displayed in a third set area of the current interface outside the video playing area, the video playing device acquires the text style selected for the text content to complete the setting of the text style. And if the user adjusts the text playing time information displayed in a fourth set area of the current interface outside the video playing area, the video playing device acquires the time adjustment information set for the text playing time information corresponding to the text content.
The display of the text content is rendered through a user interface (UI) control. For an iOS system, for example, a UILabel can be used: when the user inputs new text content, the video playing device traverses all UILabels in the current subtitle style, assigns the currently input text content to them, and renders it in real time. UILabel inherits from UIView and is a very frequently used view control in iOS, commonly used for displaying text.
For example, the creation process may be:
UILabel *label = [[UILabel alloc] initWithFrame:CGRectMake(20, 64, 100, 30)];
[self.view addSubview:label];
Attributes are then set either directly through the "." property name or through the corresponding "set" method, for example:
label.backgroundColor = [UIColor yellowColor]; // set the background color
label.textColor = [UIColor redColor]; // set the color of the text on the Label
label.text = @"I am a UILabel"; // set the text on the Label
label.font = [UIFont systemFontOfSize:15]; // set the font size to 15 (the default is 17)
label.textAlignment = NSTextAlignmentCenter; // set the text alignment (the default is left)
label.numberOfLines = 0; // set the number of lines (the default is 1; 0 allows multiple lines)
label.font = [UIFont fontWithName:@"Arial" size:30]; // set the content font and font size
label.highlighted = YES; // highlight the Label
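The UILabel traversal described above (assigning new input to every UILabel in the current subtitle style) can be sketched as follows; the container view styleView holding the subtitle style's labels is an assumed name for illustration:

// Sketch: assign newly input text to every UILabel of the current subtitle style.
NSString *inputText = @"AAA";                 // the currently input text content
for (UIView *subview in styleView.subviews) {
    if ([subview isKindOfClass:[UILabel class]]) {
        UILabel *label = (UILabel *)subview;
        label.text = inputText;               // assign the current input
        [label setNeedsDisplay];              // trigger real-time re-rendering
    }
}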
Optionally, after the input text content is obtained, the text content displayed in the target video may be modified, deleted, queried, or scaled through a subtitle data manager (e.g., QHOStickerItemManager).
S103, the video processing device synthesizes the text content with the source video based on the content parameters to obtain a target video corresponding to the source video.
It is understood that synthesizing the text content with the source video means aligning the exported text content with the source video (center-point alignment and font-size alignment) and then adding the text content to the source video. If no text parameters are set for the text content, the text content is displayed with the default text parameters; if content parameters are set, the text content refers to the special-effect text content displayed with the set content parameters.
A subtitle export tool (e.g., QHOExportParam) may be used to export the text content, so that the font size and center-point position exported for each subtitle in the text content remain consistent at different resolutions. The subtitle export tool can calculate the center-point position and font size of the text content in the video on the current device from the font size and center-point position of the text content in the reference video and the resolution of the current device.
Specifically, the video processing device synthesizes the content parameters with the text content to obtain, and export, the special-effect text content corresponding to the text content. It then obtains a first center-point coordinate and a first font size of the text content at a reference video resolution, obtains the scaling ratio between the current video resolution and the reference video resolution, and scales the first center-point coordinate and the first font size by that ratio to obtain a second center-point coordinate and a second font size of the text content at the current video resolution. Based on the second center-point coordinate and the second font size, the special-effect text content is added to the source video (for example, composited with the source video using a composition tool such as AVVideoCompositionCoreAnimationTool), thereby obtaining the target video with the text content (i.e., the subtitle) added; the target video is then rendered for display at the current video resolution.
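As an illustration of this composition step, the following sketch burns a subtitle text layer into a video using AVFoundation's AVVideoCompositionCoreAnimationTool (the tool the text refers to); the asset URLs, subtitle text, and geometry values are assumptions:

#import <AVFoundation/AVFoundation.h>

// Sketch: composite a subtitle text layer with the source video and export the target video.
AVAsset *asset = [AVAsset assetWithURL:sourceVideoURL];   // sourceVideoURL: assumed
AVMutableVideoComposition *videoComposition =
    [AVMutableVideoComposition videoCompositionWithPropertiesOfAsset:asset];
CGSize renderSize = videoComposition.renderSize;          // current video resolution

CALayer *videoLayer  = [CALayer layer];
CALayer *parentLayer = [CALayer layer];
videoLayer.frame  = CGRectMake(0, 0, renderSize.width, renderSize.height);
parentLayer.frame = videoLayer.frame;
[parentLayer addSublayer:videoLayer];

CATextLayer *textLayer = [CATextLayer layer];
textLayer.string        = @"AAA";                           // the subtitle text
textLayer.fontSize      = 30;                               // the aligned (second) font size
textLayer.alignmentMode = kCAAlignmentCenter;
textLayer.frame = CGRectMake(0, 100, renderSize.width, 60); // placed from the aligned center point
[parentLayer addSublayer:textLayer];

videoComposition.animationTool = [AVVideoCompositionCoreAnimationTool
    videoCompositionCoreAnimationToolWithPostProcessingAsVideoLayer:videoLayer
                                                             inLayer:parentLayer];

AVAssetExportSession *exportSession =
    [AVAssetExportSession exportSessionWithAsset:asset
                                      presetName:AVAssetExportPresetHighestQuality];
exportSession.videoComposition = videoComposition;
exportSession.outputURL        = targetVideoURL;            // targetVideoURL: assumed
exportSession.outputFileType   = AVFileTypeMPEG4;
[exportSession exportAsynchronouslyWithCompletionHandler:^{
    // the target video with the subtitle added is now at targetVideoURL
}];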
Optionally, after the target video is generated, playback of the target video may be controlled; meanwhile, the text content displayed in the target video may be edited through a subtitle data manager (e.g., QHOStickerItemManager), for example to modify, delete, or add text content.
Of course, the text content may already be edited in the same way through the subtitle data manager (e.g., QHOStickerItemManager) once the text content input in the text editing component has been acquired.
In the embodiment of the invention, text content (subtitles) is added to a video by acquiring a text addition signal input for the source video played on the current interface, creating a text editing component in a first set area within the video playing area of the source video based on that signal, acquiring the text content input in the text editing component together with the content parameters set for it, and synthesizing the text content with the source video based on those parameters. The process of adding subtitles to the video is completed directly on the current interface, the added subtitles are displayed on the video in real time, and both the real-time responsiveness of the interaction and the convenience of subtitle addition are improved.
Referring to fig. 3, a flow chart of a video processing method according to an embodiment of the invention is shown. As shown in fig. 3, the method of the embodiment of the present invention may include the following steps S201 to S210.
S201, the video processing device acquires a text addition signal input for a source video played on the current interface, and creates a text editing component in a first set area within the video playing area of the source video based on the text addition signal;
It is understood that the source video refers to a video file that is playing, or about to play, on the current interface. The format of the source video may be AVI, QuickTime, RealVideo, NAVI, DivX, MPEG, or the like.
The source video file can be acquired through a video input unit of the video processing device after the user inputs an operation signal for acquiring a video file on the device; for example, the source video file may be selected from a local video library (such as an album), currently captured by a camera, or currently downloaded over the network.
The source video is displayed in the video playing area of the current interface; the video playing area may be the whole current interface, such as a full-screen display area, or a part of the current interface. For example, fig. 4 is an interface schematic diagram of the current interface, which may include a plurality of regions: 1 is the video playing region, 2 is the video editing option display region, 3 is the time display region of the text, 4 is the text animation format display region, and so on. The source video is displayed in video playing region 1 of the current interface.
Specifically, the user performs a text addition operation on the source video displayed on the current interface, for example by tapping the "add subtitle" virtual control in region 2, inputting the voice command "add subtitle" through a voice receiver, or typing "add subtitle" on a virtual keyboard. The video processing device detects the text addition signal and creates a text editing component on the current interface, preferably in video playing region 1. The text editing component, i.e., the text box component, is a movable and resizable object or graphic container. A text box component has a number of attributes, for example: Text, which gets or sets the current text of the text box in the single-line case; Multiline, which indicates whether the control is a multi-line text box; Lines, which gets or sets the lines of text in the text box control in the multi-line case; and WordWrap, which controls whether a line of text in a multi-line text box wraps automatically when its width exceeds the width of the control. The display format of the text content shown in the text box component can thus be controlled by setting the attributes of the component. In Android, the text box component is represented by a TextView, which is used for displaying text on the screen. A text box in Android may display a single line of text (android:singleLine="true") or multiple lines of text (android:singleLine="false"), as well as text with a picture (android:drawableTop="@drawable/ic_launcher"), and so on. Of course, the text editing component itself may also be configured, for example by setting the border shape, color, line width, background color, and the like of the text box.
It should be noted that the first set area is any area within the video playing area and may contain a plurality of text editing components, i.e., a plurality of texts may be displayed in the video playing area at the same time, such as "AAA", "BBB", and "CCC" displayed simultaneously in video playing region 1 in fig. 4. When the video processing device detects a text addition signal, it may create the text editing component directly at the default position of the default first set area without checking whether a component already exists there; in this case the newly created component may cover an existing one, and the newly created component or the existing component is then moved to another position in the first set area. It can also be understood that the component display positions in the first set area are arranged in sequence, such as position No. 1, position No. 2, position No. 3 …: when the device detects a text addition signal, it traverses the positions in order, for example first checking whether a text editing component exists at position No. 1; if so, it moves on to position No. 2 and checks again, and if no component exists at position No. 2, the component to be created is placed there. Of course, a text editing component may also be created directly at a random position in the video playing area when the text addition signal is detected, and if it covers another component it can be moved manually.
Optionally, before the user inputs the text addition operation, playback of the source video may be paused so that the text editing is performed on the current frame image. The source video can be paused by touching a play key (e.g., the play control on the left side of region 2 in fig. 4), or paused synchronously when the user inputs the text addition operation (e.g., taps the "add subtitle" control in region 2 of fig. 4).
S202, the video processing device acquires a text editing component from the text editing component set displayed in the first set area;
It is to be understood that the text editing component set includes at least one text editing component. When a plurality of text editing components are displayed in the first set area, as shown in fig. 4 (the components carrying "AAA", "BBB", and "CCC"), the user selects one of them as the component for text content input.
Alternatively, when no selection signal input by the user for the text editing component set is detected within a preset time length, the most recently created text editing component may be used as the component for inputting the text content.
S203, the video processing device acquires the text content input in the text editing component;
It is understood that the text content may include words, emoticons (e.g., emoji), symbols, pictures, and the like, in any form or combination, and the words may be in different languages such as Chinese, English, or French. The user can input the text content through a keyboard (physical keys or a virtual keyboard control) or a handwriting area of the video processing device, or input a voice signal through a voice receiver of the video processing device, which is then parsed to generate the text content.
As shown in fig. 5, one possible text content input method is as follows: after the text editing component is created in the first set area, a virtual keyboard pops up synchronously; the text content is input through the virtual keyboard, and after input is finished, tapping the confirmation gesture completes the text input.
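On iOS, the synchronized keyboard pop-up can be achieved simply by making the newly created editing component the first responder; a minimal sketch, assuming the component wraps a UITextView (componentFrame is an assumed value):

UITextView *editor = [[UITextView alloc] initWithFrame:componentFrame];
[self.view addSubview:editor];
[editor becomeFirstResponder];   // pops up the virtual keyboard synchronously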
Specifically, after the user inputs the text content in the text editing component, the user may tap a completion button on the current interface (e.g., the completion button in the upper right corner of region 2 in fig. 4) to finish editing the text content. The subtitle container view (e.g., QHOStickerContainedView) carries the text content, so that zooming, moving, and deleting of the whole text content can be implemented. The video processing device acquires the carried text content and displays it in the first set area in real time.
The display of the text content is rendered through a user interface (UI) control. For an iOS system, for example, a UILabel can be used: when the user inputs new text content, all UILabels in the current subtitle style can be traversed, and the currently input text content is assigned to them and rendered in real time. UILabel inherits from UIView and is a very frequently used view control in iOS, commonly used for displaying text.
For example, the creation process may be:
UILabel *label = [[UILabel alloc] initWithFrame:CGRectMake(20, 64, 100, 30)];
[self.view addSubview:label];
Attributes are then set either directly through the "." property name or through the corresponding "set" method, for example:
label.backgroundColor = [UIColor yellowColor]; // set the background color
label.textColor = [UIColor redColor]; // set the color of the text on the Label
label.text = @"I am a UILabel"; // set the text on the Label
label.font = [UIFont systemFontOfSize:15]; // set the font size to 15 (the default is 17)
label.textAlignment = NSTextAlignmentCenter; // set the text alignment (the default is left)
label.numberOfLines = 0; // set the number of lines (the default is 1; 0 allows multiple lines)
label.font = [UIFont fontWithName:@"Arial" size:30]; // set the content font and font size
label.highlighted = YES; // highlight the Label
Optionally, after the input text content is obtained, the text content displayed in the target video may be modified, deleted, queried, or scaled through a subtitle data manager (e.g., QHOStickerItemManager).
S204, the video processing device acquires a text animation format selected, for the text content, from a text animation format set, where the text animation format set is located in a second set area of the current interface outside the video playing area;
It is understood that the text animation format set is located in a second set area of the current interface outside the video playing area, such as area 4 in fig. 4. The set includes at least one text animation format, such as voice-over, bubble, bullet-screen, title, shake, or marquee. Of course, when many text animation formats are included, they may be displayed with scrolling.
If a new text animation format needs to be supported later, it can be added to the text animation format set through the subtitle style base class (e.g., QHOStickerStyleConfig), as sketched below.
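As a purely hypothetical sketch of that extension point, a new format could be expressed as a subclass of the base class; apart from the base-class name QHOStickerStyleConfig mentioned above, every name below (including the formatName hook) is an assumption:

// Hypothetical subclass registering a new "neon" text animation format.
@interface QHONeonStickerStyleConfig : QHOStickerStyleConfig
@end

@implementation QHONeonStickerStyleConfig
- (NSString *)formatName {      // assumed hook used by the format set
    return @"neon";
}
@end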
Specifically, the user selects one entry from the text animation format set displayed in the second set area (e.g., area 4 in fig. 4) of the current interface outside the video playing area, and the video playing device acquires the text animation format selected for the text content to complete the setting of the text animation format. Of course, the text animation format can also be switched through a switching operation input on the text animation format set.
If the text animation formats currently selected for three pieces of text content are "common subtitle", "bubble animation", and "bullet-screen animation" respectively, the three text animation formats are displayed in the first set area accordingly, as shown in fig. 6.
Optionally, if the user does not currently select a text animation format, the default (preset) text animation format is used. The default text animation format may be preconfigured by the user or built into the system.
S205, the video processing device acquires a text style selected, for the text content, from a text style set, where the text style set is located in a third set area of the current interface outside the video playing area;
It is understood that the text style set is located in a third set area of the current interface outside the video playing area, such as area 4 in fig. 4. The set includes at least one text style; a text style includes font form, font size, font color, stroke width, and the like. Any text animation format may correspond to all available text styles, or each text animation format may correspond only to a fixed group of text styles.
For example, suppose there are 10 text animation formats and 100 text styles. Each of the 10 formats may correspond to all 100 styles, i.e., any text style may be set for any format. Alternatively, the 1st format may correspond to styles 1 to 10, the 2nd to styles 11 to 20, …, and the 10th to styles 91 to 100, i.e., each format is provided only with a specific group of styles; of course, the number of styles corresponding to each format may differ.
Meanwhile, the third set area and the second set area may be the same area or different areas; that is, the text style and the text animation format may be set in the same area or in different areas. For the text style and the text animation format, either one may be set alone, both may be set, or neither may be set.
Specifically, the user selects one entry from the text style set displayed in the third set area of the current interface outside the video playing area, and the video playing device acquires the text style selected for the text content to complete the setting of the text style.
S206, the video processing device acquires time adjustment information set for the text playing time information corresponding to the text content, where the text playing time information is located in a fourth set area of the current interface outside the video playing area;
It is understood that the text playing time information includes a text start time, a text end time, and a text duration. As in area 3 of fig. 4, the text playing time information is displayed in the form of a progress bar, with one piece of text playing time information per text content.
Specifically, the user adjusts the text playing time information displayed in the fourth set area (e.g., area 3 in fig. 4) of the current interface outside the video playing area. When time adjustment information set for the text start time is acquired, the text start time is adjusted based on it, and the displayed image of the video playing area moves synchronously with the time, i.e., the currently displayed image of the video playing area is replaced with the image indicated by the adjusted text start time. When time adjustment information set for the text end time is acquired, the text end time is adjusted based on it, and the displayed image likewise moves synchronously, i.e., the currently displayed image of the video playing area is replaced with the image indicated by the adjusted text end time.
Optionally, when the text display time is adjusted, the source video may also be adjusted synchronously. For example, as shown in fig. 4, when the start of the text corresponding to "CCC" is dragged to time t, the display in area 1 also changes and stays on the frame corresponding to time t.
For example, if the total duration of the video is 5 minutes, the corresponding start time is 00:00:00 and end time is 00:05:00, and the playing time of the added text (displayed as a progress bar) is likewise 00:00:00 to 00:05:00. The currently displayed frame in the video playing area stays at 00:00:00; if the start point of the progress bar is moved to 00:01:00, the display in the video playing area changes synchronously and finally stays on the image corresponding to 00:01:00.
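This frame-synchronized scrubbing could be implemented on iOS with an AVPlayer seek; a minimal sketch, assuming the video playing area is backed by an AVPlayer named player (not something the patent specifies):

#import <AVFoundation/AVFoundation.h>

// Sketch: jump the preview frame to the adjusted text start time (here 00:01:00).
CMTime adjustedStart = CMTimeMakeWithSeconds(60.0, 600);
[player seekToTime:adjustedStart
   toleranceBefore:kCMTimeZero
    toleranceAfter:kCMTimeZero];   // zero tolerance so the paused frame matches the time exactly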
S207, the video processing device synthesizes the content parameters with the text content to obtain special effect text content corresponding to the text content;
It is to be understood that the synthesis of the content parameters with the text content refers to a process of adding a special effect to the text content.
For example, if the text content is "AAA" and the content parameter is "bullet screen, red-green-yellow gradient", the generated special effect text content is "AAA" displayed in bullet-screen form with a red-green-yellow gradient.
Of course, if no text parameter is set for the text content, the text content refers to the text content displayed with the default text parameters.
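As a sketch of this synthesis step, the content parameters can be applied to the raw string to produce styled text; an attributed string stands in for the special effect here, gradient and animation effects are omitted, and all names are illustrative assumptions:

```swift
import UIKit

// Hypothetical content parameters; unset fields keep these defaults,
// matching the default-text-parameter behavior described above.
struct SimpleContentParameters {
    var font: UIFont = .systemFont(ofSize: 24)
    var color: UIColor = .red
    var strokeWidth: CGFloat = 0   // 0 leaves the glyphs unstroked
}

// "Synthesizing" the parameters with the text content: apply them to the
// plain string to obtain the styled (special effect) text.
func makeSpecialEffectText(_ text: String,
                           parameters: SimpleContentParameters) -> NSAttributedString {
    NSAttributedString(string: text, attributes: [
        .font: parameters.font,
        .foregroundColor: parameters.color,
        .strokeWidth: parameters.strokeWidth
    ])
}
```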
S208, the video processing device acquires text alignment information of the text content at the current video resolution of the source video;
It is understood that the current video resolution refers to the display resolution of the video processing device currently playing the source video. The display resolution is the precision of the screen image and refers to how many pixels the display can show.
The text alignment information refers to a display position and a display font size of the text content at the current video resolution.
In a specific implementation manner, the obtaining, by the video processing device, text alignment information of the text content at a current video resolution of the source video may include the following steps, as shown in fig. 7:
S301, the video processing device acquires a first center point coordinate and a first font size of the text content at a reference video resolution;
It is understood that the reference video resolution is a standard video resolution set for all video processing devices, such as s1 × t1.
For example, as shown in fig. 8, the reference video resolution is s1 × t1 with center point coordinate o1 (the origin of coordinates), the first center point coordinate of the text content in the text editing component under s1 × t1 is o2(x2, y2), and the first font size is "Small Three" in regular script.
S302, the video processing device obtains a scaling ratio between the current video resolution and the reference video resolution, and adjusts the first center point coordinate and the first font size based on the scaling ratio, to obtain a second center point coordinate and a second font size of the text content at the current video resolution, where the text alignment information includes the second center point coordinate and the second font size.
For example, as shown in fig. 9, if the current video resolution is s2 × t2 with center point coordinate o3 (the origin of coordinates), the scaling ratios of the current video resolution to the reference video resolution are s2/s1 and t2/t1. Scaling the first center point coordinate o2 by these ratios gives the second center point coordinate o4 = (x2 × s2/s1, y2 × t2/t1) of the text content at the current video resolution, and the first font size "Small Three" in regular script is scaled by the same ratio to obtain the corresponding second font size, for example "Small Four" in regular script.
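The computation in S301–S302 reduces to two ratio multiplications; a minimal sketch follows (the patent only specifies that the center point and font size are scaled by the resolution ratios, so the choice of the width ratio for the font size is an assumption):

```swift
import CoreGraphics

// Scale the first center point coordinate and first font size from the
// reference resolution (s1 × t1) to the current resolution (s2 × t2).
func alignText(center: CGPoint,       // first center point o2 under s1 × t1
               fontSize: CGFloat,     // first font size under s1 × t1
               reference: CGSize,     // s1 × t1
               current: CGSize)       // s2 × t2
    -> (center: CGPoint, fontSize: CGFloat) {
    let sx = current.width / reference.width     // s2 / s1
    let sy = current.height / reference.height   // t2 / t1
    let scaledCenter = CGPoint(x: center.x * sx, y: center.y * sy)
    return (scaledCenter, fontSize * sx)         // font size scaled by the width ratio
}
```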
S209, the video processing device synthesizes the special effect text content and the source video based on the text alignment information to obtain a target video corresponding to the source video.
It is understood that the process of synthesizing the text content with the source video refers to aligning the exported text content with the source video (center point alignment and font size alignment) and then adding the text content to the source video.
Specifically, the second center point of the special effect text content is made to coincide with the position indicated by the second center point coordinate in the source video, and the special effect text content is displayed in the second font size. The special effect text content is then synthesized with the source video by a synthesis tool (such as AVVideoCompositionCoreAnimationTool), so that a target video with the text content (namely, subtitles) added is obtained, and the target video is rendered for display at the current video resolution.
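Assuming the iOS AVFoundation pipeline that the class names in this document suggest, the compositing step might be sketched as follows; the layer layout and all names are illustrative, not the patent's disclosed implementation:

```swift
import AVFoundation
import UIKit

// Burn the styled text into the source video: place it in a CATextLayer at
// the second center point coordinate and attach the layer tree to the video
// composition through AVVideoCompositionCoreAnimationTool.
func makeComposition(for asset: AVAsset,
                     text: NSAttributedString,
                     center: CGPoint,
                     renderSize: CGSize) -> AVMutableVideoComposition {
    let textLayer = CATextLayer()
    textLayer.string = text
    textLayer.alignmentMode = .center
    textLayer.frame = CGRect(x: 0, y: 0, width: renderSize.width, height: 60)
    textLayer.position = center                 // center point alignment
    textLayer.contentsScale = UIScreen.main.scale

    let videoLayer = CALayer()
    let parentLayer = CALayer()
    parentLayer.frame = CGRect(origin: .zero, size: renderSize)
    videoLayer.frame = parentLayer.frame
    parentLayer.addSublayer(videoLayer)         // the video frames render here
    parentLayer.addSublayer(textLayer)          // subtitles drawn on top

    let composition = AVMutableVideoComposition(propertiesOf: asset)
    composition.renderSize = renderSize
    composition.animationTool = AVVideoCompositionCoreAnimationTool(
        postProcessingAsVideoLayer: videoLayer, in: parentLayer)
    return composition
}
```

The returned composition would then be assigned to an AVAssetExportSession's videoComposition property to export the target video.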
S210, the video processing device acquires a text editing signal input aiming at the text content, and edits the text content based on the text editing signal, wherein the text editing signal comprises at least one of the following: a text modification signal, a text deletion signal, a text query signal, and a text scaling signal.
It is understood that the text editing signal refers to a text editing operation, such as text modification, text deletion, text query or text scaling, etc., input by the user with respect to the displayed target video. That is, after subtitles are added to a video, the added subtitles can be edited.
Specifically, after the target video is generated, the playing of the target video may be controlled, and when the user wants to edit the text content in the played target video, the text content displayed in the target video may be processed by a subtitle data manager (e.g., QHOStickerItemManager), such as text modification, text deletion, text query, or text scaling.
For example, for the text contents "AAA", "BBB", and "CCC" displayed in the played target video, when the user wants to delete "AAA", the "AAA" may be deleted by the subtitle data manager based on the trigger operation input by the user for "AAA", and then when the playing is resumed, the text contents displayed in the target video include only "BBB" and "CCC".
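The interface of the subtitle data manager is not disclosed in this document; a hypothetical manager mirroring the modify/delete/query operations named above might look like this (all names are assumptions):

```swift
import Foundation

struct SubtitleItem {
    let id: UUID
    var text: String
}

final class SubtitleItemManager {
    private var items: [SubtitleItem] = []

    func add(_ item: SubtitleItem) { items.append(item) }

    // Text query: return items whose text contains the keyword.
    func query(containing keyword: String) -> [SubtitleItem] {
        items.filter { $0.text.contains(keyword) }
    }

    // Text modification: rewrite the text of one item.
    func modify(id: UUID, newText: String) {
        guard let index = items.firstIndex(where: { $0.id == id }) else { return }
        items[index].text = newText
    }

    // Text deletion: deleting "AAA" leaves only "BBB" and "CCC" on replay.
    func delete(id: UUID) {
        items.removeAll { $0.id == id }
    }
}
```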
In the embodiment of the invention, text content (subtitles) is added into a video by acquiring a text addition signal input for a source video played on a current interface, creating a text editing component in a first setting area in a video playing area of the source video based on the text addition signal, acquiring text content input in the text editing component and content parameters of the text content, and synthesizing the text content with the source video based on the content parameters. The process of adding subtitles to the video can be completed directly on the current interface, the added subtitles can be displayed on the video in real time, and the real-time interactivity and convenience of subtitle addition can be improved. Meanwhile, content parameters of the text content can be set, such as animation display, moving, or zooming, and the text content can be placed at a suitable position in the video, which enriches the display effect and increases the interest of video processing.
The video processing method provided by the invention will be described with reference to specific implementation scenarios.
When a user watching a video on a mobile phone wants to add text content (subtitles) to the video, the user stops playback by clicking a play control on the display interface and then clicks an "add subtitle" control on the display interface. The mobile phone then detects a text adding signal input for the video, creates a text editing component (such as a text box) in the display area of the video, and can synchronously generate a corresponding display progress bar in another area;
if a plurality of text boxes are added, determining the text box currently used for inputting text content through selection of a user;
the user inputs the text content to be added through the virtual keyboard of the mobile phone and confirms it after input; at the same time, the user can select a preferred text format and the corresponding text style, such as the font, font size, and text color, from the text format set of the display interface. After the user finishes setting, the mobile phone acquires the set parameters and the input text content;
the mobile phone synthesizes the acquired parameters and the text content to obtain special effect text content, calculates text alignment information (such as an adding position and a font size) of the added text content under the current video resolution of the mobile phone, and finally adds the special effect text content to the video based on the obtained text alignment information, so that the text content is added to a proper position in the video to obtain a target video corresponding to the source video. The added subtitles can be displayed on the video in real time, the interaction real-time performance and the convenience of subtitle adding can be improved, and the display effect is enriched.
In addition, after the addition is completed, the user can edit the added text content, such as deleting the added text content or modifying the added text content, so that the interest of the video processing is increased.
Meanwhile, the obtained target video can be shared to other users or published to a sharing platform, or the user clicks a play key to continue playing.
A video processing apparatus according to an embodiment of the present invention will be described in detail below with reference to fig. 10-14. It should be noted that the video processing apparatus shown in fig. 10-14 is used for executing the methods of the embodiments shown in fig. 2-9 of the present invention; for convenience of description, only the portions related to the embodiments of the present invention are shown. For technical details that are not disclosed here, please refer to the embodiments shown in fig. 2-9 of the present invention.
Fig. 10 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention. As shown in fig. 10, the video processing apparatus 1 according to an embodiment of the present invention may include: a component creating unit 11, a parameter acquiring unit 12, and a video generating unit 13.
The component creating unit 11 is configured to acquire a text addition signal input for a source video played on a current interface, and create a text editing component in a first setting area in a video playing area of the source video based on the text addition signal;
It is understood that the source video refers to a video file that is playing or about to play on the current interface. The format of the source video can be AVI, QuickTime, RealVideo, NAVI, DivX, MPEG, or the like. The source video file can be acquired through a video input unit of the video processing device after the user inputs an operation signal for acquiring the video file; for example, it may be selected from a local video library (such as an album), currently captured by the camera, or currently downloaded over the network.
The source video is displayed in a video playing area of the current interface, and the video playing area may be a whole area of the current interface, such as a full-screen display area, or may be a partial area of the current interface.
It should be noted that the first setting area is any area in the video playing area, and a plurality of text editing components may be included in the first setting area, that is, a plurality of texts may be simultaneously displayed in the video playing area.
Optionally, before the user inputs the text adding operation, the playing source video may be paused, so that the text editing may be performed on the current frame image. The pause can be performed by touching the play key, or the source video can be paused synchronously when the user inputs text adding operation, or the pause can be controlled by inputting voice signals.
A parameter acquiring unit 12 configured to acquire text content input in the text editing component, and acquire a content parameter set for the text content;
It is understood that the text content may include words, emoticons, symbols, pictures, or any combination thereof, and the words may be in different languages, such as Chinese, English, or French. The user can input text content through a keyboard (physical keyboard keys or virtual keyboard controls) or a handwriting area of the video processing device, and can also input voice signals through a voice receiver of the video processing device, which are parsed to generate the text content.
The content parameters of the text content may include text animation formats, such as voice-over, bubble, bullet screen, title, tremble, and ticker formats; if a new text animation format needs to be added later, it can be registered through a subtitle base class (e.g., QHOStickerStyleConfig). Of course, switching of the text animation format can also be realized through a switching operation input by the user on the text animation format set. The content parameters may further include text styles, such as font form, font size, font color, and stroke width; it may be understood that any one text animation format corresponds to all kinds of text styles, or that each text animation format corresponds to only a few set text styles. The content parameters may further include text display time information corresponding to the text content, such as a text start time, a text end time, and a text duration. One of the content parameters may be set, or two or more of them may be set simultaneously. Of course, if the user does not currently set parameters for the text content, the text parameters refer to the default (preset) text parameters. The default text parameters may be preset by the user or provided by the system.
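The content parameters enumerated above can be pictured as a simple data model; the field names, types, and defaults below are illustrative assumptions, with unset fields falling back to defaults as described:

```swift
import CoreMedia
import UIKit

enum TextAnimationFormat {
    case voiceOver, bubble, bulletScreen, title, tremble, ticker
}

struct TextStyle {
    var fontName: String = "PingFangSC-Regular"   // hypothetical default font
    var fontSize: CGFloat = 24
    var color: UIColor = .white
    var strokeWidth: CGFloat = 0
}

// One parameter set per piece of text content; the defaults stand in for
// the "default (preset) text parameters" mentioned above.
struct TextContentParameters {
    var animationFormat: TextAnimationFormat = .title
    var style = TextStyle()
    var startTime: CMTime = .zero
    var endTime: CMTime = .zero
}
```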
Optionally, when the text display time is adjusted, the source video may also be adjusted synchronously. When the text start time is adjusted, the current display image of the video playing area is adjusted to the image indicated by the adjusted text start time; when the text end time is adjusted, the current display image of the video playing area is adjusted to the image indicated by the adjusted text end time.
The display of the text content is rendered through a user interface (UI) control. For an iOS system, for example, the UILabel control provided by the system may be used. When the user inputs new text content, all UILabels in the current subtitle style can be traversed, and the currently input text content is assigned to each UILabel and rendered in real time. UILabel inherits from UIView and is a frequently used view control in iOS, commonly used for displaying text.
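A minimal sketch of this traverse-and-assign rendering step; the container view holding the style's labels is an assumption, since the document only names UILabel:

```swift
import UIKit

// When the user types new text, walk every UILabel in the current subtitle
// style's view and assign the text so the preview updates immediately.
func render(text: String, in styleView: UIView) {
    for case let label as UILabel in styleView.subviews {
        label.text = text   // assignment triggers an immediate redraw
    }
}
```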
Optionally, after the input text content is obtained, the text content displayed in the target video may be subjected to text modification, text deletion, text query, or text scaling by a subtitle data manager (e.g., QHOStickerItemManager).
And the video generating unit 13 is configured to perform synthesis processing on the text content and the source video based on the content parameter to obtain a target video corresponding to the source video.
It is understood that the process of synthesizing the text content with the source video refers to aligning the exported text content with the source video (center point alignment and font size alignment) and then adding the text content to the source video. If no text parameter is set for the text content, the text content refers to the text content displayed with the default text parameters; if text parameters are set, the text content refers to the special effect text content displayed with the set content parameters.
A subtitle export tool (e.g., QHOExportParam) may be employed to export the text content, ensuring that the font size and center point position exported for each subtitle in the text content remain consistent at different resolutions. The export tool can calculate the center point position and font size of the text content in the current device's video from the font size and center point position of the text content under the reference video resolution together with the current device's video resolution.
Optionally, after the target video is generated, the playing of the target video may be controlled, and meanwhile, the text content displayed in the target video may be edited by a subtitle data manager (e.g., QHOStickerItemManager), such as modifying the text content, deleting the text content, adding the text content, and the like.
Of course, after the text content input in the text editing component is obtained, the text content displayed in the target video may be edited by a subtitle data manager (e.g., QHOStickerItemManager), such as modifying the text content, deleting the text content, adding the text content, and the like.
In the embodiment of the invention, text content (subtitles) is added into a video by acquiring a text addition signal input for a source video played on a current interface, creating a text editing component in a first setting area in a video playing area of the source video based on the text addition signal, acquiring text content input in the text editing component and content parameters of the text content, and synthesizing the text content with the source video based on the content parameters. The process of adding subtitles to the video can be completed directly on the current interface, the added subtitles can be displayed on the video in real time, and the real-time interactivity and convenience of subtitle addition can be improved.
Fig. 11 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention. As shown in fig. 11, the video processing apparatus 1 according to an embodiment of the present invention may include: a component creating unit 11, a parameter acquiring unit 12, a video generating unit 13, and a text editing unit 14.
The component creating unit 11 is configured to acquire a text addition signal input for a source video played on a current interface, and create a text editing component in a first setting area in a video playing area of the source video based on the text addition signal;
It is understood that the source video refers to a video file that is playing or about to play on the current interface. The format of the source video can be AVI, QuickTime, RealVideo, NAVI, DivX, MPEG, or the like. The source video file can be acquired through a video input unit of the video processing device after the user inputs an operation signal for acquiring the video file; for example, it may be selected from a local video library (such as an album), currently captured by the camera, or currently downloaded over the network.
The source video is displayed in a video playing area of the current interface, and the video playing area may be a whole area of the current interface, such as a full-screen display area, or may be a partial area of the current interface. For example, as shown in fig. 4, a plurality of regions may be included in the current interface, 1 being a video play region, 2 being a video editing option display region, 3 being a time display region of text, 4 being a text animation format display region, and the like. Wherein, the source video is displayed in the video playing area 1 of the current interface.
It should be noted that the first setting area is any area in the video playing area, and a plurality of text editing components may be included in the first setting area; that is, a plurality of texts may be displayed simultaneously in the video playing area, such as "AAA", "BBB", and "CCC" displayed simultaneously in fig. 4. In one approach, when the component creating unit 11 detects the text addition signal, it does not check whether a text editing component already exists at the default position of the first setting area and directly creates the new text editing component at that default position; the newly created component may then cover the originally existing one, and either component can simply be moved to another position in the first setting area. In another approach, the display positions of the components in the first setting area are arranged in sequence, such as position No. 1, position No. 2, position No. 3, and so on; when the component creating unit 11 detects a text addition signal, it traverses each position in the first setting area in sequence to see whether a text editing component already exists there. For example, it checks whether a text editing component exists at position No. 1; if so, it moves on to position No. 2 and checks again, and if no text editing component exists at position No. 2, the text editing component to be created is arranged at position No. 2, as sketched below. Of course, when the component creating unit 11 detects the text addition signal, the text editing component may also be created directly at a random position in the video playing area, and if it covers another component, it can be moved manually.
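The sequential-traversal placement just described can be sketched as follows; the slot representation and names are illustrative:

```swift
import CoreGraphics

// Walk positions No. 1, No. 2, No. 3 … in order and return the first
// position not already occupied by a text editing component.
func firstFreeSlot(orderedSlots: [CGPoint],
                   occupiedIndices: Set<Int>) -> CGPoint? {
    for (index, slot) in orderedSlots.enumerated()
        where !occupiedIndices.contains(index) {
        return slot
    }
    return nil   // every slot is taken; fall back to covering the default position
}
```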
Optionally, before the user inputs the text adding operation, the playing source video may be paused, so that the text editing may be performed on the current frame image. The pause may be performed by touching a play key (e.g., a play control on the left side of the area 2 in fig. 4), or the source video may pause the play synchronously when a user inputs a text adding operation (e.g., clicks a "add subtitle" control in the area 2 in fig. 4), or a voice signal may be input to control the pause.
A parameter acquiring unit 12 configured to acquire text content input in the text editing component, and acquire a content parameter set for the text content;
It is understood that the text content may include words, emoticons, symbols, pictures, or any combination thereof, and the words may be in different languages, such as Chinese, English, or French. The user can input text content through a keyboard (physical keyboard keys or virtual keyboard controls) or a handwriting area of the video processing device, and can also input voice signals through a voice receiver of the video processing device, which are parsed to generate the text content.
The content parameters of the text content may include text animation formats, such as voice-over, bubble, bullet screen, title, tremble, and ticker formats; if a new text animation format needs to be added later, it can be registered through a subtitle base class (e.g., QHOStickerStyleConfig). The content parameters may further include text styles, such as font form, font size, font color, and stroke width; it may be understood that any one text animation format corresponds to all kinds of text styles, or that each text animation format corresponds to only a few set text styles. The content parameters may further include text display time information corresponding to the text content, such as a text start time, a text end time, and a text duration. One of the content parameters may be set, or two or more of them may be set simultaneously. Of course, if the user does not currently set parameters for the text content, the text parameters refer to the default (preset) text parameters. The default text parameters may be preset by the user or provided by the system.
Optionally, as shown in fig. 12, the parameter obtaining unit 12 includes:
A component acquiring subunit 121, configured to acquire a text editing component from the text editing component set displayed in the first setting area;
It is to be understood that at least one text editing component is included in the set of text editing components. When a plurality of text editing components are displayed in the first setting area, as shown in fig. 4 (the components carrying "AAA", "BBB", and "CCC"), the user selects the component to be used for text content input.
Alternatively, when the selection signal input by the user for the text editing component set is not detected within the preset time length, the text editing component created most recently at the current time may be used as the component for inputting the text content.
A parameter obtaining subunit 122, configured to obtain the text content input in the text editing component, and obtain a content parameter set for the text content.
It is understood that the text content may include words, emoticons (e.g., emoji), symbols, pictures, or any combination thereof, and the words may be in different languages, such as Chinese, English, or French. The user can input text content through a keyboard (physical keyboard keys or virtual keyboard controls) or a handwriting area of the video processing device, and can also input voice signals through a voice receiver of the video processing device, which are parsed to generate the text content.
As shown in fig. 5, one possible text content input method is that after a text editing component is created in the first setting area, a virtual keyboard pops up synchronously; the text content is input through the virtual keyboard, and a confirmation control is tapped after input to complete the text input.
The display of the text content is rendered through a user interface (UI) control. For an iOS system, for example, the UILabel control provided by the system may be used. When the user inputs new text content, all UILabels in the current subtitle style can be traversed, and the currently input text content is assigned to each UILabel and rendered in real time. UILabel inherits from UIView and is a frequently used view control in iOS, commonly used for displaying text.
Optionally, after the input text content is obtained, the text content displayed in the target video may be subjected to text modification, text deletion, text query, or text scaling by a subtitle data manager (e.g., QHOStickerItemManager).
Optionally, the parameter obtaining unit 12 is specifically configured to:
acquiring a text animation format selected in a text animation format set aiming at the text content, wherein the text animation format set is positioned in a second set area except the video playing area in the current interface, and/or;
It is understood that the text animation format set is located in a second setting area, such as area 4 in fig. 4, of the current interface outside the video playing area. At least one text animation format is included in the text animation format set, and the formats can be, for example, voice-over, bubble, bullet screen, title, tremble, and ticker. Of course, when many text animation formats are included, they may be displayed by scrolling. The switching of the text animation format can also be realized through a switching operation input by the user on the text animation format set.
If a new text animation format needs to be extended subsequently, it can be added to the text animation format set through a subtitle base class (such as QHOStickerStyleConfig), as sketched below.
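The base class's interface is not disclosed here; a hypothetical subclassing scheme in its spirit might look like this (the ticker animation shown is only an example effect):

```swift
import QuartzCore

// Hypothetical base class standing in for QHOStickerStyleConfig: each
// animation format is a subclass that configures its own layer effect.
class StickerStyleConfig {
    var formatName: String { "base" }
    func apply(to layer: CALayer) { /* no effect in the base class */ }
}

// Adding a new format means adding one subclass to the format set.
final class TickerStyleConfig: StickerStyleConfig {
    override var formatName: String { "ticker" }
    override func apply(to layer: CALayer) {
        let slide = CABasicAnimation(keyPath: "position.x")
        slide.byValue = -200          // scroll the text leftwards
        slide.duration = 3
        slide.repeatCount = .infinity
        layer.add(slide, forKey: "ticker")
    }
}
```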
Specifically, the user selects one text animation format from the text animation format set displayed in a second setting area (e.g., area 4 in fig. 4) of the current interface outside the video playing area, and the parameter obtaining unit 12 obtains the text animation format selected for the text content to complete the setting of the text animation format.
If the currently selected text animation formats for the three text contents are respectively "common subtitle", "bubble animation", and "bullet screen animation", the three text animation formats are respectively displayed in the first setting area, as shown in fig. 4.
Optionally, if the user does not currently select the text animation format, the text animation format may be considered to be a default (preset) text animation format. The default text animation format may be preset by a user or may be self-contained by the system.
Acquiring a text style selected in a text style set aiming at the text content, wherein the text style set is positioned in a third set area except the video playing area in the current interface and/or;
The text style set is located in a third setting area, such as area 4 in fig. 4, of the current interface outside the video playing area. At least one text style is included in the set of text styles, a text style including font form, font size, font color, stroke width, and the like. It is to be understood that any one text animation format may correspond to all kinds of text styles, or that each text animation format corresponds to only a few set text styles.
Meanwhile, the third setting area and the second setting area may be the same area or different areas; that is, the text style and the text animation format may be set in the same area or in different areas. Of course, the user may set only one of the text style and the text animation format, set both, or set neither.
And acquiring time adjustment information set aiming at text playing time information corresponding to the text content, wherein the text playing time information is positioned in a fourth set area except the video playing area in the current interface.
It is understood that the text play time information includes a text start time, a text end time, and a text duration. As in area 3 in fig. 4, the text playing time information is displayed in the form of a progress bar, and one text content corresponds to one text playing time information.
Optionally, the parameter obtaining unit 12 is specifically configured to:
when time adjustment information set for the text starting time is acquired, adjusting the text starting time based on the time adjustment information, and replacing a current display image of the video playing area with an image indicated by the adjusted text starting time;
and when the time adjustment information set for the text end time is acquired, adjusting the text end time based on the time adjustment information, and replacing the currently displayed image of the video playing area with the image indicated by the adjusted text end time.
Optionally, when the text display time is adjusted, the displayed frame of the source video may also be adjusted synchronously. For example, as shown in fig. 4, when the text corresponding to "CCC" is dragged from the start time to time t, the display picture in area 1 also changes and stays on the picture corresponding to time t.
And the video generating unit 13 is configured to perform synthesis processing on the text content and the source video based on the content parameter to obtain a target video corresponding to the source video.
Optionally, as shown in fig. 13, the video generating unit 13 includes:
A special effect text generating subunit 131, configured to perform synthesis processing on the content parameters and the text content to obtain special effect text content corresponding to the text content;
It is to be understood that the synthesis of the content parameters with the text content refers to a process of adding a special effect to the text content.
For example, if the text content is "AAA" and the content parameter is "bullet screen, red-green-yellow gradient", the generated special effect text content is "AAA" displayed in bullet-screen form with a red-green-yellow gradient.
Of course, if no text parameter is set for the text content, the text content refers to the text content displayed with the default text parameters.
An alignment information obtaining subunit 132, configured to obtain text alignment information of the text content at a current video resolution of the source video;
It is understood that the current video resolution refers to the display resolution of the video processing device currently playing the source video. The display resolution is the precision of the screen image and refers to how many pixels the display can show.
The text alignment information refers to a display position and a display font size of the text content at the current video resolution.
Optionally, the alignment information obtaining subunit 132 is specifically configured to:
acquiring a first center point coordinate and a first font size of the text content under a reference video resolution;
It is understood that the reference video resolution is a standard video resolution set for all video processing apparatuses.
And obtaining a scaling ratio of the current video resolution and the reference video resolution, and respectively adjusting the first center point coordinate and the first font size based on the scaling ratio to obtain a second center point coordinate and a second font size of the text content under the current video resolution, wherein the text alignment information comprises the second center point coordinate and the second font size.
A target video generating subunit 133, configured to perform synthesis processing on the special effect text content and the source video based on the text alignment information, so as to obtain a target video corresponding to the source video.
It is understood that the process of synthesizing the text content with the source video refers to aligning the exported text content with the source video (center point alignment and font size alignment) and then adding the text content to the source video.
A text editing unit 14, configured to acquire a text editing signal input for the text content, and edit the text content based on the text editing signal, where the text editing signal includes at least one of: a text modification signal, a text deletion signal, a text query signal, and a text scaling signal.
It is understood that the text editing signal refers to a text editing operation, such as text modification, text deletion, text query or text scaling, etc., input by the user with respect to the displayed target video. That is, after subtitles are added to a video, the added subtitles can be edited.
In the embodiment of the invention, text content (subtitles) is added into a video by acquiring a text addition signal input for a source video played on a current interface, creating a text editing component in a first setting area in a video playing area of the source video based on the text addition signal, acquiring text content input in the text editing component and content parameters of the text content, and synthesizing the text content with the source video based on the content parameters. The process of adding subtitles to the video can be completed directly on the current interface, the added subtitles can be displayed on the video in real time, and the real-time interactivity and convenience of subtitle addition can be improved. Meanwhile, content parameters of the text content can be set, such as animation display, moving, or zooming, and the text content can be placed at a suitable position in the video, which enriches the display effect and increases the interest of video processing.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the method steps in the embodiments shown in fig. 2 to 9, and a specific execution process may refer to specific descriptions of the embodiments shown in fig. 2 to 9, which are not described herein again.
Referring to fig. 14, a schematic structural diagram of an electronic device is provided in an embodiment of the present invention. As shown in fig. 14, the electronic device 1000 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 14, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a video processing application program.
In the electronic device 1000 shown in fig. 14, the user interface 1003 is mainly used to provide an input interface for the user and to acquire data input by the user, and the processor 1001 may be configured to invoke the video processing application stored in the memory 1005 and specifically perform the following operations:
acquiring a text adding signal input aiming at a source video played on a current interface, and creating a text editing component in a first setting area in a video playing area of the source video based on the text adding signal;
acquiring text content input in the text editing component, and acquiring content parameters set for the text content;
and synthesizing the text content and the source video based on the content parameters to obtain a target video corresponding to the source video.
In one embodiment, when the processor 1001 acquires the text content input in the text editing component and acquires the content parameter set for the text content, the following operations are specifically performed:
acquiring a text editing component from the text editing component set displayed in the first set area;
acquiring the text content input in the text editing component, and acquiring the content parameters set for the text content.
In one embodiment, when the processor 1001 acquires the content parameter set for the text content, the following operations are specifically performed:
acquiring a text animation format selected in a text animation format set aiming at the text content, wherein the text animation format set is positioned in a second set area except the video playing area in the current interface, and/or;
acquiring a text style selected in a text style set aiming at the text content, wherein the text style set is positioned in a third set area except the video playing area in the current interface and/or;
and acquiring time adjustment information set aiming at text playing time information corresponding to the text content, wherein the text playing time information is positioned in a fourth set area except the video playing area in the current interface.
In one embodiment, the text playing time information includes a text starting time and a text ending time, and when the processor 1001 obtains the time adjustment information set for the text playing time information corresponding to the text content, the following operation is specifically performed:
when time adjustment information set for the text starting time is acquired, adjusting the text starting time based on the time adjustment information, and replacing a current display image of the video playing area with an image indicated by the adjusted text starting time;
and when the time adjustment information set for the text end time is acquired, adjusting the text end time based on the time adjustment information, and replacing the currently displayed image of the video playing area with the image indicated by the adjusted text end time.
In an embodiment, when the processor 1001 performs the synthesizing process on the text content and the source video based on the content parameter to obtain the target video corresponding to the source video, specifically performs the following operations:
synthesizing the content parameters and the text content to obtain special effect text content corresponding to the text content;
acquiring text alignment information of the text content under the current video resolution of the source video;
and synthesizing the special effect text content and the source video based on the text alignment information to obtain a target video corresponding to the source video.
In an embodiment, when the processor 1001 acquires the text alignment information of the text content at the current video resolution of the source video, the following operations are specifically performed:
acquiring a first center point coordinate and a first font size of the text content under a reference video resolution;
and obtaining a scaling ratio of the current video resolution and the reference video resolution, and respectively adjusting the first center point coordinate and the first font size based on the scaling ratio to obtain a second center point coordinate and a second font size of the text content under the current video resolution, wherein the text alignment information comprises the second center point coordinate and the second font size.
In one embodiment, the processor 1001 further performs the following operations:
acquiring a text editing signal input aiming at the text content, and editing the text content based on the text editing signal, wherein the text editing signal comprises at least one of the following: a text modification signal, a text deletion signal, a text query signal, and a text scaling signal.
In the embodiment of the invention, text content (subtitles) is added into a video by acquiring a text addition signal input for a source video played on a current interface, creating a text editing component in a first setting area in a video playing area of the source video based on the text addition signal, acquiring text content input in the text editing component and content parameters of the text content, and synthesizing the text content with the source video based on the content parameters. The process of adding subtitles to the video can be completed directly on the current interface, the added subtitles can be displayed on the video in real time, and the real-time interactivity and convenience of subtitle addition can be improved. Meanwhile, content parameters of the text content can be set, such as animation display, moving, or zooming, and the text content can be placed at a suitable position in the video, which enriches the display effect and increases the interest of video processing.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the present invention and certainly cannot be taken as limiting the scope of the claims; equivalent variations made according to the claims of the present invention therefore still fall within the scope of the present invention.

Claims (13)

1. A video processing method, comprising:
acquiring a text adding signal input aiming at a source video played on a current interface; the current interface comprises a video playing area used for playing the source video; the video playing area comprises a first setting area; the first setting area comprises a plurality of display positions arranged in sequence; the plurality of display positions comprise a No. 1 position and a No. 2 position which are arranged in sequence;
based on the text adding signal, sequentially traversing the plurality of display positions according to the sequence of the display positions, and if a text editing component exists in the position No. 1 and no text editing component exists in the position No. 2, creating a text editing component in the position No. 2;
acquiring text content input in the created text editing component, and acquiring content parameters set for the text content;
and synthesizing the text content and the source video based on the content parameters to obtain a target video corresponding to the source video.
2. The method according to claim 1, wherein the obtaining of the content parameter set for the text content comprises:
acquiring a text animation format selected in a text animation format set aiming at the text content, wherein the text animation format set is positioned in a second set area except the video playing area in the current interface, and/or;
acquiring a text style selected in a text style set aiming at the text content, wherein the text style set is positioned in a third set area except the video playing area in the current interface and/or;
and acquiring time adjustment information set aiming at text playing time information corresponding to the text content, wherein the text playing time information is positioned in a fourth set area except the video playing area in the current interface.
3. The method according to claim 2, wherein the text playing time information includes a text starting time and a text ending time, and the obtaining of the time adjustment information set for the text playing time information corresponding to the text content includes:
when time adjustment information set for the text starting time is acquired, adjusting the text starting time based on the time adjustment information, and replacing a current display image of the video playing area with an image indicated by the adjusted text starting time;
and when the time adjustment information set for the text end time is acquired, adjusting the text end time based on the time adjustment information, and replacing the currently displayed image of the video playing area with the image indicated by the adjusted text end time.
4. The method according to claim 1, wherein the synthesizing the text content and the source video based on the content parameter to obtain a target video corresponding to the source video comprises:
synthesizing the content parameters and the text content to obtain special effect text content corresponding to the text content;
acquiring text alignment information of the text content under the current video resolution of the source video;
and synthesizing the special effect text content and the source video based on the text alignment information to obtain a target video corresponding to the source video.
5. The method of claim 4, wherein the obtaining text alignment information of the text content at a current video resolution of the source video comprises:
acquiring a first center point coordinate and a first font size of the text content under a reference video resolution;
and obtaining a scaling ratio of the current video resolution and the reference video resolution, and respectively adjusting the first center point coordinate and the first font size based on the scaling ratio to obtain a second center point coordinate and a second font size of the text content under the current video resolution, wherein the text alignment information comprises the second center point coordinate and the second font size.
6. The method of claim 1, further comprising:
acquiring a text editing signal input aiming at the text content, and editing the text content based on the text editing signal, wherein the text editing signal comprises at least one of the following: a text modification signal, a text deletion signal, a text query signal, and a text scaling signal.
7. A video processing apparatus, comprising:
The component creating unit is used for acquiring a text adding signal input aiming at a source video played on a current interface; the current interface comprises a video playing area used for playing the source video; the video playing area comprises a first setting area; the first setting area comprises a plurality of display positions arranged in sequence; the plurality of display positions comprise a No. 1 position and a No. 2 position which are arranged in sequence;
the component creating unit is further configured to sequentially traverse the plurality of display positions according to a sequence of the display positions based on the text adding signal, and create a text editing component in the position No. 2 if a text editing component exists in the position No. 1 and no text editing component exists in the position No. 2;
a parameter acquisition unit configured to acquire text content input in the created text editing component, and acquire a content parameter set for the text content;
and the video generation unit is used for synthesizing the text content and the source video based on the content parameters to obtain a target video corresponding to the source video.
8. The device according to claim 7, wherein the parameter obtaining unit is specifically configured to:
acquiring a text animation format selected in a text animation format set aiming at the text content, wherein the text animation format set is positioned in a second set area except the video playing area in the current interface, and/or;
acquiring a text style selected in a text style set aiming at the text content, wherein the text style set is positioned in a third set area except the video playing area in the current interface and/or;
and acquiring time adjustment information set aiming at text playing time information corresponding to the text content, wherein the text playing time information is positioned in a fourth set area except the video playing area in the current interface.
9. The device according to claim 8, wherein the parameter obtaining unit is specifically configured to:
when time adjustment information set for the text starting time is acquired, adjusting the text starting time based on the time adjustment information, and replacing a current display image of the video playing area with an image indicated by the adjusted text starting time;
and when the time adjustment information set for the text end time is acquired, adjusting the text end time based on the time adjustment information, and replacing the currently displayed image of the video playing area with the image indicated by the adjusted text end time.
10. The apparatus of claim 7, wherein the video generation unit comprises:
the special effect text generation subunit is configured to perform synthesis processing on the content parameter and the text content to obtain special effect text content corresponding to the text content;
an alignment information obtaining subunit, configured to obtain text alignment information of the text content at a current video resolution of the source video;
and the target video generation subunit is configured to perform synthesis processing on the special effect text content and the source video based on the text alignment information to obtain a target video corresponding to the source video.
11. The device according to claim 10, wherein the alignment information obtaining subunit is specifically configured to:
acquiring a first center point coordinate and a first font size of the text content under a reference video resolution;
and obtaining a scaling ratio of the current video resolution and the reference video resolution, and respectively adjusting the first center point coordinate and the first font size based on the scaling ratio to obtain a second center point coordinate and a second font size of the text content under the current video resolution, wherein the text alignment information comprises the second center point coordinate and the second font size.
12. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1 to 6.
13. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1 to 6.
CN201810817813.8A 2018-07-24 2018-07-24 Video processing method and device, storage medium and electronic device Active CN108924622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810817813.8A CN108924622B (en) 2018-07-24 2018-07-24 Video processing method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN108924622A CN108924622A (en) 2018-11-30
CN108924622B true CN108924622B (en) 2022-04-22

Family

ID=64416055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810817813.8A Active CN108924622B (en) 2018-07-24 2018-07-24 Video processing method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN108924622B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109379631B (en) * 2018-12-13 2020-11-24 广州艾美网络科技有限公司 Method for editing video captions through mobile terminal
KR20240050468A (en) * 2019-01-18 2024-04-18 스냅 아이엔씨 Systems and methods for template-based generation of personalized videos
CN109976605B (en) * 2019-02-02 2021-04-06 广州视源电子科技股份有限公司 Class board content display method, device and storage medium
CN110493655A (en) * 2019-08-16 2019-11-22 深圳市易汇软件有限公司 A method of customizing subtitle in DVB program
CN110582020B (en) * 2019-09-03 2022-03-01 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN111601174A (en) * 2020-04-26 2020-08-28 维沃移动通信有限公司 Subtitle adding method and device
CN111773697A (en) * 2020-07-13 2020-10-16 网易(杭州)网络有限公司 Method, device, system, medium and equipment for generating live broadcast picture of game event
CN111970577B (en) * 2020-08-25 2023-07-25 北京字节跳动网络技术有限公司 Subtitle editing method and device and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101594504A (en) * 2008-05-30 2009-12-02 三星电子(中国)研发中心 The method of before DVD disc termination, adding sprite
CN101625696A (en) * 2009-08-03 2010-01-13 孟智平 Method and system for constructing and generating video elements in webpage
CN104469179A (en) * 2014-12-22 2015-03-25 杭州短趣网络传媒技术有限公司 Method for combining dynamic pictures into mobile phone video
CN104793745A (en) * 2015-04-17 2015-07-22 范剑斌 Man-machine interaction method for mobile terminal
CN104822092A (en) * 2015-04-30 2015-08-05 无锡天脉聚源传媒科技有限公司 Method and apparatus for dotting, indexing, subtitling combination processing of video
CN105512395A (en) * 2015-12-04 2016-04-20 中国电子科技集团公司第三十八研究所 Production method of three-dimensional interactive electronic manual
CN106527857A (en) * 2016-10-10 2017-03-22 成都斯斐德科技有限公司 Virtual reality-based panoramic video interaction method
CN107172476A (en) * 2017-06-09 2017-09-15 创视未来科技(深圳)有限公司 A kind of system and implementation method of interactive script recorded video resume
CN108200465A (en) * 2018-01-30 2018-06-22 深圳市茁壮网络股份有限公司 A kind of video broadcasting method and device
CN108228836A (en) * 2018-01-04 2018-06-29 武汉斗鱼网络科技有限公司 Video compatible loading method, device and video component

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6820075B2 (en) * 2001-08-13 2004-11-16 Xerox Corporation Document-centric system with auto-completion
US20040152055A1 (en) * 2003-01-30 2004-08-05 Gliessner Michael J.G. Video based language learning system
CN100473133C (en) * 2004-02-10 2009-03-25 Lg电子株式会社 Text subtitle reproducing method and decoding system for text subtitle
US8442386B1 (en) * 2007-06-21 2013-05-14 Adobe Systems Incorporated Selecting video portions where advertisements can't be inserted

Also Published As

Publication number Publication date
CN108924622A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN108924622B (en) Video processing method and device, storage medium and electronic device
US20230230306A1 (en) Animated emoticon generation method, computer-readable storage medium, and computer device
CN111294663B (en) Bullet screen processing method and device, electronic equipment and computer readable storage medium
CN111970577B (en) Subtitle editing method and device and electronic equipment
CN111770288B (en) Video editing method, device, terminal and storage medium
JP2013229672A (en) Communication terminal, communication method, communication program, and communication system
CN109379631B (en) Method for editing video captions through mobile terminal
CN113918522A (en) File generation method and device and electronic equipment
CN111581564B (en) Webpage synchronous communication method implemented by Canvas
CN113010698A (en) Multimedia interaction method, information interaction method, device, equipment and medium
US20180088791A1 (en) Method and apparatus for producing virtual reality content for at least one sequence
US20140282000A1 (en) Animated character conversation generator
EP2634690A1 (en) Method and apparatus for setting user interface
CN113806570A (en) Image generation method and generation device, electronic device and storage medium
JP2015198387A (en) Moving image processing program, moving image processing method, and moving image processing device
EP4344191A1 (en) Method and apparatus for editing multimedia resource scene, device, and storage medium
KR20090124240A (en) 2009-12-03 Caption editing device and method thereof
CN113852757B (en) Video processing method, device, equipment and storage medium
CN115904168A (en) Multi-device-based image material processing method and related device
CN115424125A (en) Media content processing method, device, equipment, readable storage medium and product
CN115086747A (en) Information processing method and device, electronic equipment and readable storage medium
CN115437736A (en) Method and device for recording notes
TWI765230B (en) Information processing device, information processing method, and information processing program
JP2015060366A (en) Image processor, image processing method and program
KR102603972B1 (en) Method and device for providing web ar-based business information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant