CN112040142A - Method for video authoring on mobile terminal - Google Patents

Method for video authoring on mobile terminal

Info

Publication number
CN112040142A
Authority
CN
China
Prior art keywords
text
image
video
list
icon
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010650204.5A
Other languages
Chinese (zh)
Other versions
CN112040142B (en)
Inventor
翟佳璐
李嘉良
李大任
曹顺达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhizhe Sihai Beijing Technology Co ltd
Original Assignee
Zhizhe Sihai Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhizhe Sihai Beijing Technology Co ltd filed Critical Zhizhe Sihai Beijing Technology Co ltd
Priority to CN202010650204.5A
Publication of CN112040142A
Application granted
Publication of CN112040142B
Active legal status (current)
Anticipated expiration of legal status

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/189 Automatic justification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/2628 Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/278 Subtitling

Abstract

The invention relates to a method for video creation on a mobile terminal, comprising: displaying an icon list related to image materials in a first area of a screen of the mobile terminal and displaying a text list in a second area of the screen; adjusting at least one of an icon in the icon list and a text in the text list to set the correspondence between the image materials and the texts; and generating a video that includes the texts as subtitle information based on the set correspondence between the image materials and the texts. According to embodiments of the invention, the display space of the mobile terminal screen can be fully utilized, and a user can conveniently browse and edit the image and text materials throughout, set the correspondence between them, and carry out video creation.

Description

Method for video authoring on mobile terminal
Technical Field
The invention relates to the technical field of mobile terminals, and in particular to a method and an apparatus for video creation on a mobile terminal, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of video platforms, video has become a mainstream media form, and video creation tools have accordingly become important production tools for self-media creators. Current video creation tools arrange information horizontally and mainly provide professional image-material editing functions.
With the rapid development of the mobile internet and video technology, more users wish to author video content conveniently on mobile terminals. However, existing video authoring tools impose a high learning cost on them. Meanwhile, a commentary-style video form is emerging: it is based primarily on a written script rather than on video footage, pairing pictures or animations of the corresponding topic with the script to form video content with subtitles. Text is the main body of such content, yet existing video creation tools do not provide convenient text editing and picture-matching functions.
Disclosure of Invention
In view of this, to solve the problems that existing video creation tools are unsuited to the mobile environment, inefficient, and error-prone in operation, the invention provides a tool for conveniently and rapidly creating videos on a mobile terminal. With this tool, the display space of the mobile terminal screen can be fully utilized, and a user can conveniently browse and edit image and text materials throughout, set the correspondence between them, and carry out video creation, for example quickly and efficiently converting image-text knowledge content into high-quality video.
According to a first aspect of the present invention, there is provided a method for video authoring on a mobile terminal, comprising: displaying an icon list related to image materials in a first area of a screen of the mobile terminal and displaying a text list in a second area of the screen; adjusting at least one of the icons in the icon list and the texts in the text list to set the corresponding relation between the image materials and the texts; and generating a video including the text as subtitle information based on the set correspondence relationship between the image material and the text.
In one possible embodiment, the first area and the second area may be located on both left and right sides of the screen, and the icons in the icon list and the text in the text list are displayed in a top-down layout.
In a possible embodiment, the setting of the correspondence between the image material and the text may include: and setting the corresponding relation between the image material and the text based on the horizontal position relation between the icon of the image material and the text.
In one possible embodiment, the icon may have a control for resizing, and the method may further include: and adjusting the size of the icon through the control, and setting the corresponding relation between the image material and the text.
In one possible embodiment, the method may further include: and moving the position of the icon and/or the text to set the corresponding relation between the image material and the text.
In one possible embodiment, the method may further include: and selecting the icon and/or the text, and editing the corresponding image material and text.
In a possible embodiment, the generating a video may specifically include: providing speech related to the text; based on the time axis of the voice and the time axis of the image material, with longer-duration content as a reference, calibrating the time axis and generating a video including the voice and image material.
In one possible embodiment, the image material may be any one of a still picture, a moving picture, and a video image.
In one possible embodiment, the method may further include: acquiring image-text contents comprising image materials and text contents; segmenting the text content, and forming an initial corresponding relation between the image material and a paragraph of the text content; segmenting paragraphs of the text content into sentences to form the text list; and displaying an icon list related to the image material in a first area of a screen of the mobile terminal and displaying the text list in a second area of the screen according to the image material, the text list and the initial corresponding relation.
In one possible embodiment, the method may further include: and selecting and importing image materials corresponding to the texts in the text list from a material library.
In one possible embodiment, the method may further include: and selecting a text in the text list, and calculating image materials related to semantics according to the semantics of the text to serve as the image materials corresponding to the text.
According to a second aspect of the present invention, there is provided an apparatus for video authoring on a mobile terminal, comprising: a display unit for displaying an icon list related to an image material in a first area of a screen of the mobile terminal and a text list in a second area of the screen; the adjusting unit is used for adjusting at least one of the icons in the icon list and the texts in the text list so as to set the corresponding relation between the image materials and the texts; and a video generating unit that generates a video including the text as subtitle information based on the set correspondence relationship between the image material and the text.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort. The foregoing and other objects, features and advantages of the application will be apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily drawn to scale; emphasis is instead placed on illustrating the subject matter of the present application.
Fig. 1 is a schematic operation interface diagram showing an editing mode for authoring a video in a mobile terminal according to an embodiment of the present invention.
Fig. 2 is a schematic operation interface diagram illustrating another editing mode for authoring a video in a mobile terminal according to an embodiment of the present invention.
Fig. 3 shows a schematic operation interface diagram for adjusting the correspondence between image materials and texts in a mobile terminal according to an embodiment of the invention.
Fig. 4 illustrates a schematic operation interface diagram for editing text in a mobile terminal according to an embodiment of the present invention.
Fig. 5 shows a schematic operation interface diagram of image-text content and imported image materials to be used for generating a video according to an embodiment of the invention.
Fig. 6A-6C show schematic diagrams of generating image materials, texts and their correspondence from image-text content according to an embodiment of the invention.
Fig. 7 shows a schematic flow chart of a method for video authoring on a mobile terminal according to an embodiment of the present invention.
Fig. 8 shows a schematic block diagram of an apparatus for video authoring on a mobile terminal according to an embodiment of the present invention.
Fig. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. Furthermore, the terms "comprises", "comprising", and the like specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Fig. 1 is a schematic operation interface diagram showing an editing mode for authoring a video in a mobile terminal according to an embodiment of the present invention. The editing mode shown in fig. 1 may also be referred to as a timeline mode; conventional video authoring tools typically use this view mode.
As shown, a series of materials for video composition, including image materials 110, subtitle text 120, and music materials 130, are shown in the lower portion of the screen of the mobile terminal. The image material 110 may be one of a still picture, a moving picture, or a video image. The subtitle text 120 may be any type of text data that is divided into a plurality of sentences or paragraphs for display in the video image as the video is played. The musical material may be any custom audio, for example, commentary audio corresponding to the subtitle text 120, which may be human speech generated based on any intelligent algorithm.
In the editing interface of the timeline mode, a video window 140 for previewing is displayed on the upper portion of the mobile terminal. At the same time, the user can align the image material 110, the subtitle text 120, and the music material 130 with the current timeline 150. In one embodiment, the user can drag any one or more of the image material 110, subtitle text 120, and music material 130 left and right to align the materials, i.e., to achieve synchronization of images, text, and sound. In addition, the user can touch the image material 110, the subtitle text 120, and the music material 130 to achieve editing of the material at the touched position, for example, beautifying the video material, modifying the subtitle text content, and adjusting the corresponding audio.
It can be seen that in the editing mode shown in fig. 1, material information such as the image material 110, the subtitle text 120 and the audio material is arranged horizontally, a large space in the upper part of the screen is used to display the video material, and the text content is shown only in thumbnail form. The disadvantages of this horizontal arrangement of information are:
first, the video occupies a large display space, so the text content is shown in abbreviated form and the creator cannot browse the text as a whole; the user must repeatedly swipe the screen to the right to read the complete text, which does not match people's reading habits;
second, to insert new text, an image material must first be attached; that is, the text follows the image or video, whereas a commentary video is usually assembled only after the script has been written. The creation process is therefore reversed and cannot provide a smooth authoring experience;
third, the text is treated as auxiliary content: no convenient editing mode is provided, only single-sentence operations are supported, and functions such as batch editing of text, batch repositioning of text, and batch synthesis of dubbing are not supported;
fourth, the video editing functions focus on editing a single video and provide no convenient way to adjust the correspondence between materials and text;
and fifth, automatic conversion of text content into a video draft is not supported.
In view of the above, the present invention adds an "outline editing" mode to the conventional video authoring tool. For example, the outline mode may be entered through the outline edit 160 control on the interface shown in fig. 1. In outline mode, the image/video materials and the text are arranged vertically, the video is displayed in reduced size, and more space is given to the text, so that the user can conveniently browse the whole text and the display form better matches traditional reading habits.
Fig. 2 is a schematic operation interface diagram illustrating another editing mode (also referred to as outline mode) for authoring a video on a mobile terminal according to an embodiment of the present invention. It should be noted that editing of an individual material can still be completed in the traditional timeline mode; the outline mode focuses on adjusting the correspondence between materials and text.
As shown in fig. 2, a plurality of icons 210 related to image video material are shown in the left area of the screen of the mobile terminal; a plurality of texts 220 divided into sentence forms with separation lines therebetween are shown in the right area of the screen. In the left area, the icons 210 related to the image material are formed as an icon list; the text 220 is formed as a text list in the right area. Also shown in FIG. 2 is a current timeline 250, which is similar to the timeline 150 of FIG. 1. The current video preview 240 may be displayed on top of the screen according to the location of the current timeline 250.
According to the embodiment of the present invention, the correspondence between the image/video material and the text can be set according to the positional relationship of the icon 210 and the text 220 on the screen. Specifically, the correspondence is set according to the size or height of the icon 210, for example the horizontal alignment of the upper and lower edges of the icon 210 with the text in the right area. For example, after the user selects the material 210, two controls such as the handles 211 and 212 are displayed on it; dragging the handles 211 and 212 adjusts the size or height of the icon 210, thereby quickly adjusting the correspondence between the image/video material and the text. It will be appreciated that enlarging the icon 210 makes the material correspond to more text content, while shrinking it makes the material correspond to less.
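Purely as an illustration of the mapping just described, and not as part of the patent's disclosure, the following Python sketch shows one way an icon's vertical extent, as set by its two handles, could be translated into the set of subtitle sentences it covers; the names Icon, TextRow and assign_text_to_icons are assumptions.

```python
# Minimal sketch: map each icon's vertical span (between its two resize handles)
# to the subtitle sentences whose rows fall inside that span in outline mode.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TextRow:
    index: int      # position of the sentence in the text list
    top: float      # top y-coordinate of the row on screen
    bottom: float   # bottom y-coordinate of the row on screen

@dataclass
class Icon:
    material_id: str
    top: float      # y-coordinate of the upper resize handle
    bottom: float   # y-coordinate of the lower resize handle

def assign_text_to_icons(icons: List[Icon], rows: List[TextRow]) -> Dict[str, List[int]]:
    """Map each material to the sentence indices whose rows overlap its icon."""
    mapping: Dict[str, List[int]] = {icon.material_id: [] for icon in icons}
    for row in rows:
        center = (row.top + row.bottom) / 2
        for icon in icons:
            # A row belongs to an icon when its center lies between the handles.
            if icon.top <= center <= icon.bottom:
                mapping[icon.material_id].append(row.index)
                break
    return mapping
```

Dragging a handle only changes the icon's top or bottom coordinate; re-running the assignment then yields the updated material-to-text correspondence.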
It should be noted that in both the timeline mode of fig. 1 and the outline mode of fig. 2, the overall effect of the video can be quickly previewed using the timeline. The outline mode in particular markedly improves preview efficiency and the efficiency of adjusting the correspondence, making the user's operations more focused and efficient.
Fig. 3 shows a schematic operation interface diagram for adjusting the correspondence between image materials and texts on a mobile terminal according to an embodiment of the invention. In addition to adjusting the correspondence by changing the size or height of the icon 210, the order of the materials and their correspondence with the text can be quickly adjusted by long-pressing a material.
The left side of fig. 3 shows long-pressing and dragging an icon 310 to move the corresponding video material and thereby adjust its correspondence with the text. When the user long-presses the icon 310, it is shown floating on the screen, and the user can drag it to the desired position. For example, the icon 310 may be dragged upwards and inserted at an earlier position so that the video material plays earlier; conversely, it may be dragged downwards and inserted at a later position so that the video material plays later.
In one embodiment, after the icon 310 is dragged away from its original position, that position may be displayed in black or any other color, indicating that no image/video material corresponds to the text there. Those skilled in the art will appreciate that the dragged icon may further be resized to increase or decrease the amount of text it corresponds to.
The right side of fig. 3 shows long-pressing and dragging the text 320 "I am from the age of twelve" (for example, the user presses the menu item on the right side of the text) to adjust its correspondence with the image/video material. Similarly, the text 320 is shown floating on the screen, and the user can drag it to the desired position. For example, the text 320 may be dragged upwards and inserted at an earlier position so that it is displayed earlier when the video plays; conversely, it may be dragged downwards and inserted at a later position so that it is displayed later.
In one embodiment, after the text 320 is dragged away from its original location, that location may be displayed as blank or in any other color, indicating that no text corresponds to the image/video material there.
It can be seen that in outline mode the correspondence between the added subtitle text and the image/video material is previewed explicitly, reducing the errors and inefficiency that arise when editing the video material.
Fig. 4 illustrates a schematic operation interface diagram for editing text on a mobile terminal according to an embodiment of the present invention. From left to right, fig. 4 shows the interfaces for text editing, more operations, and multi-selection of text.
In the outline mode shown in fig. 2, any of the texts 220 can be edited by tapping it. As shown in the left diagram of fig. 4, compared with the horizontal information layout of fig. 1, the text editing view of the present invention can display more text content, including the text being edited and its context, which is convenient and better suited to the user's reading habits.
As shown in the middle diagram of fig. 4, the user can select a text sentence, whereupon menu items such as "multi-select", "copy" and "paste" are presented on the screen for quick editing. If "multi-select" is chosen, the interface shown in the right diagram of fig. 4 is presented, and the user can check multiple texts and edit them together, for example copying, cutting or deleting them.
Fig. 5 shows a schematic operation interface diagram of image-text content and imported image materials to be used for generating a video according to an embodiment of the invention.
Under "watch" applications, users often need to generate commentary-like video from mixed-type teletext content that includes pictures (including still pictures, moving pictures, or video images) and text. As shown on the left side of fig. 5, the user answer content includes several pieces of text and pictures. As shown on the right side of fig. 5, the user may also select a number of image materials from a library for use in generating a narration-like video. The lower operation bar can perform quick material management operation on the currently selected material, such as material editing, material replacement, material deletion and material batch import. In one embodiment, the material can be imported in batch, and after the manuscript is edited, the matching picture can be quickly selected from the bottom material column. That is, the user is free to add and delete image material, and is not limited to material within the original mixed-type teletext content.
This improves the efficiency with which the user manages and edits image/video materials and text, makes it quick to add content and pictures, and makes modifying and adjusting the subtitle text quick and convenient.
A typical example of generating the initial correspondence between the image/video materials and the subtitle text described with reference to figs. 1 to 4 is explained below in conjunction with figs. 6A to 6C.
Fig. 6A-6C show schematic diagrams of generating image materials, texts and their correspondence from image-text content according to an embodiment of the invention. The invention initializes the image materials, the texts and their correspondence in three stages: material preprocessing, intelligent segmentation, and intelligent sentence splitting.
First, material preprocessing.
● Preprocessing of picture materials: only still images and GIFs are imported; when an import fails, a black-field video is displayed in its place. Videos are not imported, and materials that are not imported do not occupy display space.
● Preprocessing of text materials: discarded content includes code blocks, comments, links (the link text is kept when it differs from the address), formulas, and paragraphs or list items that contain only spaces. Retained content includes headings, quote blocks, the text of ordered list items (keeping their numbering, with a full stop appended to each item that has no terminating separator or comma), and the text of unordered list items (with the bullets removed, and a full stop appended where missing). All text is converted to plain format.
Second, intelligent segmentation.
● Text segmentation is performed first: by default, each natural paragraph is one "segment". If two or more consecutive natural paragraphs each contain no more than five characters, they are merged into a single "segment".
● Pictures and text paragraphs are then put into one-to-one correspondence: a single picture corresponds to the nearest natural paragraph (after merging) above it; several consecutive pictures are paired with the paragraphs above them in reverse order (the last picture corresponds to the last paragraph, the second-to-last picture to the second-to-last paragraph, and so on). A paragraph with no corresponding picture is paired with a default black-field video, as with the first, second and third segments in fig. 6A. If more pictures than natural paragraphs appear in the corresponding text interval, the extra trailing pictures correspond to blank subtitles, as shown for "fig. 3" in fig. 6B. Two pictures separated only by spaces or line breaks are regarded as consecutive; if there is a dividing line between two pictures, they are regarded as non-consecutive even if all the text between them has been discarded, so that in fig. 6C "fig. 2" corresponds to no subtitle text. (A code sketch after this stage illustrates the segmentation and pairing logic.)
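As an illustration only, the following Python sketch outlines the segmentation and picture-to-paragraph pairing rules described above; the function names, the reading of the five-character merge threshold, and the data layout are assumptions rather than the patent's actual implementation.

```python
# Minimal sketch: merge short consecutive paragraphs, then pair each run of
# consecutive pictures with the nearest paragraphs above it in reverse order.
# Paragraphs without a picture fall back to a black-field clip; in this
# simplified sketch any picture that cannot be paired gets an empty subtitle.
from typing import List, Optional, Tuple

BLACK_FIELD = "black_field_video"  # placeholder clip for paragraphs with no picture

def merge_short_paragraphs(paragraphs: List[str], max_len: int = 5) -> List[str]:
    """Merge runs of two or more consecutive paragraphs of at most max_len characters."""
    merged: List[str] = []
    buffer: List[str] = []
    for p in paragraphs:
        if len(p) <= max_len:
            buffer.append(p)
            continue
        if buffer:
            merged.append("".join(buffer) if len(buffer) >= 2 else buffer[0])
            buffer = []
        merged.append(p)
    if buffer:
        merged.append("".join(buffer) if len(buffer) >= 2 else buffer[0])
    return merged

def pair_pictures_with_paragraphs(
    paragraphs: List[str], picture_runs: List[List[str]]
) -> List[Tuple[Optional[str], str]]:
    """
    picture_runs[i] holds the consecutive pictures that appear right after
    paragraph i (an empty list if none). Within a run, pictures are paired with
    the paragraphs above them in reverse order; leftover pictures get empty
    subtitles, and leftover paragraphs get the black-field placeholder.
    """
    pairs: List[Tuple[Optional[str], str]] = []
    assigned = [False] * len(paragraphs)
    for i, run in enumerate(picture_runs):
        for offset, picture in enumerate(reversed(run)):
            target = i - offset
            if target >= 0 and not assigned[target]:
                pairs.append((picture, paragraphs[target]))
                assigned[target] = True
            else:
                pairs.append((picture, ""))   # more pictures than paragraphs
    for i, p in enumerate(paragraphs):
        if not assigned[i]:
            pairs.append((BLACK_FIELD, p))    # paragraph without a picture
    return pairs
```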
Third, intelligent sentence splitting.
● Each segment of text is split into shorter sentences, each sentence corresponding to one line of subtitles (a code sketch after this list illustrates the splitting rules).
● Sentences are first split at separators. Separators include the full stop "。", the semicolon "；", the colon "：", the question mark "？", the exclamation mark "！", the ellipsis "…", line-break characters, and runs of two or more of certain other characters. A separator followed by a quotation mark counts as a single separator; if, after splitting, a sentence ends with an opening quotation mark, the quotation mark is moved to the beginning of the next sentence, or deleted if there is no next sentence. A separator followed by a bracket likewise counts as a single separator; if a sentence ends with an opening bracket after splitting, the bracket is moved to the beginning of the next sentence, or deleted if there is no next sentence. Several consecutive separators are treated as one separator; if everything between two separators has been discarded or only spaces remain, the separators are treated as consecutive. If the characters immediately before and after a separator are both digits, the separator is not treated as a split point (for example, a score of 1:1, or a total annual production of 194,211,400). Other punctuation marks are not processed and are displayed normally in the subtitles.
● Over-long sentences are split again: if a sentence exceeds 26 Chinese characters (English letters, punctuation and digits count as half a character), it is cut at the last "，" within the first 26 characters, repeatedly, until every sentence is shorter than 26 Chinese characters. Several consecutive "，" are merged into one. Without a comma, a sentence is not cut further even if it exceeds 26 Chinese characters.
● Punctuation processing: when each subtitle line is displayed in the video, a trailing "，" or "。" is hidden.
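The sentence-splitting stage can likewise be sketched in Python. This is a simplified illustration under stated assumptions (the separator set and the handling of quotes and brackets are abbreviated), not the patent's implementation.

```python
# Minimal sketch: split on sentence separators, treat consecutive separators as
# one, skip separators surrounded by digits, then re-split sentences longer than
# 26 full-width characters at the last comma within the limit, and hide trailing
# commas and full stops when forming subtitle lines.
SEPARATORS = "。；：？！…\n"
MAX_LEN = 26  # full-width characters per subtitle line

def char_width(ch: str) -> float:
    """English letters, digits and ASCII punctuation count as half a character."""
    return 0.5 if ord(ch) < 128 else 1.0

def split_by_separators(text: str) -> list:
    sentences, current = [], ""
    for i, ch in enumerate(text):
        current += ch
        if ch in SEPARATORS:
            prev_ch = text[i - 1] if i > 0 else ""
            next_ch = text[i + 1] if i + 1 < len(text) else ""
            if prev_ch.isdigit() and next_ch.isdigit():
                continue          # e.g. a score of 1:1 is not a split point
            if next_ch in SEPARATORS:
                continue          # consecutive separators count as one
            sentences.append(current.strip())
            current = ""
    if current.strip():
        sentences.append(current.strip())
    return [s for s in sentences if s]

def split_overlong(sentence: str) -> list:
    """Cut at the last '，' within the first 26 full-width characters, repeatedly."""
    parts = []
    while sum(char_width(c) for c in sentence) > MAX_LEN:
        width, cut = 0.0, -1
        for i, ch in enumerate(sentence):
            width += char_width(ch)
            if width > MAX_LEN:
                break
            if ch == "，":
                cut = i
        if cut < 0:
            break                 # no comma: leave the long sentence uncut
        parts.append(sentence[:cut + 1])
        sentence = sentence[cut + 1:]
    parts.append(sentence)
    return parts

def split_into_subtitles(paragraph: str) -> list:
    lines = []
    for sentence in split_by_separators(paragraph):
        for line in split_overlong(sentence):
            # trailing '，' and '。' are hidden when the subtitle is displayed
            lines.append(line.rstrip("，。"))
    return lines
```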
The image/video materials, the texts and their correspondence are thus generated. Those skilled in the art will appreciate that the user may further adjust, edit and import image/video materials, texts and their correspondence as described above with reference to figs. 1 to 5. For example, for subtitle text that lacks corresponding image/video material, the invention also provides an automatic matching function: related pictures are automatically computed and matched according to the semantics of the text, simplifying the step of searching the material library for matching pictures.
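A minimal, hypothetical sketch of the automatic picture-matching function follows; it assumes that sentence and picture-tag embeddings are already available and simply picks the most similar library picture, which is only one possible realization of matching by semantics and not the patent's stated algorithm.

```python
# Minimal sketch: given precomputed embedding vectors for a subtitle sentence and
# for the captions/tags of library pictures, return the closest picture by cosine
# similarity.
from typing import Dict, List
import math

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_picture(sentence_vec: List[float],
                  library: Dict[str, List[float]]) -> str:
    """Return the id of the library picture whose tag embedding is closest."""
    return max(library, key=lambda pid: cosine(sentence_vec, library[pid]))
```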
Next, based on the set correspondence between the image materials and the texts, a video including the texts as subtitle information is generated. In one embodiment, a one-touch dubbing function is also provided, with which the user can generate corresponding commentary audio from the text. For example, a male, female or child voice is selected according to user preference and the volume is configured to generate the commentary audio, which can then be combined with the image/video materials and the texts to form the commentary video. According to an embodiment of the invention, an automatic calibration function is also provided when the video is generated: when the durations of the video and the text (for example, the playing duration of the commentary audio corresponding to the text) are inconsistent, the timeline is automatically calibrated with the longer content as the reference, simplifying the user's operations.
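As an illustration of the automatic calibration rule (an assumption about one possible realization, not the patent's code), the following sketch aligns each text/material pair on the timeline to the longer of the narration audio and the material durations.

```python
# Minimal sketch: each timeline segment lasts as long as the longer of its
# material and its synthesized narration; a still picture is simply held for the
# full duration, and shorter audio can be padded with silence at render time.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    material_id: str
    material_duration: float   # seconds; a still picture may report 0
    speech_duration: float     # seconds of synthesized narration for the text

def calibrate(segments: List[Segment]) -> List[Tuple[str, float, float]]:
    """Return (material_id, start, end) entries aligned to the longer duration."""
    timeline, t = [], 0.0
    for seg in segments:
        duration = max(seg.material_duration, seg.speech_duration)
        timeline.append((seg.material_id, t, t + duration))
        t += duration
    return timeline
```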
Fig. 7 shows a schematic flow chart of a method 700 for video authoring on a mobile terminal according to an embodiment of the present invention.
The method 700 comprises: displaying an icon list related to image materials in a first area of a screen of the mobile terminal and a text list in a second area of the screen in step 710;
in step 720, adjusting at least one of the icon in the icon list and the text in the text list to set the corresponding relationship between the image material and the text; and
in step 730, a video including the text as subtitle information is generated based on the set correspondence relationship between the image material and the text.
In one possible embodiment, the first area and the second area may be located on both left and right sides of the screen, and the icons in the icon list and the text in the text list are displayed in a top-down layout.
In a possible embodiment, the setting of the correspondence between the image material and the text may include: and setting the corresponding relation between the image material and the text based on the horizontal position relation between the icon of the image material and the text.
In one possible embodiment, the icon may have a control for resizing, and the method may further include: and adjusting the size of the icon through the control, and setting the corresponding relation between the image material and the text.
In one possible embodiment, the method may further include: and moving the position of the icon and/or the text to set the corresponding relation between the image material and the text.
In one possible embodiment, the method may further include: and selecting the icon and/or the text, and editing the corresponding image material and text.
In a possible embodiment, the generating a video may specifically include: providing speech related to the text; based on the time axis of the voice and the time axis of the image material, with longer-duration content as a reference, calibrating the time axis and generating a video including the voice and image material.
In one possible embodiment, the image material may be any one of a still picture, a moving picture, and a video image.
In one possible embodiment, the method may further include: acquiring image-text contents comprising image materials and text contents; segmenting the text content, and forming an initial corresponding relation between the image material and a paragraph of the text content; segmenting paragraphs of the text content into sentences to form the text list; and displaying an icon list related to the image material in a first area of a screen of the mobile terminal and displaying the text list in a second area of the screen according to the image material, the text list and the initial corresponding relation.
In one possible embodiment, the method may further include: and selecting and importing image materials corresponding to the texts in the text list from a material library.
In one possible embodiment, the method may further include: and selecting a text in the text list, and calculating image materials related to semantics according to the semantics of the text to serve as the image materials corresponding to the text.
Fig. 8 shows a schematic block diagram of an apparatus 800 for video authoring on a mobile terminal according to an embodiment of the present invention.
Apparatus 800 for video authoring on a mobile terminal, comprising: a display unit 810 for displaying an icon list related to image material in a first area of a screen of the mobile terminal and a text list in a second area of the screen;
an adjusting unit 820, configured to adjust at least one of an icon in the icon list and a text in the text list to set a correspondence between an image material and the text; and
a video generating unit 830 for generating a video including the text as subtitle information based on the set correspondence relationship of the image material and the text.
In one possible embodiment, the adjusting unit 820 may further be configured to: and moving the position of the icon and/or the text to set the corresponding relation between the image material and the text.
In one possible embodiment, the adjusting unit 820 may further be configured to: and selecting the icon and/or the text, and editing the corresponding image material and text.
In one possible embodiment, the video generation unit 830 may be configured to: providing speech related to the text; based on the time axis of the voice and the time axis of the image material, with longer-duration content as a reference, calibrating the time axis and generating a video including the voice and image material.
In one possible embodiment, the image material may be any one of a still picture, a moving picture, and a video image.
In one possible embodiment, the apparatus 800 may include an initialization unit (not shown in the figures). The initialization unit is used for: acquiring image-text contents comprising image materials and text contents; segmenting the text content, and forming an initial corresponding relation between the image material and a paragraph of the text content; and dividing paragraphs of the text content into sentences to form the text list. Thus, the display unit 810 may be configured to display an icon list related to the image material in a first area of a screen of the mobile terminal and display the text list in a second area of the screen according to the image material, the text list and the initial correspondence. The initialization unit 820 may also be configured to select and import image materials corresponding to the texts in the text list from a material library; and selecting a text in the text list, and calculating image materials related to semantics according to the semantics of the text to serve as the image materials corresponding to the text.
From the detailed explanation above it can be seen that the invention breaks with the information organization of traditional video creation tools: information is arranged vertically with text as the core, and more space is used to display text, so the user can conveniently browse the whole text in a way that better matches traditional reading habits.
A more convenient text editing function is provided: single-sentence text input is more native and consistent with the traditional way of entering text, reducing the learning cost; on this basis, batch text editing is provided to meet more complex editing needs; and a one-touch dubbing function provides speech-synthesized dubbing, saving the user's dubbing time.
On the basis of traditional video editing functions, a more macroscopic material editing function is provided that emphasizes the correspondence between materials and text and supports matching pictures to text in batches, which better fits the habits of image-text authors creating video. An automatic calibration function is provided: when the video and text durations are inconsistent, the timeline is automatically calibrated with the longer content as the reference, simplifying the user's operations. An automatic matching function is provided: related pictures are automatically computed and matched according to the semantics of the text, simplifying the step of searching materials for matching pictures.
One-key conversion of existing image-text content into a video draft is supported, with intelligent segmentation and sentence splitting; materials can also be imported in batches to generate the video draft, saving the time needed to import materials.
Fig. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic apparatus 900 includes a Central Processing Unit (CPU)901 which can execute various appropriate actions and processes as shown in fig. 7 according to a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
Embodiments of the present invention provide a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform any one of the methods shown in fig. 7. By way of example, the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that includes one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Drive (SSD)), among others.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, only the division of the functional modules is illustrated, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A method for video authoring on a mobile terminal, comprising:
displaying an icon list related to image materials in a first area of a screen of the mobile terminal and displaying a text list in a second area of the screen;
adjusting at least one of the icons in the icon list and the texts in the text list to set the corresponding relation between the image materials and the texts; and
generating a video including the text as subtitle information based on the set correspondence between the image material and the text.
2. The method of claim 1, wherein the first area and the second area are located on left and right sides of the screen, and the icons in the icon list and the text in the text list are displayed in a top-down layout.
3. The method of claim 2, wherein the setting of the correspondence of image material to text comprises: and setting the corresponding relation between the image material and the text based on the horizontal position relation between the icon of the image material and the text.
4. The method of any of claims 1-3, wherein the icon has a control for resizing, the method further comprising: and adjusting the size of the icon through the control to set the corresponding relation between the image material and the text.
5. The method of any one of claims 1-3, further comprising: and moving the position of the icon and/or the text to set the corresponding relation between the image material and the text.
6. The method of any one of claims 1-3, further comprising: and selecting the icon and/or the text, and editing the corresponding image material and text.
7. The method of claim 1, wherein the generating a video specifically comprises: providing speech related to the text; based on the time axis of the voice and the time axis of the image material, with longer-duration content as a reference, calibrating the time axis and generating a video including the voice and image material.
8. The method of claim 1, wherein the image material is any one of a still picture, a moving picture, and a video image.
9. The method of claim 1, further comprising:
acquiring image-text contents comprising image materials and text contents;
segmenting the text content, and forming an initial corresponding relation between the image material and a paragraph of the text content;
segmenting paragraphs of the text content into sentences to form the text list; and
displaying an icon list related to the image material in a first area of a screen of the mobile terminal and displaying the text list in a second area of the screen according to the image material, the text list and the initial correspondence.
10. The method of claim 1, further comprising: and selecting and importing image materials corresponding to the texts in the text list from a material library.
11. The method of claim 1, further comprising: and selecting a text in the text list, and calculating image materials related to semantics according to the semantics of the text to serve as the image materials corresponding to the text.
12. An apparatus for video authoring on a mobile terminal, comprising:
a display unit for displaying an icon list related to an image material in a first area of a screen of the mobile terminal and a text list in a second area of the screen;
the adjusting unit is used for adjusting at least one of the icons in the icon list and the texts in the text list so as to set the corresponding relation between the image materials and the texts; and
a video generating unit for generating a video including the text as subtitle information based on the set correspondence relationship between the image material and the text.
13. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, which when executed by the processor implements the method of any one of claims 1-11.
14. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1-11.
CN202010650204.5A 2020-07-08 2020-07-08 Method for video authoring on mobile terminal Active CN112040142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010650204.5A CN112040142B (en) 2020-07-08 2020-07-08 Method for video authoring on mobile terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010650204.5A CN112040142B (en) 2020-07-08 2020-07-08 Method for video authoring on mobile terminal

Publications (2)

Publication Number Publication Date
CN112040142A (en) 2020-12-04
CN112040142B CN112040142B (en) 2023-05-02

Family

ID=73578953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010650204.5A Active CN112040142B (en) 2020-07-08 2020-07-08 Method for video authoring on mobile terminal

Country Status (1)

Country Link
CN (1) CN112040142B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004199696A (en) * 2002-12-17 2004-07-15 Ricoh Co Ltd Method for displaying information stored in multiple multimedia document
JP2009094586A (en) * 2007-10-03 2009-04-30 Sony Corp Video playback device, video playback method and program
CN101751215A (en) * 2008-11-27 2010-06-23 索尼株式会社 Information processing apparatus, display control method and program
US20110107217A1 (en) * 2009-10-29 2011-05-05 Margery Kravitz Schwarz Interactive Storybook System and Method
CN102592298A (en) * 2010-12-02 2012-07-18 索尼公司 Visual treatment for a user interface in a content integration framework
US20130024807A1 (en) * 2011-01-26 2013-01-24 Hiroki Kobayashi Mobile electronic device
US20160371312A1 (en) * 2012-02-20 2016-12-22 Wix.Com Ltd. System and method for the creation and use of visually-diverse high-quality dynamic visual data structures
US20170164021A1 (en) * 2012-05-23 2017-06-08 Yahoo! Inc. Systems and Methods Involving Creation of Information Modules, Including Server, Media Searching, User Interface and/or Other Features
CN104885036A (en) * 2013-07-11 2015-09-02 三星电子株式会社 User terminal device for displaying application and methods thereof
US20170083172A1 (en) * 2015-09-23 2017-03-23 Microsoft Technology Licensing, Llc Controlling a Device
CN108965737A (en) * 2017-05-22 2018-12-07 腾讯科技(深圳)有限公司 media data processing method, device and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113473204A (en) * 2021-05-31 2021-10-01 北京达佳互联信息技术有限公司 Information display method and device, electronic equipment and storage medium
CN113473204B (en) * 2021-05-31 2023-10-13 北京达佳互联信息技术有限公司 Information display method and device, electronic equipment and storage medium
WO2023040743A1 (en) * 2021-09-15 2023-03-23 北京字跳网络技术有限公司 Video processing method, apparatus, and device, and storage medium
CN114125553A (en) * 2021-12-31 2022-03-01 深圳市爱剪辑科技有限公司 Video editing system based on mobile terminal and application method thereof

Also Published As

Publication number Publication date
CN112040142B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
US10380773B2 (en) Information processing apparatus, information processing method, and computer readable medium
US10380228B2 (en) Output generation based on semantic expressions
US8078963B1 (en) Efficient creation of documents
CN112040142B (en) Method for video authoring on mobile terminal
US7546554B2 (en) Systems and methods for browsing multimedia content on small mobile devices
US7415475B2 (en) Authoring tools, including content-driven treetables, for fluid text
US7454698B2 (en) Digital document browsing system and method thereof
JP4857122B2 (en) An intelligent agenda object that reveals the context location in a presentation application
US8555186B2 (en) Interactive thumbnails for transferring content among electronic documents
US11334519B2 (en) Content file suggestions
US8005316B1 (en) System and method for editing image data for media repurposing
US9727547B2 (en) Media interface tools and animations
CN100485679C (en) Method and system for browsing multimedia document, and computer product
US20110320933A1 (en) Editing apparatus, layout editing method performed by editing apparatus, and storage medium storing program
US20080170084A1 (en) Information processing apparatus, information display method, and information display program product
US11922975B2 (en) Method, apparatus, device and medium for generating video in text mode
JP4097736B2 (en) Method for producing comics using a computer and method for viewing a comic produced by the method on a monitor screen
JP2003091344A (en) Information processor, information processing method, recording medium, data structure and program
CN113901776A (en) Audio interaction method, medium, device and computing equipment
WO2007132984A1 (en) Document editing program of tree-structure and method thereof
JP2003099424A (en) Document data structure, storage medium and information processor
US20240127512A1 (en) Adaptive editing experience for mixed media content
TWI828490B (en) Online text translation system for page-turning comics
CN117911582A (en) Adaptive editing experience for mixed media content
JP2024033437A (en) Information processing device, method, program, and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant