CN112040142B - Method for video authoring on mobile terminal - Google Patents

Method for video authoring on mobile terminal

Info

Publication number
CN112040142B
CN112040142B (application CN202010650204.5A)
Authority
CN
China
Prior art keywords
text
video
image material
list
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010650204.5A
Other languages
Chinese (zh)
Other versions
CN112040142A (en)
Inventor
翟佳璐
李嘉良
李大任
曹顺达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhizhe Sihai Beijing Technology Co ltd
Original Assignee
Zhizhe Sihai Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhizhe Sihai Beijing Technology Co ltd filed Critical Zhizhe Sihai Beijing Technology Co ltd
Priority to CN202010650204.5A priority Critical patent/CN112040142B/en
Publication of CN112040142A publication Critical patent/CN112040142A/en
Application granted granted Critical
Publication of CN112040142B publication Critical patent/CN112040142B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/189Automatic justification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling

Abstract

The invention relates to a method for video authoring on a mobile terminal, comprising: displaying an icon list related to image material in a first area of the screen of the mobile terminal and displaying a text list in a second area of the screen; adjusting at least one of the icons in the icon list and the texts in the text list to set the correspondence between the image materials and the texts; and generating a video including the text as subtitle information based on the set correspondence between the image material and the text. According to embodiments of the invention, the display space of the mobile terminal screen can be fully utilized, and a user can conveniently browse and edit the complete image and text materials, set their correspondence, and author video.

Description

Method for video authoring on mobile terminal
Technical Field
The present invention relates to the field of mobile terminals, and in particular, to a method, an apparatus, an electronic device, and a computer readable storage medium for video authoring on a mobile terminal.
Background
With the rapid development of video platforms, video has become a dominant media form. Accordingly, video authoring tools have become important production tools for self-media creators of all kinds. Existing video authoring tools arrange information horizontally and mainly provide professional image-material editing functions.
With the rapid development of the mobile internet and video technologies, more users wish to compose video content conveniently on mobile terminals. However, existing video authoring tools impose high learning costs on them. Meanwhile, a narration-style video format is emerging in which the video picture is not the main element; instead, a text manuscript is the main element, matched with pictures or animations of the corresponding subject to form video content with subtitles. Since the main body of such content is text, existing video authoring tools cannot provide convenient text editing and picture-matching functions.
Disclosure of Invention
In view of this, to solve the problems that existing video authoring tools are unsuitable for the mobile environment, inefficient, and error-prone in operation, the invention provides a tool for conveniently authoring video on a mobile terminal. With this tool, the display space of the mobile terminal screen can be fully utilized, and a user can conveniently browse and edit the complete image and text materials, set their correspondence, and author video; for example, graphic knowledge-class content can be quickly and efficiently converted into high-quality video.
According to a first aspect of the present invention, there is provided a method for video authoring on a mobile terminal, comprising: displaying an icon list related to image material in a first area of a screen of the mobile terminal and displaying a text list in a second area of the screen; adjusting at least one of the icons in the icon list and the texts in the text list to set the correspondence between the image materials and the texts; and generating a video including the text as subtitle information based on the set correspondence between the image material and the text.
In one possible embodiment, the first area and the second area may be located at left and right sides of the screen, and the icons in the icon list and the text in the text list are displayed in a top-down layout.
In one possible embodiment, setting the correspondence between the image material and the text may include: setting the correspondence based on the horizontal positional relationship between the icon of the image material and the text.
In one possible embodiment, the icon may have a control for resizing, and the method may further comprise: adjusting the size of the icon through the control to set the correspondence between the image material and the text.
In one possible embodiment, the method may further comprise: moving the positions of the icons and/or the texts to set the correspondence between the image materials and the texts.
In one possible embodiment, the method may further comprise: selecting the icon and/or the text and editing the corresponding image material and text.
In one possible embodiment, generating the video may specifically include: providing speech associated with the text; calibrating the time axis based on the time axis of the speech and the time axis of the image material, using the content with the longer duration as the reference; and generating a video including the speech and the image material.
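The calibration rule just described — when the narration speech and the image material disagree in duration, the longer of the two governs the segment length — can be sketched as follows. This is an illustrative sketch, not the patent's implementation; all function and variable names are assumptions.

```python
def calibrate_segment(audio_duration: float, image_duration: float) -> float:
    """Return the segment duration on the shared timeline.

    Per the rule above: the longer of the narration audio and the
    image material governs, so neither track is truncated.
    """
    return max(audio_duration, image_duration)


def calibrate_timeline(segments):
    """Lay segments out back-to-back; each entry is (audio_s, image_s).

    Returns (start, duration) pairs on the unified video timeline.
    """
    timeline, cursor = [], 0.0
    for audio_s, image_s in segments:
        duration = calibrate_segment(audio_s, image_s)
        timeline.append((cursor, duration))
        cursor += duration
    return timeline
```

A segment with 3 s of narration but 5 s of image material thus occupies 5 s, and the following segment starts where it ends.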
In one possible embodiment, the image material may be any of a still picture, a moving picture, and a video image.
In one possible embodiment, the method may further comprise: acquiring image-text content comprising image material and text content; segmenting the text content and forming an initial correspondence between the image material and paragraphs of the text content; forming the sentences of the paragraphs into the text list; and displaying an icon list related to the image material in a first area of the screen of the mobile terminal and displaying the text list in a second area of the screen according to the image material, the text list, and the initial correspondence.
In one possible embodiment, the method may further comprise: image material corresponding to text in the text list is selected and imported from a material library.
In one possible embodiment, the method may further comprise: selecting text in the text list and, according to the semantics of the text, computing semantically related image material to serve as the image material corresponding to the text.
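The patent does not specify how semantically related image material is computed, so the following is only an illustrative stand-in: it scores each library image by how many of its descriptive tags appear in the text. The tag-set library representation and all names are assumptions.

```python
def match_image_by_semantics(text, library):
    """Pick the library image whose tags best overlap the text.

    `library` maps an image identifier to a set of descriptive tags
    (an assumed representation). Returns the best-scoring image id,
    or None when no tag of any image occurs in the text.
    """
    best_id, best_score = None, 0
    for image_id, tags in library.items():
        # count how many descriptive tags literally occur in the text
        score = sum(1 for tag in tags if tag in text)
        if score > best_score:
            best_id, best_score = image_id, score
    return best_id
```

A production system would use embeddings or an image-retrieval model instead of literal tag overlap; the sketch only shows the interface such a matcher would expose.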
According to a second aspect of the present invention, there is provided an apparatus for video authoring on a mobile terminal, comprising: a display unit for displaying an icon list related to the image material in a first area of a screen of the mobile terminal and displaying a text list in a second area of the screen; an adjusting unit, configured to adjust at least one of an icon in the icon list and a text in the text list, so as to set a correspondence between an image material and the text; and a video generation unit for generating a video including the text as subtitle information based on the set correspondence of the image material and the text.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.
According to a fourth aspect of the present disclosure there is provided a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to the first aspect.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art. The above and other objects, features and advantages of the present application will become more apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the several views of the drawings. The drawings are not intended to be drawn to scale, with emphasis instead being placed upon illustrating the principles of the present application.
Fig. 1 shows a schematic operation interface diagram of an editing mode of authoring video at a mobile terminal according to an embodiment of the present invention.
Fig. 2 shows a schematic operation interface diagram of another editing mode of authoring video at a mobile terminal according to an embodiment of the present invention.
Fig. 3 shows a schematic operation interface diagram for adjusting the correspondence of image material and text at a mobile terminal according to an embodiment of the invention.
Fig. 4 shows a schematic operation interface diagram for editing text at a mobile terminal according to an embodiment of the present invention.
Fig. 5 shows a schematic operation interface diagram of image-text content and imported image material to be used for generating a video according to an embodiment of the invention.
Fig. 6A-6C are schematic diagrams illustrating the generation of image material, text, and their correspondence from image-text content, according to an embodiment of the invention.
Fig. 7 shows a schematic flow chart of a method for video authoring on a mobile terminal in accordance with an embodiment of the present invention.
FIG. 8 shows a schematic block diagram of an apparatus for video authoring on a mobile terminal in accordance with an embodiment of the invention.
Fig. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The words "a", "an", and "the" as used herein are also intended to include the meaning of "a plurality", etc., unless the context clearly indicates otherwise. Furthermore, the terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Fig. 1 shows a schematic operation interface diagram of an editing mode of authoring video at a mobile terminal according to an embodiment of the present invention. The editing mode shown in fig. 1 may also be referred to as a timeline mode; conventional video authoring tools typically use this view mode.
As shown, a series of materials for video authoring, including image material 110, subtitle text 120, and music material 130, is shown in the lower part of the mobile terminal screen. The image material 110 may be a still picture, a moving picture, or a video image. The subtitle text 120 may be any type of text data, divided into a plurality of sentences or paragraphs to be displayed in the video image as the video plays. The music material may be any custom audio, for example narration audio corresponding to the subtitle text 120, which may be human speech generated by any intelligent algorithm.
In the editing interface of the timeline mode, a preview video window 140 is displayed in the upper portion of the mobile terminal. The user may align the image material 110, subtitle text 120, and music material 130 with the current timeline 150. In one embodiment, the user may drag any one or more of the image material 110, subtitle text 120, and music material 130 left and right to align the materials, i.e., to synchronize images, text, and sound. In addition, the user may touch the image material 110, the subtitle text 120, or the music material 130 to edit the material at the touch location, for example to beautify the video material, modify the subtitle text content, or adjust the corresponding audio.
It can be seen that in the editing mode shown in fig. 1, material information such as the image material 110, the subtitle text 120, and the audio material is arranged horizontally: the larger space in the upper part of the screen is used to display the video material, and the text content is displayed in thumbnail form. The disadvantages of this horizontal arrangement are:
1. The video occupies a large display space, so the text content is displayed abbreviated and an author cannot browse the full text; the user must repeatedly swipe right to browse the complete text, which does not match the user's reading habits.
2. When new text needs to be inserted, it must be attached to an image material; that is, the existing image or video comes first and text follows it, whereas a narration video is usually prepared from an existing text. Such an authoring process cannot provide a smooth authoring experience.
3. Text is treated as auxiliary content: no convenient editing mode is provided, only single-sentence operations are supported, and functions such as batch-editing text content, batch-moving text positions, and batch-synthesizing dubbing are not supported.
4. The video editing function focuses on editing a single video and provides no convenient way to adjust the correspondence between materials and text.
5. Automatic conversion of image-text content into a video draft is not supported.
In view of the above, the present invention provides an "outline editing" mode on top of conventional video authoring tools. For example, the mode may be entered via the mode-switch control 160 on the interface shown in fig. 1. In outline mode, the image and video materials and the text are arranged vertically, the video is displayed reduced, and the text is given more display space, so that the user can conveniently browse the whole text, and the display form of the text accords with traditional reading habits.
Fig. 2 shows a schematic operation interface diagram of another editing mode (also referred to as outline mode) of authoring video at a mobile terminal according to an embodiment of the present invention. It should be noted that editing of a single material can still be completed in the traditional timeline mode; the outline mode focuses on adjusting the correspondence between materials and text.
As shown in fig. 2, a plurality of icons 210 associated with image video material are shown in the left region of the screen of the mobile terminal; a plurality of texts 220 divided into sentence forms with separation lines therebetween are shown in the right region of the screen. In the left area, the icons 210 related to the image material are formed as an icon list; text 220 is formed as a text list in the right region. Also shown in FIG. 2 is a current timeline 250 that is similar to timeline 150 of FIG. 1. The current video preview 240 may be displayed in an upper portion of the screen according to the location of the current timeline 250.
According to an embodiment of the present invention, the correspondence between image/video material and text may be set according to the positional relationship of the icon 210 and the text 220 on the screen. Specifically, the correspondence is set according to the size or height of the icon 210, for example the horizontal positional relationship between the icon's upper and lower end positions and the text in the right area. For example, after the user selects a material, two controls such as the handles 211 and 212 are displayed on it; dragging the handles resizes the icon 210, quickly adjusting the correspondence between the image/video material and the text. It will be appreciated that enlarging the icon 210 makes the image material correspond to more text content, while shrinking it makes the corresponding text content less.
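The handle-dragging mechanism described above can be modeled as a mapping from the icon's vertical extent to the text rows it covers. The sketch below assumes a fixed row height and a midpoint-overlap rule, neither of which is specified in the patent; all names are illustrative.

```python
def texts_for_icon(icon_top: float, icon_bottom: float,
                   row_height: float, n_rows: int):
    """Map an icon's vertical extent to the text rows it covers.

    Dragging the icon's handles changes (icon_top, icon_bottom), and
    every text row whose vertical midpoint falls inside that span is
    treated as corresponding to the icon's image material. The
    midpoint rule is an assumption, not taken from the patent.
    """
    rows = []
    for i in range(n_rows):
        midpoint = (i + 0.5) * row_height
        if icon_top <= midpoint < icon_bottom:
            rows.append(i)
    return rows
```

Enlarging the icon (a larger `icon_bottom`) covers more rows, i.e. more subtitle text for that material; raising `icon_top` releases the first rows, matching the behavior described for handles 211 and 212.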
It should be noted that in both the timeline mode of fig. 1 and the outline mode of fig. 2, the overall effect of the video can be quickly previewed using the timeline. The outline mode in particular markedly improves the efficiency of previewing and of operating on the correspondence, making user operations more focused and efficient.
Fig. 3 shows a schematic operation interface diagram for adjusting the correspondence of image material and text at a mobile terminal according to an embodiment of the invention. In addition to adjusting the correspondence by resizing the icon 210, the correspondence and the order of the materials can be quickly adjusted by long-pressing a material.
The left side of fig. 3 shows long-pressing and dragging the icon 310 to move the corresponding video material and thereby adjust the correspondence between image/video material and text. The user long-presses the icon 310; a floating effect of the icon 310 is then displayed on the screen, and the user can drag the icon 310 to insert it at a desired position. For example, the icon 310 may be dragged upward and inserted at an earlier position to play the video material earlier; or conversely, it may be dragged downward and inserted at a later position to play the video material later.
In one embodiment, after the icon 310 is dragged away from its original position, that position may be displayed in black or any other color, indicating that the text there has no corresponding image/video material. Those skilled in the art will appreciate that the dragged icon may be further resized to increase or decrease its corresponding text.
The right side of fig. 3 shows long-pressing and dragging the text 320 "i am from twelve years old" (e.g., the user holds a menu item on the right side of the text) to adjust the correspondence between image/video material and the text. Similarly, a floating effect of the text 320 is displayed on the screen, and the user can drag the text 320 to insert it at a desired position. For example, the text 320 may be dragged upward and inserted at an earlier position to display the text earlier when the video is played; or conversely, it may be dragged downward and inserted at a later position to be displayed later.
In one embodiment, after the text 320 is dragged away from its original position, that position may be rendered blank or in any other color, indicating that the image/video material there has no corresponding text.
It can be seen that in outline mode, the correspondence between the added subtitle text and the image/video material can be clearly previewed, reducing the errors and inefficiency that arise when editing video material.
Fig. 4 shows a schematic operation interface diagram for editing text at a mobile terminal according to an embodiment of the present invention. Fig. 4 shows interface diagrams of text editing, more operations, and multiple choice text operations in order from left to right.
In the outline mode shown in fig. 2, any text 220 can be edited by tapping it. As shown in the left diagram of fig. 4, compared with the horizontal information display of fig. 1, the text editing view of the present invention displays more text content, including the edited text itself and its context, which is very convenient and better suits the user's reading habits.
As shown in the middle diagram of fig. 4, the user may select a text sentence so that menu items including "multi-select", "copy", "paste", etc. are presented on the screen for quick editing. If "multi-select" is chosen, the interface shown in the right diagram of fig. 4 is presented, and the user can select multiple texts and edit them together, for example copying, cutting, or deleting them.
Fig. 5 shows a schematic operation interface diagram of image-text content and imported image material to be used for generating a video according to an embodiment of the invention.
In knowledge Q&A applications, users often need to generate narration-type video from mixed image-text content including pictures (still, moving, or video images) and text. As shown on the left of fig. 5, the user's answer content includes several pieces of text and pictures. As shown on the right of fig. 5, the user may also select several image materials from a material library for generating the narration-type video. The operation bar at the bottom supports quick management of the currently selected material, such as editing, replacing, deleting, and batch-importing materials. In one embodiment, materials may be imported in batches, and after editing the manuscript, a matching picture can be quickly selected for it from the bottom material bar. That is, the user is free to add and delete image material and is not limited to the material within the original image-text content.
It can be seen that the efficiency with which the user manages and edits image/video material and text is improved: content and text are added quickly, and subtitle text is modified and adjusted quickly and conveniently.
A typical example of generating the initial correspondence of the image video material and the subtitle text described with reference to fig. 1 to 4 is described below in connection with fig. 6A to 6C.
Fig. 6A-6C are schematic diagrams illustrating the generation of image material, text, and their correspondence from image-text content, according to an embodiment of the invention. The method initializes the image materials, the texts, and their correspondence in three stages: material preprocessing, intelligent segmentation, and intelligent clause splitting.
First, material preprocessing.
● Preprocessing logic for picture materials: only still images and GIFs are imported; when an import fails, a black-frame video placeholder is displayed. Videos are not imported, and material that is not imported is given no placeholder display.
● Preprocessing logic for text materials: discarded text includes code blocks, comments, links (the link text is kept when it differs from the address), formulas, and natural paragraphs or list items containing only spaces. Retained text includes titles, quote blocks, ordered-list items (sequence numbers retained; a period is appended when an item does not end with a separator or comma), and unordered-list items (bullet marks removed; a period appended likewise). All text is converted to plain text and formatting is cleared.
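The text-preprocessing rules above amount to a filter over typed source blocks. A minimal sketch, assuming a simplified (kind, text) block representation; the kind labels and the exact punctuation set are illustrative, not from the patent.

```python
import re

def preprocess_text_blocks(blocks):
    """Filter source blocks per the preprocessing rules above (simplified).

    Each block is a (kind, text) pair. Code blocks, formulas, and
    whitespace-only entries are dropped; unordered-list items lose
    their bullet marks; list items gain a terminal period when one
    is missing.
    """
    kept = []
    for kind, text in blocks:
        if kind in ("code", "formula") or not text.strip():
            continue  # discarded material
        if kind == "unordered_item":
            text = re.sub(r"^\s*[-*•]\s*", "", text)  # remove the bullet mark
        if kind in ("ordered_item", "unordered_item") and \
                not text.rstrip().endswith(("。", ".", "!", "?")):
            text = text.rstrip() + "."  # append a period when none is present
        kept.append(text.strip())
    return kept
```

Link handling (keeping link text only when it differs from the address) is omitted here for brevity; it would be one more branch on a `link` kind.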
Second, intelligent segmentation.
● Segmentation is performed first: natural paragraphs are the basis of segmentation, each natural paragraph forming one segment. If there are two or more consecutive natural paragraphs each with a character count <= 5, these paragraphs are merged into one segment.
● Then pictures and text paragraphs are put in one-to-one correspondence: for an isolated picture, the corresponding text is the nearest natural paragraph above it (after merging); for consecutive pictures, pictures and paragraphs are matched in reverse order (the last picture corresponds to the nearest paragraph above, the second-to-last to the paragraph before it, and so on); text with no corresponding picture corresponds to a default black-frame video, such as the first, second, and third segments in fig. 6A. If there are more pictures than natural paragraphs in the corresponding text section, the surplus pictures correspond to blank subtitles, as with "fig. 3" in fig. 6B. If only spaces or line breaks lie between two pictures, they are regarded as consecutive; if entirely discarded text or a divider line lies between them, they are regarded as non-consecutive, as in fig. 6C, where "fig. 2" does not correspond to any subtitle text.
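The two segmentation rules above — merging runs of short natural paragraphs, and matching a run of consecutive pictures to the paragraphs above it in reverse order — can be sketched as follows. The description is ambiguous about which pictures in a surplus run go unmatched; following the reverse-order rule literally, this sketch leaves the earliest ones with blank captions (`None`). All names are illustrative.

```python
def merge_short_paragraphs(paragraphs, max_len=5):
    """Merge runs of 2+ consecutive paragraphs, each <= max_len
    characters, into one segment, per the rule above."""
    merged, run = [], []
    for p in paragraphs:
        if len(p) <= max_len:
            run.append(p)
            continue
        if run:  # flush the pending run of short paragraphs
            merged.append("".join(run) if len(run) >= 2 else run[0])
            run = []
        merged.append(p)
    if run:
        merged.append("".join(run) if len(run) >= 2 else run[0])
    return merged


def match_picture_run(pictures, paragraphs_above):
    """Match a run of consecutive pictures to the paragraphs above it
    in reverse order: the last picture takes the nearest paragraph,
    the second-to-last the one before it, and so on. Leftover pictures
    get a blank caption (None)."""
    pairs = []
    pics, paras = list(pictures), list(paragraphs_above)
    while pics and paras:
        pairs.append((pics.pop(), paras.pop()))
    for pic in reversed(pics):  # surplus pictures -> blank caption
        pairs.append((pic, None))
    pairs.reverse()  # restore document order
    return pairs
```

Paragraphs left unmatched after this step would receive the default black-frame video described above; that bookkeeping is omitted here.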
Third, intelligent clause splitting.
● Each segment of text is cut into shorter sentences, each sentence corresponding to one line of subtitles.
● First, sentences are divided by segmenters. The segmenters include "。", "；", "：", "？", "！", the line break, and a run of two or more "…". A segmenter followed by a closing quotation mark is treated together as one segmenter; if after splitting a sentence ends with an opening quotation mark, that mark is moved to the head of the next sentence, or deleted if there is no next sentence. A segmenter followed by a closing bracket is treated together as one segmenter; if a sentence ends with an opening bracket, the bracket is moved to the head of the next sentence, or deleted if there is none. Several consecutive segmenters are treated as one. Text between two segmenters is considered continuous if it is entirely discarded or contains only spaces. If a segmenter has digits on both sides, it is not processed (e.g., a score of 1:1, or a total annual yield of 194,211,400). Other punctuation is not processed and is displayed normally in the subtitles.
● Overlong sentences are divided again: if a split sentence exceeds 26 Chinese characters (English letters, punctuation, and digits each counted as half a character), it is cut at the last "，" or "、" within the first 26 characters, repeatedly, until every sentence is <= 26 characters. Several consecutive "，" are merged into one. If there is no comma, the sentence is not cut even when it exceeds 26 characters.
● Punctuation processing: for each subtitle line in the video, the sentence-final "。" and "，" are hidden.
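The clause-splitting rules above (segmenter punctuation, the 26-character limit with half-width characters counted as half, and hiding of line-final "。"/"，") can be sketched as follows. This is a simplified sketch: quote/bracket migration and the digit-adjacent exception are omitted, and the character-weight rule (any non-ASCII character counts as 1) is an approximation.

```python
import re

SEGMENTERS = r"[。；：？！\n]+"  # runs of segmenters count as one

def char_weight(ch):
    """Full-width (CJK) characters count as 1; ASCII letters, digits,
    and punctuation count as half, per the 26-character rule."""
    return 1.0 if ord(ch) > 0x7F else 0.5

def weighted_len(s):
    return sum(char_weight(c) for c in s)

def split_long(sentence, limit=26):
    """Repeatedly cut at the last comma within the first `limit`
    weighted characters; leave the sentence whole if no comma."""
    parts = []
    while weighted_len(sentence) > limit:
        cut, weight = -1, 0.0
        for i, ch in enumerate(sentence):
            weight += char_weight(ch)
            if weight > limit:
                break
            if ch in "，,、":
                cut = i  # last comma seen within the limit
        if cut < 0:
            break  # no comma: keep the overlong sentence intact
        parts.append(sentence[:cut + 1])
        sentence = sentence[cut + 1:]
    parts.append(sentence)
    return parts

def to_subtitles(text, limit=26):
    """Segment text into subtitle lines, hiding line-final '。'/'，'."""
    lines = []
    for sent in re.split(SEGMENTERS, text):
        if not sent.strip():
            continue  # continuous/empty text between segmenters
        for part in split_long(sent.strip(), limit):
            lines.append(part.rstrip("。，,"))
    return lines
```

For example, "今天天气很好。我们去公园玩！" yields two subtitle lines with the sentence-final punctuation hidden.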
The image/video material, the text, and their correspondence are thus generated. Those skilled in the art will appreciate that the user may further adjust, edit, and import image/video material, text, and their correspondence as described above with reference to figs. 1-5. For example, for subtitle text lacking corresponding image/video material, the invention also provides an automatic picture-matching function: according to the semantics of the text, matching related pictures are automatically computed, simplifying the user's search for matching material.
Next, based on the set correspondence between the image material and the text, a video including the text as subtitle information is generated. In one embodiment, a one-touch dubbing function is also provided, with which the user can generate corresponding narration audio from the text. For example, a male, female, or child voice is selected according to the user's preference and a volume is configured to produce the narration audio. The narration audio may be combined with the image/video material and the text to form a narration-type video. According to an embodiment of the invention, an automatic calibration function is also provided when the video is generated: when the video duration is inconsistent with the text duration (for example, the playing duration of the narration audio corresponding to the text), the timeline is automatically calibrated using the longer content as the reference, which simplifies the user's operation steps.
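The automatic-calibration idea above can be sketched as follows. This is a minimal illustration under my own assumptions (per-segment durations are known; `Segment` and `calibrate` are hypothetical names): for each text segment, the clip duration and the narration-audio duration may differ, and the longer of the two becomes the segment's length on the timeline.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    clip_duration: float   # seconds of image/video material
    audio_duration: float  # seconds of narration audio for the text

def calibrate(segments: list[Segment]) -> list[tuple[float, float]]:
    """Return (start, end) for each segment, using the longer duration as reference."""
    timeline, t = [], 0.0
    for seg in segments:
        length = max(seg.clip_duration, seg.audio_duration)
        timeline.append((t, t + length))
        t += length
    return timeline
```

For example, a 3-second clip paired with 5 seconds of narration occupies 5 seconds on the calibrated timeline.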
Fig. 7 shows a schematic flow chart of a method 700 for video authoring on a mobile terminal in accordance with an embodiment of the present invention.
The method 700 includes: at step 710, displaying a list of icons related to image material in a first area of a screen of the mobile terminal and displaying a list of text in a second area of the screen;
at step 720, adjusting at least one of the icons in the icon list and the text in the text list to set the correspondence between the image material and the text; and
at step 730, generating a video including the text as subtitle information based on the set correspondence between the image material and the text.
In one possible embodiment, the first area and the second area may be located at left and right sides of the screen, and the icons in the icon list and the text in the text list are displayed in a top-down layout.
In one possible embodiment, setting the correspondence between the image material and the text may include: setting the correspondence based on the horizontal positional relationship between the icon of the image material and the text.
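One way to read this embodiment: each icon occupies a vertical span in the first (left) column and each sentence a row in the second (right) column, so a sentence corresponds to the icon whose span covers the row's vertical midpoint. The sketch below is an assumption about one plausible mapping, not the patent's definitive implementation; all names are illustrative.

```python
def map_text_to_icons(icon_spans: list[tuple[float, float]],
                      row_tops: list[float],
                      row_height: float) -> list[int]:
    """For each text row, return the index of the icon whose vertical span
    [y0, y1) contains the row's midpoint, or -1 if no icon is alongside it."""
    mapping = []
    for top in row_tops:
        mid = top + row_height / 2
        idx = next((i for i, (y0, y1) in enumerate(icon_spans)
                    if y0 <= mid < y1), -1)
        mapping.append(idx)
    return mapping
```

Under this reading, resizing an icon (stretching its span over more rows) or moving a row changes which sentences it covers, which is exactly how the embodiments below adjust the correspondence.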
In one possible embodiment, the icon may have a control for resizing, and the method may further comprise: adjusting the size of the icon through the control to set the correspondence between the image material and the text.
In one possible embodiment, the method may further comprise: moving the positions of the icons and/or the text to set the correspondence between the image material and the text.
In one possible embodiment, the method may further comprise: selecting the icon and/or the text, and editing the corresponding image material and text.
In one possible embodiment, generating the video may specifically include: providing speech associated with the text; and calibrating the timeline based on the timeline of the speech and the timeline of the image material, using the content with the longer duration as the reference, to generate a video including the speech and the image material.
In one possible embodiment, the image material may be any of a still picture, a moving picture, and a video image.
In one possible embodiment, the method may further comprise: acquiring image-text content comprising image material and text content; segmenting the text content and forming an initial correspondence between the image material and the paragraphs of the text content; forming the text list from the paragraph sentences of the text content; and displaying the icon list related to the image material in the first area of the screen of the mobile terminal and the text list in the second area of the screen according to the image material, the text list, and the initial correspondence.
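The initialization step can be sketched as follows, under one plausible policy (this passage does not fix the exact policy, so associating each paragraph with the nearest preceding image is my assumption, and the names are illustrative): a mixed image-text article is a sequence of blocks, each either an image or a text paragraph.

```python
def initialize(blocks: list[tuple[str, str]]) -> tuple[list[str], list[tuple[int, str]]]:
    """blocks: ("image", path) or ("text", paragraph) in article order.
    Returns (image_list, pairs), where each pair is
    (index of the nearest preceding image, or -1 if none, paragraph)."""
    images, pairs = [], []
    current = -1                       # no image seen yet
    for kind, payload in blocks:
        if kind == "image":
            images.append(payload)
            current = len(images) - 1
        else:
            pairs.append((current, payload))
    return images, pairs
```

Paragraphs with index -1 are exactly those lacking corresponding material, for which the automatic picture-matching function described earlier can fill in an image.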
In one possible embodiment, the method may further comprise: selecting and importing image material corresponding to the text in the text list from a material library.
In one possible embodiment, the method may further comprise: selecting text in the text list, and computing image material related to its semantics as the image material corresponding to the text.
FIG. 8 shows a schematic block diagram of an apparatus 800 for video authoring on a mobile terminal in accordance with an embodiment of the invention.
Apparatus 800 for video authoring on a mobile terminal, comprising: a display unit 810 for displaying a list of icons related to image materials in a first area of a screen of the mobile terminal and displaying a list of texts in a second area of the screen;
an adjusting unit 820, configured to adjust at least one of an icon in the icon list and a text in the text list, so as to set a correspondence between image materials and the text; and
the video generating unit 830 is configured to generate a video including the text as subtitle information based on the set correspondence between the image material and the text.
In one possible embodiment, the adjusting unit 820 may also be used to: move the positions of the icons and/or the text to set the correspondence between the image material and the text.
In one possible embodiment, the adjusting unit 820 may also be used to: select the icon and/or the text, and edit the corresponding image material and text.
In one possible embodiment, the video generating unit 830 may be configured to: provide speech associated with the text; and calibrate the timeline based on the timeline of the speech and the timeline of the image material, using the content with the longer duration as the reference, to generate a video including the speech and the image material.
In one possible embodiment, the image material may be any of a still picture, a moving picture, and a video image.
In one possible embodiment, the apparatus 800 may include an initialization unit (not shown in the figures). The initialization unit is used for: acquiring image-text content comprising image material and text content; segmenting the text content and forming an initial correspondence between the image material and the paragraphs of the text content; and forming the text list from the paragraph sentences of the text content. Thus, the display unit 810 may be configured to display the icon list related to the image material in the first area of the screen of the mobile terminal and the text list in the second area of the screen according to the image material, the text list, and the initial correspondence. The initialization unit may be further configured to select and import image material corresponding to the text in the text list from a material library; and to select text in the text list and compute image material related to its semantics as the image material corresponding to the text.
Based on the detailed description above, it can be seen that the invention breaks through the information organization of traditional video creation tools: it arranges information vertically with the text as the core and uses more space to display the text, so that the user can conveniently browse the whole, in a browsing mode that better matches traditional reading habits.
A more convenient text editing function is provided: the input of single-sentence text is more native and consistent with the traditional way of entering text, reducing the learning cost; a batch text editing function is also provided to meet more complex text editing requirements; meanwhile, a one-key dubbing function is provided, generating the dubbing by speech synthesis and saving the user's dubbing time.
On top of the traditional video editing functions, a more macroscopic material editing function is provided that adjusts the correspondence between materials and text and supports batch picture-text matching, better fitting the habits of image-text authors when creating videos. An automatic calibration function is provided: when the video and text durations are inconsistent, the timeline is automatically calibrated using the longer content as the reference, simplifying the user's operation steps. An automatic picture-matching function is provided: pictures related to the text are automatically computed from its semantics, simplifying the user's task of searching for material to match pictures.
Existing image-text content is converted into a video draft with one key, with intelligent paragraph and sentence segmentation; meanwhile, materials are imported in batches to generate the video draft, saving the time spent importing materials.
Fig. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 900 includes a Central Processing Unit (CPU) 901 that can execute various appropriate actions and processes as shown in fig. 7 in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.
Embodiments of the present invention provide a computer readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform any of the methods shown in fig. 7. By way of example, a computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely an illustrative embodiment of the present invention, but the scope of the present invention is not limited thereto; any variation or substitution that a person skilled in the art can easily conceive within the technical scope disclosed by the present invention should be covered by the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (13)

1. A method for video authoring on a mobile terminal, comprising:
displaying an icon list related to the image material in a first area of a screen of the mobile terminal and a text list in a second area of the screen, and displaying a video preview in an upper portion of the screen;
adjusting at least one of the icons in the icon list and the text in the text list to set the correspondence between the image material and the text, wherein the icon has a control for resizing, and the size of the icon is adjusted through the control to set the correspondence between the image material and the text; and
based on the set correspondence of the image material and the text, a video including the text as subtitle information is generated.
2. The method of claim 1, wherein the first region and the second region are located on left and right sides of the screen, and icons in the icon list and text in the text list are displayed in a top-down layout.
3. The method of claim 2, wherein the setting of the correspondence between the image material and the text comprises: setting the correspondence based on the horizontal positional relationship between the icon of the image material and the text.
4. A method according to any one of claims 1-3, the method further comprising: moving the positions of the icons and/or the text to set the correspondence between the image material and the text.
5. A method according to any one of claims 1-3, the method further comprising: selecting the icon and/or the text, and editing the corresponding image material and text.
6. The method according to claim 1, wherein generating the video specifically comprises: providing speech associated with the text; and calibrating the timeline based on the timeline of the speech and the timeline of the image material, using the content with the longer duration as the reference, to generate a video including the speech and the image material.
7. The method of claim 1, wherein the image material is any one of a still picture, a moving picture, and a video image.
8. The method of claim 1, the method further comprising:
acquiring image-text content comprising image materials and text content;
segmenting the text content and forming an initial corresponding relation between the image material and a paragraph of the text content;
forming a paragraph sentence of the text content into the text list; and
and displaying an icon list related to the image material in a first area of a screen of the mobile terminal and displaying the text list in a second area of the screen according to the image material, the text list and the initial corresponding relation.
9. The method of claim 1, the method further comprising: image material corresponding to text in the text list is selected and imported from a material library.
10. The method of claim 1, the method further comprising: selecting text in the text list, and computing image material related to its semantics as the image material corresponding to the text.
11. An apparatus for video authoring on a mobile terminal, comprising:
a display unit for displaying an icon list related to the image material in a first area of a screen of the mobile terminal and a text list in a second area of the screen, and displaying a video preview in an upper portion of the screen;
an adjusting unit, configured to adjust at least one of an icon in the icon list and a text in the text list, so as to set a correspondence between an image material and the text; the icon is provided with a control used for adjusting the size, and the size of the icon is adjusted through the control so as to set the corresponding relation between the image material and the text; and
a video generation unit for generating a video including the text as subtitle information based on the set correspondence between the image material and the text.
12. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any one of claims 1-11 when the program is executed.
13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-11.
CN202010650204.5A 2020-07-08 2020-07-08 Method for video authoring on mobile terminal Active CN112040142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010650204.5A CN112040142B (en) 2020-07-08 2020-07-08 Method for video authoring on mobile terminal


Publications (2)

Publication Number Publication Date
CN112040142A CN112040142A (en) 2020-12-04
CN112040142B true CN112040142B (en) 2023-05-02

Family

ID=73578953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010650204.5A Active CN112040142B (en) 2020-07-08 2020-07-08 Method for video authoring on mobile terminal

Country Status (1)

Country Link
CN (1) CN112040142B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113473204B (en) * 2021-05-31 2023-10-13 北京达佳互联信息技术有限公司 Information display method and device, electronic equipment and storage medium
CN115811632A (en) * 2021-09-15 2023-03-17 北京字跳网络技术有限公司 Video processing method, device, equipment and storage medium
CN114125553A (en) * 2021-12-31 2022-03-01 深圳市爱剪辑科技有限公司 Video editing system based on mobile terminal and application method thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004199696A (en) * 2002-12-17 2004-07-15 Ricoh Co Ltd Method for displaying information stored in multiple multimedia document
JP2009094586A (en) * 2007-10-03 2009-04-30 Sony Corp Video playback device, video playback method and program
CN101751215A (en) * 2008-11-27 2010-06-23 索尼株式会社 Information processing apparatus, display control method and program
CN102592298A (en) * 2010-12-02 2012-07-18 索尼公司 Visual treatment for a user interface in a content integration framework
CN104885036A (en) * 2013-07-11 2015-09-02 三星电子株式会社 User terminal device for displaying application and methods thereof
CN108965737A (en) * 2017-05-22 2018-12-07 腾讯科技(深圳)有限公司 media data processing method, device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8510656B2 (en) * 2009-10-29 2013-08-13 Margery Kravitz Schwarz Interactive storybook system and method
JP5818445B2 (en) * 2011-01-26 2015-11-18 京セラ株式会社 Mobile terminal device
US9996566B2 (en) * 2012-02-20 2018-06-12 Wix.Com Ltd. Visual design system for generating a visual data structure associated with a semantic composition based on a hierarchy of components
US9843823B2 (en) * 2012-05-23 2017-12-12 Yahoo Holdings, Inc. Systems and methods involving creation of information modules, including server, media searching, user interface and/or other features
US10185468B2 (en) * 2015-09-23 2019-01-22 Microsoft Technology Licensing, Llc Animation editor




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant