CN112004137A - Intelligent video creation method and device - Google Patents

Intelligent video creation method and device

Info

Publication number
CN112004137A
Authority
CN
China
Prior art keywords
video
information
voice
target
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010905402.1A
Other languages
Chinese (zh)
Inventor
郝晓伟
詹丽
林子杰
郑章旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lajin Zhongbo Technology Co ltd
Original Assignee
Tianmai Juyuan Hangzhou Media Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianmai Juyuan Hangzhou Media Technology Co ltd filed Critical Tianmai Juyuan Hangzhou Media Technology Co ltd
Priority to CN202010905402.1A priority Critical patent/CN112004137A/en
Publication of CN112004137A publication Critical patent/CN112004137A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H04N21/8113Monomedia components thereof involving special audio data, e.g. different tracks for different languages comprising music, e.g. song in MP3 format
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses an intelligent video creation method and device, wherein the method comprises the following steps: receiving a website link input by a user, and acquiring multimedia information corresponding to the website link, wherein the multimedia information comprises video material information and video text information; calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information; processing the video material information according to the total duration to obtain processed video information; and generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information. With this technical solution, a video can be automatically generated from the link input by the user, which reduces user operations and improves the user experience.

Description

Intelligent video creation method and device
Technical Field
The invention relates to the technical field of video processing, in particular to an intelligent video creation method and device.
Background
According to the persistence-of-vision principle, when successive images change at more than 24 frames per second, the human eye can no longer distinguish individual static images and perceives a smooth, continuous picture; such a sequence of images is called a video. At present, video editing technology is applied ever more widely, and editors usually use professional editing software to clip videos so as to obtain the video content that users want.
However, existing video editing mainly relies on manually cutting the required segments from an original (i.e. existing) video and then splicing them together. This manual operation consumes a great deal of time and is very tedious, which slows down video editing and thus reduces editing efficiency.
Disclosure of Invention
In view of the above problems, the present invention provides an intelligent video authoring method and a corresponding apparatus, which can automatically generate a video according to a link input by a user, thereby reducing user operations and improving user experience.
According to a first aspect of the embodiments of the present invention, there is provided an intelligent video authoring method, the method including:
receiving a website link input by a user, and acquiring multimedia information corresponding to the website link, wherein the multimedia information comprises video material information and video text information;
calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information;
processing the video material information according to the total duration to obtain processed video information;
and generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.
In one embodiment, preferably, the processing the video material information according to the total duration to obtain processed video information includes:
when the video material information comprises a plurality of video materials, calculating the average time length of each video material according to the total number of the video materials;
setting the playing time length of each video material according to the average time length to obtain a plurality of processed video materials;
synthesizing the processed video materials into the video information;
and when the video material information comprises a video material, setting the playing time length of the video material according to the total time length.
In one embodiment, preferably, the method further comprises:
displaying the multimedia information in a classified manner on a display interface, wherein the video material information is displayed in a first display area of the display interface, and the video text information is displayed in a second display area of the display interface.
in one embodiment, preferably, before calculating a total time required for voice broadcasting the video text information according to a preset voice broadcasting speed, the method further includes:
receiving an editing command for the video text information input by a user;
editing the video text information according to the editing command, wherein the editing operation comprises the operations of modifying, deleting and adding the content of the video text information.
in one embodiment, preferably, the acquiring the multimedia information corresponding to the website link includes:
acquiring video text information corresponding to the website link;
determining whether the multimedia information contains video material information;
when the multimedia information contains the video material information, directly acquiring the video material information;
when the multimedia information does not contain the video material information, analyzing the video text information to obtain a keyword corresponding to the video text information;
and acquiring video material information matched with the video text information from a preset video material library according to the key words.
In one embodiment, preferably, generating the target video according to the video information and the voice information comprises:
receiving an effect selection command input by a user, wherein the effect selection command comprises any one or more of the following items: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;
determining a target effect corresponding to the target video according to the effect selection command;
and generating a target video according to the target effect, the video information and the voice information.
According to a second aspect of the embodiments of the present invention, there is provided an intelligent video authoring apparatus, the apparatus comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for receiving a website link input by a user and acquiring multimedia information corresponding to the website link, and the multimedia information comprises video material information and video text information;
the calculation module is used for calculating the total duration required by voice broadcasting of the video text information according to a preset voice playing speed and converting the video text information into voice information;
the processing module is used for processing the video material information according to the total duration to obtain processed video information;
and the generating module is used for generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.
In one embodiment, preferably, the processing module includes:
the calculating unit is used for calculating the average duration of each video material according to the total number of the video materials when the video material information comprises a plurality of video materials;
the first setting unit is used for setting the playing time length of each video material according to the average time length so as to obtain a plurality of processed video materials;
the synthesizing unit is used for synthesizing the processed video materials into the video information;
and the second setting unit is used for setting the playing time length of the video material according to the total time length when the video material information comprises the video material.
In one embodiment, preferably, the apparatus further comprises:
the display module is used for displaying the multimedia information in a classified manner on a display interface, wherein the video material information is displayed in a first display area of the display interface, and the video text information is displayed in a second display area of the display interface.
in one embodiment, preferably, the apparatus further comprises:
the receiving module is used for receiving an editing command of the video text information, which is input by a user;
the editing module is used for editing the video text information according to the editing command, wherein the editing operation comprises the operations of modifying, deleting and adding the content of the video text information.
in one embodiment, preferably, the obtaining module includes:
the first acquisition unit is used for acquiring video text information corresponding to the website link;
the first determining unit is used for determining whether the multimedia information contains video material information;
the second acquisition unit is used for directly acquiring the video material information when the multimedia information contains the video material information;
the analysis unit is used for analyzing the video text information to obtain a keyword corresponding to the video text information when the multimedia information does not contain the video material information;
and the matching unit is used for acquiring the video material information matched with the video text information from a preset video material library according to the keywords.
In one embodiment, preferably, the generating module includes:
the device comprises a receiving unit and a control unit, wherein the receiving unit is used for receiving an effect selection command input by a user, and the effect selection command comprises any one or more of the following items: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;
the second determining unit is used for determining a target effect corresponding to the target video according to the effect selection command;
and the generating unit is used for generating a target video according to the target effect, the video information and the voice information.
According to a third aspect of the embodiments of the present invention, there is provided an intelligent video authoring apparatus, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving a website link input by a user, and acquiring multimedia information corresponding to the website link, wherein the multimedia information comprises video material information and video text information;
calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information;
processing the video material information according to the total duration to obtain processed video information;
and generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any one of the first aspects.
In the embodiment of the invention, the video material and the video text can be obtained from the website link input by the user, the text is converted into the audio, and then the target video is automatically generated according to the video material and the audio, so that the user does not need to manually intercept the video and record the audio, thereby reducing the user operation and improving the video production experience of the user.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of an intelligent video authoring method according to an embodiment of the present invention.
Fig. 2 is a flowchart of step S103 of an intelligent video creation method according to an embodiment of the present invention.
Fig. 3 is a flowchart of another intelligent video authoring method of an embodiment of the present invention.
FIG. 4 is a schematic diagram of a display interface of one embodiment of the present invention.
Fig. 5 is a flowchart of another intelligent video authoring method of one embodiment of the present invention.
Fig. 6 is a flowchart of another intelligent video authoring method of one embodiment of the present invention.
Fig. 7 is a flowchart of another intelligent video authoring method of one embodiment of the present invention.
FIG. 8 is a schematic view of an effects selection interface according to one embodiment of the invention.
Fig. 9 is a block diagram of an intelligent video authoring apparatus of one embodiment of the present invention.
Fig. 10 is a block diagram of processing modules in an intelligent video authoring apparatus in accordance with an embodiment of the present invention.
Fig. 11 is a block diagram of another intelligent video authoring apparatus of one embodiment of the present invention.
Fig. 12 is a block diagram of another intelligent video authoring apparatus of one embodiment of the present invention.
Fig. 13 is a block diagram of an acquisition module in an intelligent video authoring apparatus according to an embodiment of the present invention.
Fig. 14 is a block diagram of a generation module in an intelligent video authoring apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
Some of the flows described in the specification, the claims, and the above drawings include a number of operations that occur in a particular order. It should be clearly understood, however, that these operations may be performed out of the order in which they appear herein or in parallel; operation numbers such as 101 and 102 merely distinguish the various operations and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that the terms "first", "second", and the like herein are used to distinguish different messages, devices, modules, and so on; they do not represent a sequence, nor do they require that the "first" and "second" items be of different types.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an intelligent video creation method according to an embodiment of the present invention, and as shown in fig. 1, the intelligent video creation method includes:
step S101, receiving a website link input by a user, and acquiring multimedia information corresponding to the website link, wherein the multimedia information comprises video material information and video text information. The video material information may be a video or a picture.
Step S102, calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information;
step S103, processing the video material information according to the total duration to obtain processed video information;
and step S104, generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.
In the embodiment, the video material and the video text can be obtained from the website link input by the user, the text is converted into the audio, and the target video is automatically generated according to the video material and the audio, so that the user does not need to manually intercept the video and record the audio, the user operation is reduced, and the video production experience of the user is improved.
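As an illustration of step S102, the following minimal sketch (in Python) estimates the total voice-broadcast duration from a preset speech rate and leaves the actual text-to-speech conversion as a placeholder; the rate of 4 characters per second and the function names are assumptions made for illustration, not part of the disclosed method.

    def estimate_broadcast_duration(video_text: str, chars_per_second: float = 4.0) -> float:
        """Return the total duration (seconds) needed to voice-broadcast the text (step S102)."""
        # Count only the characters that will actually be spoken.
        spoken = "".join(ch for ch in video_text if not ch.isspace())
        return len(spoken) / chars_per_second

    def text_to_voice(video_text: str) -> bytes:
        """Placeholder for a text-to-speech engine that returns audio data."""
        raise NotImplementedError("plug in any TTS engine here")

    if __name__ == "__main__":
        text = "An example news paragraph extracted from the website link."
        total_duration = estimate_broadcast_duration(text)
        print(f"estimated voice duration: {total_duration:.1f} s")

The total duration obtained in this way is then used in step S103 to allocate playing time to the video materials.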
Fig. 2 is a flowchart of a method of step S103 in fast video generation according to an embodiment of the present invention.
As shown in fig. 2, in one embodiment, preferably, the step S103 includes:
step S201, when the video material information comprises a plurality of video materials, calculating the average time length of each video material according to the total number of the plurality of video materials;
step S202, setting the playing time length of each video material according to the average time length to obtain a plurality of processed video materials; if the video exceeds the voice broadcast time length, intercepting the video according to the voice broadcast time length.
Step S203, synthesizing the processed video materials into the video information;
and step S204, when the video material information comprises a video material, setting the playing time length of the video material according to the total time length.
In this embodiment, the total voice-broadcast duration may be divided evenly over the videos and pictures according to their total number, the videos and pictures are synthesized into one video according to the allocated durations, and the voice converted from the text is used as the audio of the synthesized video. If a video material is a picture, the picture can be duplicated into the corresponding number of video frames according to its allotted playing duration.
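A minimal sketch of this allocation is given below; the material description (a dict with a "type" field and, for videos, a source duration), the frame rate of 25 fps, and the handling of videos shorter than their allotted share are illustrative assumptions.

    def allocate_durations(materials: list[dict], total_duration: float, fps: int = 25) -> list[dict]:
        """Spread the total voice-broadcast duration evenly over all materials (steps S201-S202)."""
        average = total_duration / len(materials)
        processed = []
        for material in materials:
            item = dict(material)
            item["play_duration"] = average
            if material["type"] == "picture":
                # A picture is held on screen by repeating it as video frames.
                item["frame_count"] = round(average * fps)
            elif material.get("source_duration", average) > average:
                # A video longer than its share is trimmed to the share; shorter
                # videos are simply given the share here, which is an assumption.
                item["trim_to"] = average
            processed.append(item)
        return processed

    clips = allocate_durations(
        [{"type": "video", "source_duration": 30.0}, {"type": "picture"}],
        total_duration=20.0,
    )
    print(clips)  # each material now carries its allotted playing duration

The processed materials would then be concatenated into a single piece of video information (step S203), with the converted voice used as its audio track.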
Fig. 3 is a flowchart of another intelligent video authoring method of an embodiment of the present invention.
As shown in fig. 3, in one embodiment, preferably, the method further includes:
step S301, displaying the multimedia information in a classified manner on a display interface, wherein the video material information is displayed in a first display area of the display interface, and the video text information is displayed in a second display area of the display interface.
For example, as shown in fig. 4, the video material information may be displayed in the left display area of the display interface and the video text information in the right display area. An editing button may be displayed below both the video material information and the video text information; the user can edit the video or the text by clicking the editing button, and can also click the add-material button to add video material.
Fig. 5 is a flowchart of another intelligent video authoring method of one embodiment of the present invention.
As shown in fig. 5, in an embodiment, preferably before step S102, the method further includes:
step S501, receiving an editing command for the video text information input by a user;
step S502, editing the video text information according to the editing command, wherein the editing operation comprises the operations of modifying, deleting and adding the content of the video text information.
In this embodiment, the user can edit the text content in the video text information, such as modifying, adding, deleting some words, and so on, so that the text content better meets the requirements of the user.
Fig. 6 is a flowchart of another intelligent video authoring method of one embodiment of the present invention.
As shown in fig. 6, in one embodiment, preferably, the step S101 includes:
step 601, acquiring video text information corresponding to the website link. If the text content is too long, abstract content extraction can be performed to shorten the text.
Step S602, determining whether the multimedia information contains video material information;
step S603, when the multimedia information contains the video material information, directly acquiring the video material information;
step S604, when the multimedia information does not contain the video material information, analyzing the video text information to obtain a keyword corresponding to the video text information;
and step S605, acquiring video material information matched with the video text information from a preset video material library according to the key words.
In this embodiment, the multimedia information obtained from the website link may only include video text information but not video material information, and at this time, the video text information may be analyzed to obtain a keyword corresponding to the video text information, and according to the keyword, video material information matched with the video text information is obtained from a preset video material library. Therefore, the user does not need to manually search for the video materials, the video materials can be directly matched according to the keywords of the text, and the operation of the user is further reduced.
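The keyword matching of steps S604-S605 could, for example, be sketched as follows; the simple word-frequency keyword extraction and the tag-based material library used here are illustrative assumptions rather than the concrete algorithm of this embodiment.

    from collections import Counter

    def extract_keywords(video_text: str, top_n: int = 3) -> list[str]:
        """Pick the most frequent longer words as keywords (step S604, simplified)."""
        words = [w.strip(".,!?\"'()").lower() for w in video_text.split()]
        words = [w for w in words if len(w) > 3]  # crude stop-word filter
        return [w for w, _ in Counter(words).most_common(top_n)]

    def match_materials(keywords: list[str], library: dict[str, list[str]]) -> list[str]:
        """Return material identifiers whose tags overlap the extracted keywords (step S605)."""
        return [material_id for material_id, tags in library.items()
                if any(k in tags for k in keywords)]

    library = {"clip_001": ["flood", "rescue"], "clip_002": ["sports", "final"]}
    text = "Rescue teams responded quickly as the flood swept through the town."
    print(match_materials(extract_keywords(text), library))  # ['clip_001']

In a real material library the tags would typically be produced in advance when the materials are ingested.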
Fig. 7 is a flowchart of another intelligent video authoring method of one embodiment of the present invention.
As shown in fig. 7, in one embodiment, preferably, the step S104 includes:
step 701, receiving an effect selection command input by a user, wherein the effect selection command includes any one or more of the following: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;
step 702, determining a target effect corresponding to the target video according to the effect selection command;
and 703, generating a target video according to the target effect, the video information and the voice information.
In this embodiment, the user is also provided with further editing options for the synthesized video, such as choosing a playing tone or adding background music, and the user may also enter a title. Specifically, as shown in fig. 8, the user may directly input a title in the title box, and may also choose the playing tone, the video background, video picture decoration, a custom corner mark, a custom leader, a custom trailer, and the like.
Of course, if the user does not want the automatically generated audio content, the user can record the audio himself; accordingly, a custom-audio option may be provided for the user to select.
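The effect-selection flow of steps S701-S703 can be illustrated with the following sketch, in which the effect fields, the custom-audio option and the generate_target_video() stub are assumptions made for illustration; an actual implementation would perform the rendering in the synthesis step.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class TargetEffect:
        playing_tone: str = "default"
        video_background: Optional[str] = None
        picture_decoration: Optional[str] = None
        corner_mark: Optional[str] = None
        title: Optional[str] = None
        leader: Optional[str] = None
        trailer: Optional[str] = None
        custom_audio: Optional[str] = None  # path to user-recorded audio, if chosen

    def generate_target_video(video_info: str, voice_info: str, effect: TargetEffect) -> dict:
        """Stand-in for step S703; a real implementation would render the final video here."""
        audio = effect.custom_audio or voice_info  # user-recorded audio overrides TTS audio
        return {"video": video_info, "audio": audio, "effect": effect}

    effect = TargetEffect(playing_tone="female_news", title="Flood rescue report")
    print(generate_target_video("merged_clips.mp4", "broadcast.wav", effect))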
The foregoing describes the implementation process of intelligent video creation; it can be implemented by a device whose internal structure and functions are described below.
Fig. 9 is a block diagram of an intelligent video authoring apparatus of one embodiment of the present invention.
As shown in fig. 9, there is provided an intelligent video authoring apparatus, the apparatus comprising:
the acquiring module 91 is configured to receive a website link input by a user, and acquire multimedia information corresponding to the website link, where the multimedia information includes video material information and video text information;
the calculation module 92 is configured to calculate a total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and convert the video text information into voice information;
the processing module 93 is configured to process the video material information according to the total duration to obtain processed video information;
a generating module 94, configured to generate a target video according to the video information and the voice information, where a duration of the target video is the total duration, a video content of the target video is the video information, and an audio content of the target video is the voice information.
Fig. 10 is a block diagram of processing modules in an intelligent video authoring apparatus in accordance with an embodiment of the present invention.
As shown in fig. 10, in one embodiment, the processing module 93 preferably includes:
a calculating unit 1001 configured to calculate an average duration of each video material according to a total number of the plurality of video materials when the video material information includes the plurality of video materials;
a first setting unit 1002, configured to set a playing time length of each video material according to the average time length, so as to obtain a plurality of processed video materials;
a synthesizing unit 1003, configured to synthesize the processed multiple video materials into the video information;
a second setting unit 1004, configured to set, when the video material information includes a video material, a playing time length of the video material according to the total time length.
Fig. 11 is a block diagram of another intelligent video authoring apparatus of one embodiment of the present invention.
As shown in fig. 11, in one embodiment, preferably, the apparatus further comprises:
the display module 1101 is configured to display the multimedia information in a classified manner on a display interface, where the video material information is displayed in a first display area of the display interface, and the video text information is displayed in a second display area of the display interface.
Fig. 12 is a block diagram of another intelligent video authoring apparatus of one embodiment of the present invention.
As shown in fig. 12, in one embodiment, preferably, the apparatus further comprises:
a receiving module 1201, configured to receive an editing command for the video text information, input by a user;
and the editing module 1202 is configured to perform an editing operation on the video text information according to the editing command, where the editing operation includes operations of modifying, deleting, and adding the content of the video text information.
Fig. 13 is a block diagram of an acquisition module in an intelligent video authoring apparatus according to an embodiment of the present invention.
As shown in fig. 13, in one embodiment, preferably, the obtaining module 91 includes:
a first obtaining unit 1301, configured to obtain video text information corresponding to the website link;
a first determining unit 1302, configured to determine whether the multimedia information includes video material information;
a second obtaining unit 1303, configured to directly obtain the video material information when the multimedia information includes the video material information;
an analyzing unit 1304, configured to analyze the video text information to obtain a keyword corresponding to the video text information when the multimedia information does not include the video material information;
the matching unit 1305 is configured to obtain, according to the keyword, video material information matched with the video text information from a preset video material library.
Fig. 14 is a block diagram of a generation module in an intelligent video authoring apparatus according to an embodiment of the present invention.
As shown in fig. 14, in one embodiment, preferably, the generating module 94 includes:
a receiving unit 1401, configured to receive an effect selection command input by a user, where the effect selection command includes any one or more of: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;
a second determining unit 1402, configured to determine a target effect corresponding to the target video according to the effect selection command;
a generating unit 1403, configured to generate a target video according to the target effect, the video information, and the voice information.
According to a third aspect of the embodiments of the present invention, there is provided an intelligent video authoring apparatus, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving a website link input by a user, and acquiring multimedia information corresponding to the website link, wherein the multimedia information comprises video material information and video text information;
calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information;
processing the video material information according to the total duration to obtain processed video information;
and generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any one of the first aspects.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (14)

1. An intelligent video authoring method, the method comprising:
receiving a website link input by a user, and acquiring multimedia information corresponding to the website link, wherein the multimedia information comprises video material information and video text information;
calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information;
processing the video material information according to the total duration to obtain processed video information;
and generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.
2. The method of claim 1, wherein processing the video material information according to the total duration to obtain processed video information comprises:
when the video material information comprises a plurality of video materials, calculating the average time length of each video material according to the total number of the video materials;
setting the playing time length of each video material according to the average time length to obtain a plurality of processed video materials;
synthesizing the processed video materials into the video information;
and when the video material information comprises a video material, setting the playing time length of the video material according to the total time length.
3. The method of claim 1, further comprising:
and displaying the multimedia information in a classified manner on a display interface, wherein the video material information is displayed in a first display area of the display interface, and the video text information is displayed in a second display area of the display interface.
4. The method of claim 3, wherein before calculating a total duration required to voice-report the video text information according to a preset voice playing speed, the method further comprises:
receiving an editing command for the video text information input by a user;
and editing the video text information according to the editing command, wherein the editing operation comprises the operations of modifying, deleting and adding the content of the video text information.
5. The method of claim 1, wherein obtaining the multimedia information corresponding to the website link comprises:
acquiring video text information corresponding to the website link;
determining whether the multimedia information contains video material information;
when the multimedia information contains the video material information, directly acquiring the video material information;
when the multimedia information does not contain the video material information, analyzing the video text information to obtain a keyword corresponding to the video text information;
and acquiring video material information matched with the video text information from a preset video material library according to the key words.
6. The method of claim 1, wherein generating a target video from the video information and the voice information comprises:
receiving an effect selection command input by a user, wherein the effect selection command comprises any one or more of the following items: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;
determining a target effect corresponding to the target video according to the effect selection command;
and generating a target video according to the target effect, the video information and the voice information.
7. An intelligent video creation apparatus, the apparatus comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for receiving a website link input by a user and acquiring multimedia information corresponding to the website link, and the multimedia information comprises video material information and video text information;
the calculation module is used for calculating the total duration required by voice broadcasting of the video text information according to a preset voice playing speed and converting the video text information into voice information;
the processing module is used for processing the video material information according to the total duration to obtain processed video information;
and the generating module is used for generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.
8. The apparatus of claim 7, wherein the processing module comprises:
the calculating unit is used for calculating the average duration of each video material according to the total number of the video materials when the video material information comprises a plurality of video materials;
the first setting unit is used for setting the playing time length of each video material according to the average time length so as to obtain a plurality of processed video materials;
the synthesizing unit is used for synthesizing the processed video materials into the video information;
and the second setting unit is used for setting the playing time length of the video material according to the total time length when the video material information comprises the video material.
9. The apparatus of claim 7, further comprising:
and the display module is used for displaying the multimedia information in a classified manner on a display interface, wherein the video material information is displayed in a first display area of the display interface, and the video text information is displayed in a second display area of the display interface.
10. The apparatus of claim 9, further comprising:
the receiving module is used for receiving an editing command of the video text information, which is input by a user;
and the editing module is used for editing the video text information according to the editing command, wherein the editing operation comprises the operations of modifying, deleting and adding the content of the video text information.
11. The apparatus of claim 7, wherein the obtaining module comprises:
the first acquisition unit is used for acquiring video text information corresponding to the website link;
the first determining unit is used for determining whether the multimedia information contains video material information;
the second acquisition unit is used for directly acquiring the video material information when the multimedia information contains the video material information;
the analysis unit is used for analyzing the video text information to obtain a keyword corresponding to the video text information when the multimedia information does not contain the video material information;
and the matching unit is used for acquiring the video material information matched with the video text information from a preset video material library according to the keywords.
12. The apparatus of claim 7, wherein the generating module comprises:
the device comprises a receiving unit and a control unit, wherein the receiving unit is used for receiving an effect selection command input by a user, and the effect selection command comprises any one or more of the following items: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;
the second determining unit is used for determining a target effect corresponding to the target video according to the effect selection command;
and the generating unit is used for generating a target video according to the target effect, the video information and the voice information.
13. An intelligent video creation device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving a website link input by a user, and acquiring multimedia information corresponding to the website link, wherein the multimedia information comprises video material information and video text information;
calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information;
processing the video material information according to the total duration to obtain processed video information;
and generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.
14. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the steps of the method of any one of claims 1 to 6.
CN202010905402.1A 2020-09-01 2020-09-01 Intelligent video creation method and device Pending CN112004137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010905402.1A CN112004137A (en) 2020-09-01 2020-09-01 Intelligent video creation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010905402.1A CN112004137A (en) 2020-09-01 2020-09-01 Intelligent video creation method and device

Publications (1)

Publication Number Publication Date
CN112004137A (en) 2020-11-27

Family

ID=73465127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010905402.1A Pending CN112004137A (en) 2020-09-01 2020-09-01 Intelligent video creation method and device

Country Status (1)

Country Link
CN (1) CN112004137A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674731A (en) * 2021-05-14 2021-11-19 北京搜狗科技发展有限公司 Speech synthesis processing method, apparatus and medium
CN113438543A (en) * 2021-06-22 2021-09-24 深圳市大头兄弟科技有限公司 Matching method, device and equipment for converting document into video and storage medium
CN113438543B (en) * 2021-06-22 2023-02-03 深圳市大头兄弟科技有限公司 Matching method, device and equipment for converting document into video and storage medium
CN113630644A (en) * 2021-06-29 2021-11-09 北京搜狗科技发展有限公司 Editing method and device of video content editor and storage medium
CN113630644B (en) * 2021-06-29 2024-01-30 北京搜狗科技发展有限公司 Editing method, device and storage medium of video content editor
CN113810538A (en) * 2021-09-24 2021-12-17 维沃移动通信有限公司 Video editing method and video editing device
CN114513706A (en) * 2022-03-22 2022-05-17 中国平安人寿保险股份有限公司 Video generation method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112004137A (en) Intelligent video creation method and device
US20220229536A1 (en) Information processing apparatus display control method and program
CN101300567B (en) Method for media sharing and authoring on the web
US9002175B1 (en) Automated video trailer creation
CN112004138A (en) Intelligent video material searching and matching method and device
CN104581380A (en) Information processing method and mobile terminal
CN113261058B (en) Automatic video editing using beat match detection
CN113365134A (en) Audio sharing method, device, equipment and medium
CN112104908A (en) Audio and video file playing method and device, computer equipment and readable storage medium
US20140161423A1 (en) Message composition of media portions in association with image content
CN102265610A (en) Edited information provision device, edited information provision method, program, and storage medium
CN102265272B (en) There is the recommender system of deflection
CN114286169B (en) Video generation method, device, terminal, server and storage medium
WO2019245033A1 (en) Moving image editing server and program
US20200005387A1 (en) Method and system for automatically generating product visualization from e-commerce content managing systems
CN100484227C (en) Video reproduction apparatus and intelligent skip method therefor
US11942117B2 (en) Media management system
JP6730760B2 (en) Server and program, video distribution system
EP3671487A2 (en) Generation of a video file
CN116527994A (en) Video generation method and device and electronic equipment
JP2001008136A (en) Authoring device for multimedia data
KR101477492B1 (en) Apparatus for editing and playing video contents and the method thereof
WO2016203469A1 (en) A digital media reviewing system and methods thereof
JP2020108162A (en) Server and program
Soe et al. AI video editing tools

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20221028

Address after: Room 1602, 16th Floor, Building 18, Yard 6, Wenhuayuan West Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing 100176

Applicant after: Beijing Lajin Zhongbo Technology Co.,Ltd.

Address before: Room 650, Building 3, No. 16, Zhuantang Science and Technology Economic Block, Xihu District, Hangzhou, Zhejiang 310024

Applicant before: Tianmai Juyuan (Hangzhou) Media Technology Co.,Ltd.

TA01 Transfer of patent application right
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination