CN112004138A - Intelligent video material searching and matching method and device

Info

Publication number: CN112004138A
Application number: CN202010906507.9A
Authority: CN
Inventors: 郝晓伟; 詹丽; 林子杰; 郑章旭
Original assignee: Tianmai Juyuan Hangzhou Media Technology Co ltd
Current assignee: Beijing Lajin Zhongbo Technology Co ltd
Priority date: 2020-09-01
Filing date: 2020-09-01
Publication date: 2020-11-27

Abstract

The invention discloses a method and a device for intelligent video material searching and matching. The method comprises the following steps: acquiring multimedia information, and determining whether the multimedia information contains video text information and video material information; when the multimedia information does not contain video material information, displaying a video material edit box in a first display area of a display interface, and displaying the video text information in a second display area of the display interface; receiving a click operation on the video material edit box, and outputting a first search window according to the click operation, wherein the first search window comprises at least one alternative keyword, and each alternative keyword is a keyword of the video text information; receiving a selection command for at least one alternative keyword, and determining the selected alternative keyword as a target keyword; searching for video material information corresponding to each target keyword, and displaying all the video material information in the first display area; and generating a target video according to all the video material information and the video text information.

Description

Intelligent video material searching and matching method and device

Technical Field

The invention relates to the technical field of video processing, in particular to a method and a device for searching and matching intelligent video materials.

Background

When consecutive images change at more than 24 frames per second, the human eye, owing to persistence of vision, can no longer distinguish the individual static images. The sequence appears smooth and continuous, and such a sequence of images is called a video. Video clipping technology is now increasingly widely used, and editors usually use professional clipping software to clip videos and obtain the video content that users want.

However, existing video clipping technology relies mainly on manually cutting the required video segments out of an original (i.e. existing) video and then splicing them together. This takes a large amount of time and makes the clipping operation very tedious, so video clipping is slow and its efficiency is low.

Disclosure of Invention

In view of the above problems, the present invention provides a method and an apparatus for searching and matching an intelligent video material, which can quickly search a desired video material for a user according to text information, and then quickly generate a video desired by the user.

According to a first aspect of embodiments of the present invention, there is provided a method for intelligent video material search matching, the method comprising:

acquiring multimedia information, and determining whether the multimedia information contains video text information and video material information;

when the multimedia information does not contain video material information, displaying a video material edit box in a first display area of a display interface, and displaying the video text information in a second display area of the display interface;

receiving a click operation on the video material edit box, and outputting a first search window according to the click operation, wherein the first search window comprises at least one alternative keyword, and each alternative keyword is a keyword of the video text information;

receiving a selection command of at least one alternative keyword, and determining the selected alternative keyword as a target keyword;

searching video material information corresponding to each target keyword, and displaying all the video material information in the first display area;

and generating a target video according to all the video material information and the video text information.

In one embodiment, preferably, generating the target video according to all the video material information and the video text information includes:

calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information;

determining the average time length corresponding to each video material according to the total time length and the total number of the video material information;

setting the playing time length of each video material information according to the average time length corresponding to each video material to obtain the processed video information;

and generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.

In one embodiment, preferably, the setting the playing time length of each video material information according to the average time length corresponding to each video material information includes:

and when the duration of the video material information is longer than the average duration, capturing the video of the complete shot corresponding to the target keyword of the video material information according to the average duration.

In one embodiment, preferably, before generating the target video according to all the video material information and the video text information, the method further comprises:

receiving a segmentation operation on the video text information, dividing the video text information into paragraphs according to the segmentation operation, deleting and caching the video material information in the first display area other than the first video material information, and keeping only the corresponding video material edit box;

receiving a click operation on a target video material edit box, and outputting a second search window according to the click operation, wherein the deleted and cached other video material information is displayed in the second search window;

and receiving a selection operation on target video material information among the other video material information, and displaying the selected target video material information at the display position corresponding to the target video material edit box in the first display area.

In one embodiment, preferably, generating the target video according to the video information and the voice information further includes:

receiving an effect selection command input by a user, wherein the effect selection command comprises any one or more of the following items: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;

determining a target effect corresponding to the target video according to the effect selection command;

and generating a target video according to the target effect, the video information and the voice information.

According to a second aspect of embodiments of the present invention, there is provided an apparatus for intelligent video material search matching, the apparatus comprising:

the first determining module is used for acquiring multimedia information and determining whether the multimedia information contains video text information and video material information;

the first display module is used for displaying a video material edit box in a first display area of a display interface and displaying the video text information in a second display area of the display interface when the multimedia information does not contain the video material information;

the first output module is used for receiving a click operation on the video material edit box and outputting a first search window according to the click operation, wherein the first search window comprises at least one alternative keyword, and each alternative keyword is a keyword of the video text information;

the second determining module is used for receiving a selection command of at least one alternative keyword and determining the selected alternative keyword as a target keyword;

the searching module is used for searching the video material information corresponding to each target keyword and displaying all the video material information in the first display area;

and the generating module is used for generating a target video according to all the video material information and the video text information.

In one embodiment, preferably, the generating module includes:

the computing unit is used for computing the total duration required by voice broadcasting of the video text information according to a preset voice playing speed and converting the video text information into voice information;

the first determining unit is used for determining the average time length corresponding to each video material according to the total time length and the total number of the video material information;

the setting unit is used for setting the playing time length of each piece of video material information according to the average time length corresponding to each piece of video material so as to obtain the processed video information;

and the first generating unit is used for generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.

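In one embodiment, preferably, the setting unit is configured to:

when the duration of the video material information is longer than the average duration, capture, according to the average duration, the video of the complete shot corresponding to the target keyword of the video material information.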

In one embodiment, preferably, the apparatus further comprises:

the segmentation module is used for receiving segmentation operation on the video text information, performing paragraph division on the video text information according to the segmentation operation, deleting and caching other video material information except the first video material information in the first display area, and only keeping a corresponding video material edit box;

the second output module is used for receiving the click operation of the target video material edit box and outputting a second search window according to the click operation, and the deleted and cached information of other video materials is displayed in the second search window;

and the second display module is used for receiving a selection operation on target video material information among the other video material information and displaying the selected target video material information at the display position corresponding to the target video material edit box in the first display area.

In one embodiment, preferably, the generating module further includes:

the receiving unit is used for receiving an effect selection command input by a user, wherein the effect selection command comprises any one or more of the following items: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;

the second determining unit is used for determining a target effect corresponding to the target video according to the effect selection command;

and the second generating unit is used for generating a target video according to the target effect, the video information and the voice information.

In the embodiments of the invention, when only text information is available and no video material is available, keywords of the text information can be determined and displayed directly in the search window when the user searches for video material. The user can click a target keyword to search for the corresponding video material without typing a query, so the video material the user needs is found quickly, and the target video the user wants is quickly generated from the video material and the text information.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for intelligent video material search matching according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a display interface according to an embodiment of the present invention.

Fig. 3 is a flowchart of step S106 of a method for intelligent video material search matching according to an embodiment of the present invention.

Fig. 4 is a flowchart of another method for intelligent video material search matching according to an embodiment of the present invention.

Fig. 5A is a flowchart of yet another method for intelligent video material search matching according to an embodiment of the present invention.

Fig. 5B is a schematic diagram of an effect selection interface according to an embodiment of the present invention.

Fig. 6 is a block diagram of an apparatus for intelligent video material search matching according to an embodiment of the present invention.

Fig. 7 is a block diagram of a generating module in an apparatus for intelligent video material search matching according to an embodiment of the present invention.

Fig. 8 is a block diagram of another apparatus for intelligent video material search matching according to an embodiment of the present invention.

Fig. 9 is a block diagram of another generating module in an apparatus for intelligent video material search matching according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.

Some of the flows described in the specification, the claims, and the above drawings include a number of operations that appear in a particular order. It should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel. Operation numbers such as S101 and S102 merely distinguish the operations from one another and do not by themselves represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the terms "first", "second", etc. in this document are used to distinguish different messages, devices, modules, etc.; they do not represent a sequential order, nor do they require the "first" and "second" items to be of different types.

Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a method for searching and matching intelligent video materials according to an embodiment of the present invention, and as shown in fig. 1, the method for searching and matching intelligent video materials includes:

step S101, acquiring multimedia information, and determining whether the multimedia information contains video text information and video material information;

Step S102, when the multimedia information does not contain video material information, displaying a video material edit box in a first display area of a display interface, and displaying the video text information in a second display area of the display interface. Specifically, as shown in fig. 2, the video material edit box may be displayed in the left display area of the display interface and the video text information in the right display area. An edit button may also be displayed below the video text information, and the user may edit the video text by clicking it.

Step S103, receiving a click operation on the video material edit box, and outputting a first search window according to the click operation, wherein the first search window comprises at least one alternative keyword, and each alternative keyword is a keyword of the video text information.

Step S104, receiving a selection command of at least one alternative keyword, and determining the selected alternative keyword as a target keyword;

step S105, searching video material information corresponding to each target keyword, and displaying all the video material information in the first display area;

Step S106, generating a target video according to all the video material information and the video text information. If the text content of the video text information is too long, a summary can be extracted from it to shorten the text.

In this embodiment, when only text information is available and no video material is available, keywords of the text information can be determined and displayed directly in the search window when the user searches for video material. The user can click a target keyword to search for the corresponding video material without typing a query, so the video material the user needs is found quickly, and the target video the user wants is quickly generated from the video material and the text information.
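Steps S103 to S105 can be illustrated with a minimal Python sketch. The source does not say how keywords are extracted from the video text or how the material library is indexed, so the frequency-based extraction, the `STOP_WORDS` list, the tag-indexed `library` dictionary, and all function and file names below are assumptions made only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list. The source does not specify a keyword
# extraction method; word frequency is used here as a simple stand-in.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for", "with", "on"}


def candidate_keywords(video_text: str, limit: int = 5) -> list[str]:
    """Return up to `limit` alternative keywords, most frequent first."""
    words = re.findall(r"[a-z]+", video_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


def search_materials(target_keywords: list[str], library: dict[str, list[str]]) -> list[str]:
    """Collect the materials tagged with each target keyword, without duplicates."""
    found: list[str] = []
    for keyword in target_keywords:
        for material in library.get(keyword, []):
            if material not in found:
                found.append(material)
    return found


text = "The panda eats bamboo. The panda sleeps in the bamboo forest."

# Step S103: keywords offered in the first search window.
candidates = candidate_keywords(text)

# Step S104: simulate the user clicking two of the offered keywords.
targets = [k for k in candidates if k in {"panda", "bamboo"}]

# Step S105: look up materials for every target keyword (made-up library).
library = {"panda": ["panda_01.mp4", "panda.jpg"], "bamboo": ["bamboo_forest.mp4"]}
materials = search_materials(targets, library)
```

The point of the sketch is that the user only chooses among offered keywords, so no query has to be typed before the search runs.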

As shown in fig. 3, in one embodiment, preferably, the step S106 includes:

step S301, calculating the total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and converting the video text information into voice information;

step S302, determining the average time length corresponding to each video material according to the total time length and the total number of the video material information; the video material information may be a video or a picture.

Step S303, setting the playing time length of each video material information according to the average time length corresponding to each video material to obtain the processed video information;

In this embodiment, the total duration of the voice broadcast may be divided equally among the videos and pictures according to their total number, the videos and pictures are synthesized into one video according to the resulting average duration, and the voice obtained by converting the text is used as the audio of the synthesized video. If a video material is a picture, the picture can be copied into the number of video frames that corresponds to its playing duration.

Step S304, generating a target video according to the video information and the voice information, wherein the duration of the target video is the total duration, the video content of the target video is the video information, and the audio content of the target video is the voice information.

In the embodiment, the video material and the video text are obtained, the text is converted into the audio, the video material is used as the video, and then the target video is automatically generated according to the video material and the audio, so that the user does not need to manually intercept the video and record the audio, the user operation is reduced, and the video production experience of the user is improved.
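The duration arithmetic of steps S301 to S303 can be sketched as follows. This is a minimal illustration under stated assumptions: the preset voice playing speed is taken to be in characters per second, whitespace is treated as unspoken, and a frame rate of 25 frames per second is used for pictures. None of these values, nor the names `Material`, `total_duration`, `average_duration`, and `picture_frame_count`, come from the source.

```python
from dataclasses import dataclass


@dataclass
class Material:
    name: str
    kind: str  # "video" or "picture"


def total_duration(video_text: str, chars_per_second: float) -> float:
    """Step S301: time needed to read the text aloud at the preset speed."""
    # Assumption: the speed is given in characters per second and
    # whitespace is not spoken.
    spoken = "".join(video_text.split())
    return len(spoken) / chars_per_second


def average_duration(total: float, materials: list[Material]) -> float:
    """Step S302: share the total duration equally among the materials."""
    if not materials:
        raise ValueError("at least one video material is required")
    return total / len(materials)


def picture_frame_count(duration: float, fps: int = 25) -> int:
    """Number of copies of a picture needed to fill `duration` seconds."""
    return round(duration * fps)


materials = [
    Material("a.mp4", "video"),
    Material("b.jpg", "picture"),
    Material("c.mp4", "video"),
]
total = total_duration("x" * 120, chars_per_second=4.0)  # 120 characters at 4 per second
average = average_duration(total, materials)             # equal share per material
frames = picture_frame_count(average)                    # frames for the picture b.jpg
```

Because every material receives the same share, the shares sum to the total duration, which matches the requirement in step S304 that the target video lasts exactly as long as the voice broadcast.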

In one embodiment, preferably, step S303 includes:

when the duration of the video material information is longer than the average duration, capturing, according to the average duration, the video of the complete shot corresponding to the target keyword of the video material information. For example, if the keyword is a person's name, the captured video is a complete shot showing that person. As another example, the keyword "hawthorn" may be the name of a person or of a food. In that case the search is performed by category and the results are displayed by category: one page displays videos of the person named Hawthorn, and another page displays videos of the food hawthorn.
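One possible reading of the shot capture and of the categorized result pages is sketched below. The source does not describe how shots are detected or labelled, nor what happens when no complete shot fits within the average duration. The sketch assumes shot boundaries and labels are already known, prefers the longest matching shot that fits, and otherwise falls back to the shortest matching shot. The `Shot` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shot:
    start: float               # seconds from the start of the material
    end: float
    labels: frozenset[str]     # hypothetical tags of what the shot shows

    @property
    def length(self) -> float:
        return self.end - self.start


def pick_shot(shots: list[Shot], keyword: str, average: float) -> Optional[Shot]:
    """Choose a complete shot that shows `keyword` and suits the average duration."""
    matching = [s for s in shots if keyword in s.labels]
    if not matching:
        return None
    fitting = [s for s in matching if s.length <= average]
    if fitting:
        # Longest complete shot that still fits within the average duration.
        return max(fitting, key=lambda s: s.length)
    # Assumption: when nothing fits, take the shortest matching shot.
    return min(matching, key=lambda s: s.length)


def group_by_category(results: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (material, category) search results into one page per category."""
    pages: dict[str, list[str]] = {}
    for name, category in results:
        pages.setdefault(category, []).append(name)
    return pages


shots = [
    Shot(0.0, 4.0, frozenset({"street"})),
    Shot(4.0, 12.0, frozenset({"alice"})),
    Shot(12.0, 30.0, frozenset({"alice", "street"})),
]
chosen = pick_shot(shots, "alice", average=10.0)

pages = group_by_category([
    ("hawthorn_interview.mp4", "person"),
    ("candied_hawthorn.mp4", "food"),
    ("hawthorn_portrait.jpg", "person"),
])
```

Here the 8-second shot is chosen over the 18-second one because it is the only matching shot that fits in a 10-second slot, and the ambiguous keyword yields separate "person" and "food" pages.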

As shown in fig. 4, in one embodiment, preferably, before step S106, the method further includes:

step S401, receiving a segmentation operation on the video text information, performing paragraph division on the video text information according to the segmentation operation, deleting and caching other video material information except the first video material information in the first display area, and only keeping a corresponding video material edit box;

step S402, receiving click operation on a target video material edit box, and outputting a second search window according to the click operation, wherein the deleted and cached information of other video materials is displayed in the second search window;

step S403, receiving a selection operation on the target video material information in the other video material information, and displaying the selected target video material information at a display position corresponding to the target video material edit box in the first display area.

As shown in fig. 2, to edit the text, the user clicks the edit button below the text content, and the text content area enters edit mode. If there are several videos or pictures on the left side at that point, the first picture or video is kept, and the other pictures and videos are deleted and cached. The user can then divide the text content into paragraphs. The video material corresponding to each newly divided paragraph is blank, so the user can click the edit button of the video material area to open a search window, in which the deleted and cached videos and pictures are displayed for the user to select.
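The bookkeeping behind steps S401 to S403 can be sketched with a small state object. This is an illustration only: the `EditSession` class, its method names, and the rule that a replaced material returns to the cache are assumptions, not details given in the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EditSession:
    paragraphs: list[str]
    slots: list[Optional[str]]            # one edit box per paragraph; None means blank
    cache: list[str] = field(default_factory=list)

    @classmethod
    def start(cls, text: str, materials: list[str]) -> "EditSession":
        """Enter edit mode: keep the first material and cache the rest."""
        first = materials[0] if materials else None
        return cls([text], [first], list(materials[1:]))

    def split(self, index: int, offset: int) -> None:
        """Step S401: divide paragraph `index` at character `offset`."""
        text = self.paragraphs[index]
        self.paragraphs[index:index + 1] = [text[:offset].strip(), text[offset:].strip()]
        self.slots.insert(index + 1, None)  # the new paragraph's edit box starts blank

    def second_search_window(self) -> list[str]:
        """Step S402: cached materials offered when an edit box is clicked."""
        return list(self.cache)

    def choose(self, slot_index: int, material: str) -> None:
        """Step S403: place a cached material in the chosen edit box."""
        self.cache.remove(material)  # raises ValueError if it was not cached
        previous = self.slots[slot_index]
        if previous is not None:
            # Assumption: a material that gets replaced goes back to the cache.
            self.cache.append(previous)
        self.slots[slot_index] = material


text = "Pandas eat bamboo. They sleep a lot."
session = EditSession.start(text, ["panda.mp4", "bamboo.jpg", "sleep.mp4"])
session.split(0, text.index(".") + 1)     # split after the first sentence
offered = session.second_search_window()  # shown in the second search window
session.choose(1, "sleep.mp4")            # fill the blank edit box
```

Caching rather than discarding the surplus materials means the results of the earlier keyword search can be reused for the new paragraphs without searching again.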

As shown in fig. 5A, in an embodiment, preferably, the step S106 further includes:

step S501, receiving an effect selection command input by a user, wherein the effect selection command comprises any one or more of the following items: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;

step S502, determining a target effect corresponding to the target video according to the effect selection command;

step S503, generating a target video according to the target effect, the video information and the voice information.

In this embodiment, the user is also given further edit boxes for the synthesized video, for example for adding a playing timbre or background music to the synthesized video, and the user can also enter a title. Specifically, as shown in fig. 5B, the user may enter a title directly in the title box, and may also choose a playing timbre, a video background, a video picture decoration, a custom logo, a custom leader, a custom trailer, and the like.

Of course, if the user does not want the automatically generated audio content, the user can record the audio content personally. A custom audio option can therefore be provided for the user to select.
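Steps S501 and S502, together with the custom audio option, can be sketched as a mapping from the user's selections to a single effect record. The field names, default values, and the dictionary form of the effect selection command are hypothetical; the source only lists the kinds of option available.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class TargetEffect:
    # All field names are illustrative stand-ins for the options in fig. 5B.
    title: Optional[str] = None
    voice_timbre: str = "default"
    background: Optional[str] = None
    decoration: Optional[str] = None
    corner_mark: Optional[str] = None
    leader: Optional[str] = None
    trailer: Optional[str] = None
    custom_audio: Optional[str] = None


def apply_effect_command(command: dict[str, str],
                         base: TargetEffect = TargetEffect()) -> TargetEffect:
    """Steps S501-S502: turn the user's selections into a target effect."""
    allowed = {f.name for f in fields(TargetEffect)}
    unknown = set(command) - allowed
    if unknown:
        raise ValueError(f"unsupported effect options: {sorted(unknown)}")
    # Options that were not selected keep their values from `base`.
    return replace(base, **command)


def audio_source(effect: TargetEffect, synthesized_voice: str) -> str:
    """Use the user's own recording when provided, else the converted voice."""
    return effect.custom_audio or synthesized_voice


effect = apply_effect_command({"voice_timbre": "female_1", "corner_mark": "logo.png"})
default_audio = audio_source(effect, "tts.wav")
recorded_audio = audio_source(replace(effect, custom_audio="my_take.wav"), "tts.wav")
```

Treating the command as "any one or more" options is reflected in the defaults: only the selected fields change, and everything else stays as it was.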

The above describes the implementation process of intelligent video material search matching. The process can be implemented by an apparatus, whose internal structure and functions are described below.

As shown in fig. 6, there is provided an apparatus for intelligent video material search matching, the apparatus comprising:

the first determining module 61 is configured to obtain multimedia information, and determine whether the multimedia information includes video text information and video material information;

the first display module 62 is configured to, when the multimedia information does not include video material information, display a video material edit box in a first display area of a display interface, and display the video text information in a second display area of the display interface;

a first output module 63, configured to receive a click operation on the video material edit box, and output a first search window according to the click operation, where the first search window includes at least one alternative keyword, and each alternative keyword is a keyword of the video text information;

a second determining module 64, configured to receive a selection command for at least one alternative keyword, and determine the selected alternative keyword as a target keyword;

a search module 65, configured to search for video material information corresponding to each target keyword, and display all the video material information in the first display area;

and a generating module 66, configured to generate a target video according to all the video material information and the video text information.

As shown in fig. 7, in one embodiment, preferably, the generating module 66 includes:

the calculating unit 71 is configured to calculate a total duration required for voice broadcasting of the video text information according to a preset voice playing speed, and convert the video text information into voice information;

a first determining unit 72, configured to determine an average duration corresponding to each video material according to the total duration and the total number of the video material information;

the setting unit 73 is configured to set the playing time length of each piece of video material information according to the average time length corresponding to each piece of video material, so as to obtain processed video information;

a first generating unit 74, configured to generate a target video according to the video information and the voice information, where a duration of the target video is the total duration, a video content of the target video is the video information, and an audio content of the target video is the voice information.

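In one embodiment, preferably, the setting unit 73 is configured to:

when the duration of the video material information is longer than the average duration, capture, according to the average duration, the video of the complete shot corresponding to the target keyword of the video material information.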

As shown in fig. 8, in one embodiment, preferably, the apparatus further comprises:

the segmenting module 81 is configured to receive a segmenting operation on the video text information, perform paragraph division on the video text information according to the segmenting operation, delete and cache other video material information except for the first video material information in the first display area, and only reserve a corresponding video material edit box;

a second output module 82, configured to receive a click operation on a target video material edit box, and output a second search window according to the click operation, where the deleted and cached information of the other video materials is displayed in the second search window;

and the second display module 83 is configured to receive a selection operation on target video material information in the other video material information, and display the selected target video material information at a display position corresponding to the target video material edit box in the first display area.

As shown in fig. 9, in one embodiment, preferably, the generating module 66 further includes:

a receiving unit 91, configured to receive an effect selection command input by a user, where the effect selection command includes any one or more of the following: selecting playing tone, selecting video background, decorating video pictures, self-defining corner marks, self-defining titles and self-defining trailers;

a second determining unit 92, configured to determine a target effect corresponding to the target video according to the effect selection command;

a second generating unit 93, configured to generate a target video according to the target effect, the video information, and the voice information.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for intelligent video material search matching, the method comprising:

2. The method of claim 1, wherein generating a target video from all the video material information and the video text information comprises:

3. The method of claim 2, wherein setting the playing time duration of each video material information according to the average time duration corresponding to each video material information comprises:

4. The method of claim 1, wherein prior to generating a target video from all the video material information and the video text information, the method further comprises:

5. The method of claim 1, wherein generating a target video from the video information and the voice information further comprises:

6. An apparatus for intelligent video material search matching, the apparatus comprising:

7. The apparatus of claim 6, wherein the generating module comprises:

8. The apparatus of claim 7, wherein the setting unit is configured to:

9. The apparatus of claim 6, further comprising:

10. The apparatus of claim 6, wherein the generating module further comprises:

11. An apparatus for intelligent video material search matching, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

12. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the steps of the method of any one of claims 1 to 5.