CN112291612A - Video and audio matching method and device, storage medium and electronic equipment - Google Patents


Info

Publication number: CN112291612A (application CN202011085648.5A; granted as CN112291612B)
Authority: CN (China)
Prior art keywords: matched, audio, video, matching, degree
Original language: Chinese (zh)
Inventor: 莫文
Assignees: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Legal status: Granted; active (legal status, priority date, and assignee list are assumptions by Google, not legal conclusions)

Classifications

All classifications fall under H04N21/00 (Selective content distribution, e.g. interactive television or video on demand [VOD]):

    • H04N21/432: Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/84: Generation or processing of descriptive data, e.g. content descriptors

Abstract

The present disclosure provides a video and audio matching method and apparatus, a storage medium, and an electronic device, and relates to the technical field of computers. The video and audio matching method includes the following steps: determining the degree of change between different image frames in a video to be matched to obtain the image change degree of the video to be matched; determining the labels of the video to be matched by identifying the labels of the image frames in the video to be matched; and matching the image change degree of the video to be matched with the rhythm intensity of each audio in an audio library, matching the labels of the video to be matched with the labels of each audio, and determining one or more target matching audios matching the video to be matched. The method and apparatus solve the problem of low audio matching efficiency and improve the user experience.

Description

Video and audio matching method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for matching video and audio, a computer-readable storage medium, and an electronic device.
Background
When producing video, it is often necessary to match the video with the appropriate audio. For example, some video platforms allow users to upload videos produced by themselves and provide functionality to match audio to the videos.
In the related art, after a user produces or uploads a video, the system recommends some popular audios and the user selects among the recommended popular audios; the choice is thus limited to the range of popular audio, and the diversity of background music for videos is low. When the user does not find a suitable audio among the popular audios recommended by the system, the user must further select an audio category or enter a music title to search for related audio, which is cumbersome to operate, so the efficiency of audio matching is low.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides a video and audio matching method and apparatus, a computer-readable storage medium, and an electronic device, so as to solve, at least to some extent, the problems of limited selection and low efficiency in audio matching in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a matching method of video and audio, including: determining the variation degree between different image frames in a video to be matched to obtain the image variation degree of the video to be matched; determining the label of the video to be matched by identifying the label of the image frame in the video to be matched; matching the image variation degree of the video to be matched with the rhythm intensity of each audio in an audio library, matching the label of the video to be matched with the label of each audio, and determining one or more target matching audio matched with the video to be matched.
In an exemplary embodiment of the present disclosure, the determining a degree of variation between different image frames in a video to be matched includes: determining the similarity between different image frames in the video to be matched; and determining the degree of change between the different image frames according to the similarity between the different image frames.
In an exemplary embodiment of the present disclosure, the determining a degree of variation between different image frames in a video to be matched includes: and determining the variation degree between two adjacent image frames in the video to be matched.
In an exemplary embodiment of the present disclosure, the determining a degree of change between two adjacent image frames in the video to be matched includes: traversing the video to be matched, and determining the degree of change between each image frame and the previous image frame.
In an exemplary embodiment of the present disclosure, the determining a degree of change between different image frames in a video to be matched to obtain an image degree of change of the video to be matched further includes: and averaging the change degree between each image frame and the previous image frame to obtain the image change degree of the video to be matched.
In an exemplary embodiment of the present disclosure, the matching the image variation degree of the video to be matched with the rhythm intensity of each audio in an audio library, matching the tag of the video to be matched with the tag of each audio, and determining one or more target matching audios matching the video to be matched includes: matching the image variation degree of the video to be matched with the rhythm intensity of each audio in an audio library, and determining pre-matched audio from the audio library; matching the labels of the videos to be matched with the labels of the pre-matched audios, and determining the target matched audio from the pre-matched audios.
In an exemplary embodiment of the present disclosure, the matching the image variation degree of the video to be matched with the tempo intensity of each audio in an audio library, and determining a pre-matching audio from the audio library includes: acquiring the rhythm intensity of each audio in the audio library; determining a first matching degree of each audio and the video to be matched according to the difference between the rhythm intensity of each audio and the image variation degree of the video to be matched; and determining the pre-matching audio according to the first matching degree of each audio and the video to be matched.
In an exemplary embodiment of the present disclosure, the determining the pre-matching audios according to the first matching degrees of the respective audios and the video to be matched includes: selecting one or more audios with the highest first matching degree as the pre-matching audios.
In an exemplary embodiment of the disclosure, the selecting one or more of the audios with the highest first matching degree includes: and selecting a first preset number of audios with the highest first matching degree.
In an exemplary embodiment of the present disclosure, the matching the tag of the video to be matched with the tag corresponding to each audio in each pre-matching audio, and determining the target matching audio from the pre-matching audio includes: for any pre-matched audio, comparing the label of the any pre-matched audio with the label of the video to be matched, and forming a label matching list of the any pre-matched audio by using the same label; determining a second matching degree of any pre-matched audio and the video to be matched according to the tag matching list; and determining the target matching audio according to the second matching degree of each pre-matching audio and the video to be matched.
In an exemplary embodiment of the present disclosure, the determining a second matching degree between any one of the pre-matching audios and the video to be matched according to the tag matching list includes: and determining a second matching degree of any pre-matched audio and the video to be matched according to the general value of the category to which each label belongs in the label matching list.
In an exemplary embodiment of the present disclosure, the determining a target matching audio according to a second matching degree between each pre-matching audio and the video to be matched includes: and selecting one or more pre-matched audios with the highest second matching degree as the target matched audio.
In an exemplary embodiment of the disclosure, the selecting one or more pre-matching audios with the highest second matching degree includes: and selecting a second preset number of the pre-matched audios with the highest second matching degree.
In an exemplary embodiment of the present disclosure, when the number of the pre-matching audios is less than the second predetermined number, the method further includes: determining all the pre-matching audios as target matching audios; and supplementing the target matching audios from popular audios other than the pre-matching audios, so that the number of target matching audios after supplementation equals the second predetermined number.
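The supplementing step described above can be sketched as follows; the function name and the list-based audio representation are illustrative assumptions, not part of the disclosure.

```python
def pad_with_hot_audio(pre_matched, hot_audio, target_count):
    """Keep all pre-matched audios; when there are fewer than target_count,
    top the list up with popular (hot) audios not already selected."""
    result = list(pre_matched)
    for audio in hot_audio:
        if len(result) >= target_count:
            break
        if audio not in result:
            result.append(audio)
    return result
```

For example, with one pre-matched audio and a target of three, two popular audios are appended in recommendation order.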
In an exemplary embodiment of the present disclosure, the method further comprises: and displaying the information of the target matching audio, so that the user selects a final matching audio.
According to a second aspect of the present disclosure, there is provided a video and audio matching apparatus, comprising: the image change degree determining module is used for determining the change degree between different image frames in the video to be matched to obtain the image change degree of the video to be matched; the video tag determining module is used for determining tags of the videos to be matched by identifying tags of the image frames in the videos to be matched; and the target audio determining module is used for matching the image change degree of the video to be matched with the rhythm intensity of each audio in an audio library, matching the label of the video to be matched with the label of each audio and determining one or more target matching audios matched with the video to be matched.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described video and audio matching method.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the video and audio matching method described above via execution of the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
in the video and audio matching process, the degree of change between different image frames in the video to be matched is determined to obtain the image change degree of the video; the labels of the video are determined by identifying the labels of its image frames; the image change degree of the video is matched with the rhythm intensity of each audio in an audio library, the labels of the video are matched with the labels of each audio, and one or more target matching audios matching the video are determined. On the one hand, the target matching audio is determined by combining the image change degree of the video to be matched with its labels. Obtaining the image change degree allows the change degree of the video and the rhythm intensity of the selected audio to correspond to each other; for example, a video with a larger image change degree is matched with audio of high rhythm intensity. Identifying the labels of the image frames makes the theme of the video better match the theme of the selected audio. The selected target matching audio therefore better suits the video to be matched in both theme and rhythm, better meets the user's requirements for the audio, improves the efficiency of audio matching, can improve the matching degree, and further improves the user experience. On the other hand, the selected target matching audio is not limited to popular audio, so the target matching audio can be selected from the audio library according to the characteristics of the video to be matched; the audio selection range is wider, which further improves the diversity of audio selection.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is apparent that the drawings in the following description are only some embodiments of the present disclosure, and that other drawings can be obtained from those drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flowchart of a matching method of video and audio in the present exemplary embodiment;
FIG. 2 illustrates a flow chart for determining a degree of change between different image frames in this exemplary embodiment;
FIG. 3 illustrates a flow chart for determining target matching audio in this exemplary embodiment;
FIG. 4 illustrates a flow chart for determining pre-match audio in the exemplary embodiment;
FIG. 5 illustrates a flow chart for determining a target matching audio from pre-matching audio in the exemplary embodiment;
FIG. 6 shows a flowchart of one method of selecting a second predetermined number of pre-matched audio frequencies as target matched audio frequencies in the exemplary embodiment;
FIG. 7 is a system block diagram illustrating matching of video and audio in this exemplary embodiment;
fig. 8 is a block diagram showing a configuration of a video and audio matching apparatus in the present exemplary embodiment;
fig. 9 shows an electronic device for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Herein, "first", "second", etc. are labels for specific objects, and do not limit the number or order of the objects.
In the related art, when matching audio to a video, some popular audios are recommended to the user so that the user can select among them. However, popular audio does not always match the video the user wants to score well: the popular audio recommended to the user usually has a cheerful rhythm, and if the video uploaded by the user is a landscape shot with relaxed picture content, such popular audio obviously cannot match it well. In addition, when the popular audio cannot meet the user's requirements, the user generally either selects the audio category to be matched and then picks a suitable audio from that category, or searches for the desired audio through a search box. This audio matching process is cumbersome and inefficient, which can affect the user experience.
In view of one or more of the above problems, exemplary embodiments of the present disclosure provide a video and audio matching method.
Fig. 1 shows a schematic flow of a matching method of video and audio in the present exemplary embodiment, including the following steps S110 to S130:
step S110, determining the change degree between different image frames in the video to be matched to obtain the image change degree of the video to be matched;
step S120, determining the label of the video to be matched by identifying the label of the image frame in the video to be matched;
step S130, matching the image change degree of the video to be matched with the rhythm intensity of each audio in the audio library, matching the label of the video to be matched with the label of each audio, and determining one or more target matching audio matched with the video to be matched.
In the video and audio matching process, the degree of change between different image frames in the video to be matched is determined to obtain the image change degree of the video; the labels of the video are determined by identifying the labels of its image frames; the image change degree of the video is matched with the rhythm intensity of each audio in an audio library, the labels of the video are matched with the labels of each audio, and one or more target matching audios matching the video are determined. On the one hand, the target matching audio is determined by combining the image change degree of the video to be matched with its labels. Obtaining the image change degree allows the change degree of the video and the rhythm intensity of the selected audio to correspond to each other; for example, a video with a larger image change degree is matched with audio of high rhythm intensity. Identifying the labels of the image frames makes the theme of the video better match the theme of the selected audio. The selected target matching audio therefore better suits the video to be matched in both theme and rhythm, better meets the user's requirements for the audio, improves the efficiency of audio matching, can improve the matching degree, and further improves the user experience. On the other hand, the selected target matching audio is not limited to popular audio, so the target matching audio can be selected from the audio library in a targeted manner according to the characteristics of the video to be matched; the audio selection range is wider, which further improves the diversity of audio selection.
Each step in fig. 1 will be described in detail below.
In step S110, the degree of change between different image frames in the video to be matched is determined, so as to obtain the degree of change of the image of the video to be matched.
The video to be matched refers to the video needing to be matched with the audio. The video to be matched can be analyzed, and the image frames arranged according to the time sequence in the video to be matched are obtained. The degree of change between image frames is a quantitative representation of the difference between image frames. The image variation degree of the video to be matched refers to the overall quantitative representation of the variation degree between different image frames on the global level of the video.
In an alternative embodiment, as shown in fig. 2, the degree of variation between different image frames in the video to be matched may be determined by the following steps S210 to S220:
in step S210, the similarity between different image frames in the video to be matched is determined.
The similarity between different image frames may include the similarity between adjacent image frames, and may also include the similarity between non-adjacent image frames. It should be noted that, in the process of determining the similarity between different image frames, the similarity between all different image frames in the video to be matched may be determined, or the similarity between some different image frames in the video to be matched may be determined.
In step S220, a degree of variation between different image frames is determined according to a similarity between the different image frames.
When the similarity between different image frames is expressed as a percentage, the degree of change between different image frames may be expressed as y = 100% - x, where x denotes the similarity between the image frames and y denotes the degree of change between them.

When the similarity between different image frames is normalized to the range [0, 1], the degree of change may be expressed as y = 1 - x, where x denotes the similarity between the image frames and y denotes the degree of change between them.
Generally, the higher the similarity between image frames, the lower the degree of change; the lower the similarity between image frames, the higher the degree of change.
Besides the similarity, the degree of change between the image frames may be determined in other manners, for example, the pixel values of the two image frames may be directly subtracted to obtain the degree of change between the two image frames.
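As a minimal sketch of the two approaches above: a change degree derived from a normalized similarity score, and a change degree obtained directly from pixel differences. The pure-Python frame representation (nested lists of grayscale values in 0-255) and the scaling of the pixel difference to [0, 1] are illustrative assumptions.

```python
def change_from_similarity(similarity: float) -> float:
    """Change degree from a similarity normalized to [0, 1]: y = 1 - x."""
    return 1.0 - similarity

def change_from_pixels(frame_a, frame_b) -> float:
    """Change degree as the mean absolute pixel difference between two
    same-sized grayscale frames, scaled so the result lies in [0, 1]."""
    total, count = 0, 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / (255.0 * count)
```

In practice the similarity x could come from any image-similarity measure (histogram comparison, structural similarity, and so on); the disclosure leaves that choice open.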
In the above step shown in fig. 2, the degree of change between different image frames in the video to be processed is determined, mainly to further determine the degree of change of the video to be processed, so as to determine the tempo intensity of the audio to be matched according to the degree of change of the video to be processed.
In step S110, adjacent image frames may be selected to calculate the degree of change, or non-adjacent image frames may be selected to calculate the degree of change.
For example, step S110 may include:
and determining the variation degree between two adjacent image frames in the video to be matched.
It should be noted that, in the process of determining the degree of change between two adjacent image frames in the video to be matched, the degree of change between each two adjacent image frames in the video to be matched may be determined, or the degree of change between some adjacent image frames in the video to be matched may also be determined.
In the process, the change degree between two adjacent image frames is determined, so that the change degree is determined more finely and can represent the change degree of the video image.
For another example, the video to be processed may be divided into a plurality of segments, and in each segment, the degree of change between the first image frame and each subsequent image frame is calculated.
The image change degree of the video to be processed is globally represented by the change degree between different image frames. In order to obtain an accurate image variation degree, the image frame for calculating the variation degree may cover the whole video to be processed, for example, step S110 may include:
and traversing the video to be matched, and determining the degree of change between each image frame and the previous image frame.
The degree of change between the image frames determined by the above process can more completely cover each image frame of the video to be matched.
In an alternative embodiment, a representative part of key image frames may be selected from the video to be processed, and the degree of change between different key image frames may be calculated.
After the change degrees of different image frames are determined, the image change degrees of the video to be processed can be obtained through different calculation methods.
In an alternative embodiment, the variation degrees of different image frames may be averaged, for example, step S110 may include:
and averaging the change degree between each image frame and the previous image frame to obtain the image change degree of the video to be matched.
The variation degree of the video to be matched can be obtained by averaging the variation degrees among different image frames. The process for determining the image change degree of the video to be matched is simple and efficient, and a matching object is provided for the rhythm intensity of the follow-up matching audio.
In an alternative embodiment, on the basis of averaging the change degrees of different image frames as described above, the average value may be multiplied by a coefficient related to the duration of the video to be processed to obtain the image change degree of the video. The coefficient is generally inversely related to the duration of the video to be processed; for example, the coefficient may equal the average video duration divided by the duration of the video to be processed. This reduces the influence of duration differences on the image change degree of the video.
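The averaging step, optionally combined with the duration coefficient, can be sketched as follows. The exact form of the coefficient is an assumption: the text states only that it is inversely related to the video duration, so the sketch uses average duration divided by video duration.

```python
def image_variation_degree(frame_changes, duration=None, average_duration=None):
    """Average the per-frame change degrees; optionally scale by a
    duration coefficient (assumed here: average_duration / duration)."""
    mean_change = sum(frame_changes) / len(frame_changes)
    if duration is not None and average_duration is not None:
        # Shorter-than-average videos get a coefficient > 1, longer ones < 1.
        mean_change *= average_duration / duration
    return mean_change
```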
With continued reference to fig. 1, in step S120, the tags of the video to be matched are determined by identifying the tags of the image frames in the video to be matched.
Before identifying the tags of the image frames in the video to be matched, images may be classified into categories by scene, such as travel, work, home, bar, sport, military, religion, gourmet food, game, and shopping mall. A general value is set for each category according to its general applicability, for example a general value of 5 for travel, 1 for bar, and 3 for gourmet food, and one or more tags are set under each category; for example, the travel category includes tags such as forest, mountain, snow mountain, and seaside.
The tags corresponding to an image frame can be determined by identifying the content of the image frame in the video to be matched. It should be noted that each image frame may correspond to a plurality of tags. The tags corresponding to the image frames in the video to be matched can then be classified and summarized to determine the tags of the video to be matched.
In an alternative embodiment, the tags of each image frame are counted and the occurrence ratio of each tag is calculated. For example, if the video to be processed includes N image frames, and the tags of M of those image frames all include the tag t1, the occurrence ratio of t1 is M/N. Tags whose occurrence ratio is higher than a preset ratio threshold are then selected as the tags of the video to be processed; the ratio threshold may be set according to experience or actual conditions (for example, considering the number of frames of the video to be processed), which is not limited in the present disclosure.
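The occurrence-ratio selection just described can be sketched as follows (an illustrative implementation with assumed names):

```python
def video_tags(frame_tags, ratio_threshold):
    """frame_tags: a list with one set of tags per image frame.
    Returns the tags whose occurrence ratio M/N exceeds ratio_threshold."""
    n = len(frame_tags)
    counts = {}
    for tags in frame_tags:
        for t in set(tags):  # count each tag at most once per frame
            counts[t] = counts.get(t, 0) + 1
    return {t for t, m in counts.items() if m / n > ratio_threshold}
```

With three frames tagged {forest, mountain}, {forest}, {forest, seaside} and a threshold of 0.5, only "forest" (ratio 3/3) is kept.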
With continued reference to fig. 1, in step S130, the image change degree of the video to be matched is matched with the rhythm intensity of each audio in the audio library, the tags of the video to be matched are matched with the tags of each audio, and one or more target matching audios matching the video to be matched are determined.
The audio library may be a preset audio library. It should be noted that step S130 involves two matching operations: matching the image change degree of the video to be matched with the rhythm intensity of the audio, and matching the tags of the video to be matched with the tags of the audio. The execution order of the two matching operations can be adjusted according to actual needs, with the following specific schemes:
according to the first scheme, the image change degree of the video to be matched is first matched with the rhythm intensity of each audio in the audio library to obtain pre-matching audios, and the tags of the video to be matched are then matched with the tags of the pre-matching audios to determine the target matching audios that match the video to be matched.
According to the second scheme, the tags of the video to be matched are first matched with the tags of each audio in the audio library to obtain pre-matching audios, and the image change degree of the video to be matched is then matched with the rhythm intensity of the pre-matching audios to determine the target matching audios that match the video to be matched.
According to the third scheme, the two operations are executed simultaneously: the image change degree of the video to be matched is matched with the rhythm intensity of the audios to obtain one or more first audios, the tags of the video to be matched are matched with the tags of the audios to obtain one or more second audios, and the intersection or union of the first audios and the second audios is taken to determine the target matching audios that match the video to be matched.
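The combination step of the third scheme reduces to a set operation on the two candidate lists, which can be sketched as follows (names are illustrative only):

```python
def combine_scheme_three(first_audios, second_audios, use_intersection=True):
    """first_audios: IDs matched on rhythm intensity.
    second_audios: IDs matched on tags.
    Returns their intersection (stricter) or union (looser)."""
    a, b = set(first_audios), set(second_audios)
    return a & b if use_intersection else a | b
```

The intersection yields audios matching both rhythm and theme; the union yields all audios matching either criterion.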
Taking the above scheme one as an example, the following specific description is made:
in an alternative embodiment, step S130 may further determine one or more target matching audios matching the video to be matched through the following steps S310 to S320, as shown in fig. 3:
in step S310, the image variation of the video to be matched is matched with the rhythm intensity of each audio in the audio library, and a pre-matching audio is determined from the audio library.
In an alternative embodiment, a method for matching an image variation degree of a video to be matched with a rhythm intensity of each audio in an audio library and determining a pre-matching audio from the audio library, as shown in fig. 4, includes the following steps S410 to S430:
in step S410, the tempo intensity of each audio in the audio library is acquired.
Before the rhythm intensity of each audio in the audio library is obtained, a corresponding rhythm intensity value is set for each audio according to its rhythm intensity; the value may be set within 0 to 100. For convenience of subsequent matching operations, the rhythm intensity may be normalized to a value between 0 and 1. It should be noted that the rhythm intensity should be at the same numerical level as the image change degree of the video; for example, if the numerical range of the image change degree is 0 to 1, the numerical range of the rhythm intensity is also 0 to 1.
In step S420, a first matching degree between each audio and the video to be matched is determined according to a difference between the rhythm intensity of each audio and the image variation degree of the video to be matched.
When the difference between the rhythm intensity of an audio and the image change degree of the video to be matched is calculated, the absolute value of the difference can be taken, so that the difference is guaranteed to be non-negative, which facilitates subsequent processing. The first matching degree is the matching degree between the rhythm intensity of the audio and the image change degree of the video to be matched: the larger the difference, the smaller the first matching degree; the smaller the difference, the larger the first matching degree. When both the image change degree and the rhythm intensity range from 0 to 1, assuming that the absolute value of the difference between the rhythm intensity of audio 1 and the image change degree of the video to be matched is X, 1 - X can be used to represent the first matching degree between audio 1 and the video to be matched.
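The 1 - X formula above can be sketched directly (illustrative names; both inputs are assumed to already be normalized to [0, 1] as described):

```python
def first_matching_degree(rhythm_intensity, image_change_degree):
    """First matching degree: 1 minus the absolute difference between the
    audio's rhythm intensity and the video's image change degree."""
    return 1.0 - abs(rhythm_intensity - image_change_degree)
```

Identical values yield the maximum matching degree of 1.0; the farther apart the two values, the closer the result is to 0.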
In step S430, pre-matching audios are determined according to the first matching degrees of the audios and the video to be matched.
Audios with a higher first matching degree with respect to the video to be matched can be selected from the audio library as the pre-matching audios.
In the steps shown in fig. 4, the audios in the audio library may be sorted in ascending order of the difference between the rhythm intensity of each audio and the image change degree of the video to be matched, and the pre-matching audios selected from front to back; alternatively, the audios may be sorted in descending order of that difference and the pre-matching audios selected from back to front. This method determines the first matching degree of an audio from the difference between its rhythm intensity and the image change degree of the video to be matched; the process is simple and easy to implement.
In an alternative embodiment, step S430 may select one or more audios with the highest first matching degree as the pre-matching audios.
For example, the audios whose first matching degree is higher than a preset first matching degree threshold may be selected as the pre-matching audios.
A first predetermined number of pre-matching audios with the highest first matching degree may also be selected. The first predetermined number is a preset, fixed number of pre-matching audios and may be any positive integer.
Selecting the audios with the highest first matching degree is equivalent to selecting the audios whose rhythm intensity differs least from the image change degree of the video to be matched. This step defines the rule for selecting the pre-matching audios, namely selecting the audios with the highest matching degree, and thus provides a selection basis for the pre-matching audios.
In an alternative embodiment, when one or more audios with the highest first matching degree are selected, a first predetermined number of audios with the highest first matching degree may be selected.

The first predetermined number is a preset number of pre-matching audios. If the number of audios contained in the audio library is less than the first predetermined number, all the audios in the audio library may be determined as the pre-matching audios.
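The pre-matching selection of steps S410-S430 can be sketched as follows (a minimal illustration with assumed names; sorting by smallest difference is equivalent to sorting by highest first matching degree):

```python
def select_pre_matching(audio_rhythms, change_degree, first_predetermined_number):
    """audio_rhythms: {audio_id: rhythm_intensity}, values in [0, 1].
    Returns the IDs with the highest first matching degree; if the library is
    smaller than the requested number, all audios are returned."""
    ranked = sorted(audio_rhythms,
                    key=lambda a: abs(audio_rhythms[a] - change_degree))
    return ranked[:first_predetermined_number]
```

Slicing past the end of the list naturally returns the whole library when it holds fewer audios than the first predetermined number.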
In step S320, the tags of the video to be matched are matched with the tags of the pre-matched audios, and the target matched audio is determined from the pre-matched audio.
Each audio in the audio library may be classified and tagged in advance; for example, the name, title, lyrics, and the like of the audio may be classified and identified according to text semantics to obtain the category and tags corresponding to the audio. For example, for audio 1, the category is travel and the tags are forest and mountain. It should be noted that multiple categories or multiple tags may be determined for each audio.
In the steps shown in fig. 3, the target matching audios are determined from the pre-matching audios, so that the selected audios match the tags of the video to be matched in addition to matching its image change degree. Through this double matching on image change degree and tags, the selected target matching audios match the video to be matched better and meet the user's needs in both rhythm and theme.
In an alternative embodiment, as shown in fig. 5, step S320 includes the following steps S510 to S530:
in step S510, for any pre-matching audio, the tags of the pre-matching audio are compared with the tags of the video to be matched, and the same tags form a tag matching list of the pre-matching audio.
A tag matching list is generated for each pre-matching audio; the list may include one or more tags, or may be empty (null).
For example, if the tags of pre-matching audio 1 include t1, t3, t4 and t11, and the tags of the video to be matched include t2, t3, t4, t9, t11, t14 and t16, the intersection of the two yields the tag matching list of pre-matching audio 1, which includes t3, t4 and t11.
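The intersection step of S510 can be sketched as follows (illustrative names only):

```python
def tag_matching_list(audio_tags, video_tags):
    """Tags shared by the pre-matching audio and the video to be matched."""
    return sorted(set(audio_tags) & set(video_tags))
```

Applied to the example above, the shared tags are t3, t4 and t11.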
In step S520, a second matching degree between the pre-matching audio and the video to be matched is determined according to the tag matching list.
The second matching degree is the matching degree between the tags of a pre-matching audio and the tags of the video to be matched: the higher the tag matching, the higher the second matching degree; the lower the tag matching, the lower the second matching degree.
In an alternative embodiment, step S520 may determine a second matching degree between any pre-matching audio and the video to be matched according to a general value of a category to which each tag in the tag matching list belongs.
The general values of the categories to which the tags in the tag matching list of a pre-matching audio belong may be summed as the second matching degree between that pre-matching audio and the video to be matched. For example, if the tag matching list of pre-matching audio 1 includes t3, t4 and t11, the categories of t3 and t11 are both c1 with a general value of s1, and the category of t4 is c3 with a general value of s3, then the second matching degree between pre-matching audio 1 and the video to be matched is 2s1 + s3. In this process the second matching degree is associated with the matched tags of the audio, providing a reference basis for subsequently selecting the target matching audios according to the second matching degree. Determining the second matching degree of each audio from its tag matching list is a simple process that is easy to implement.
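The summation just described can be sketched as follows (illustrative names; the tag-to-category and category-to-general-value mappings are assumed inputs):

```python
def second_matching_degree(matching_list, tag_category, category_general_value):
    """Sum the general value of the category each matched tag belongs to."""
    return sum(category_general_value[tag_category[t]] for t in matching_list)
```

With the example above (t3 and t11 in category c1, t4 in category c3), the result equals 2*s1 + s3.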
In step S530, a target matching audio is determined according to the second matching degree between each pre-matching audio and the video to be matched.
Audios with a higher second matching degree with respect to the video to be matched can be selected from the pre-matching audios as the target matching audios. For example, the pre-matching audios may be sorted in descending order of their second matching degree with the video to be matched, and the target matching audios selected from front to back.
In an alternative embodiment, step S530 may include: and selecting one or more pre-matched audios with the highest second matching degree as target matched audios.
For example, the pre-matching audio with the highest second matching degree may be selected.
Alternatively, the pre-matching audios whose second matching degree is higher than a preset second matching degree threshold may be selected.
A second predetermined number of pre-matching audios with the highest second matching degree may also be selected. The second predetermined number is a preset, fixed number of target matching audios and may be any positive integer. The second predetermined number may also be 1/k of the first predetermined number; for example, k may be 2.
When the number of pre-matching audios exceeds the second predetermined number, a set of pre-matching audios conforming to the second predetermined number may be acquired as follows:
(1) Sort the second matching degrees in descending order to obtain a list E, and use a value F to record the number of audios acquired so far; the initial value of F is 0.
(2) Select the next second matching degree from list E in sequence, acquire the audios corresponding to that matching degree, add the number of audios corresponding to the current second matching degree to the value F, and compare the value F with the second predetermined number.
(3) If the value F is smaller than the second predetermined number, determine the audios corresponding to the current second matching degree as target matching audios, check whether list E contains a next second matching degree, and if so, repeat step (2). If the value F is larger than the second predetermined number, randomly select a part of the audios corresponding to the current second matching degree as target matching audios, ensuring that the number of finally selected target matching audios is exactly equal to the second predetermined number. (If the value F is exactly equal to the second predetermined number, all the audios corresponding to the current second matching degree are determined as target matching audios and the selection ends.)
Since, when selecting the second predetermined number of pre-matching audios with the highest second matching degree, multiple audios may have the same second matching degree with the video to be matched, this method ensures that the number of finally selected target matching audios is exactly the second predetermined number.
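Steps (1)-(3) above can be sketched as follows (an illustrative implementation with assumed names; ties at the cut-off are broken by random sampling as described):

```python
import random

def select_targets(pre_matched, second_predetermined_number):
    """pre_matched: {audio_id: second_matching_degree}.
    Walk matching degrees from high to low (list E); when taking a whole tie
    group would exceed the quota, randomly sample just enough from it."""
    targets = []
    by_degree = {}
    for audio, degree in pre_matched.items():
        by_degree.setdefault(degree, []).append(audio)
    for degree in sorted(by_degree, reverse=True):  # list E, largest first
        group = by_degree[degree]
        remaining = second_predetermined_number - len(targets)  # quota left
        if len(group) <= remaining:
            targets.extend(group)           # the whole tie group fits
        else:
            targets.extend(random.sample(group, remaining))  # partial, random
        if len(targets) == second_predetermined_number:
            break
    return targets
```

When two audios tie at the cut-off point, one of them is chosen at random so that exactly the second predetermined number of target matching audios remains.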
In an alternative embodiment, when the number of pre-matching audios is less than the second predetermined number, as shown in fig. 6, the following steps S610 to S620 are included:
in step S610, the pre-matching audios are all determined as the target matching audios.
In step S620, target matching audios serving as a supplement are determined from hot audios other than the pre-matching audios, so that the number of target matching audios after supplementation equals the second predetermined number.
Hot audios may be audios with a higher usage rate or click rate. When the number of pre-matching audios is insufficient, this provides a method for selecting a fixed number of audios: using hot audios as supplementary candidates for the target matching audios not only provides a sufficient number of audios for the user but also offers richer choices, so supplementing the target matching audios with hot audios can improve the user experience to a certain extent.
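The supplementation of steps S610-S620 can be sketched as follows (illustrative names; the hot-audio list is assumed to be pre-sorted by usage or click rate):

```python
def supplement_with_hot(pre_matched, hot_audios, second_predetermined_number):
    """All pre-matched audios become targets (S610); hot audios not already
    selected then top up the list to the quota (S620)."""
    targets = list(pre_matched)
    for audio in hot_audios:
        if len(targets) >= second_predetermined_number:
            break
        if audio not in pre_matched:
            targets.append(audio)
    return targets
```

If only one audio was pre-matched and the quota is three, the two highest-ranked hot audios not already selected fill the remaining slots.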
In an optional embodiment, the method further comprises: and displaying the information of the target matching audio, so that the user selects a final matching audio.
The determined target matching audio can be presented to the user so that the user can select the required audio from the target matching audio and synthesize the finally selected audio and the video to be matched.
Fig. 7 provides a system structure diagram for matching video and audio, which includes a video parsing module 710, a matching rule management module 720, an image recognition module 730, an image similarity calculation module 740, a matching audio processing module 750, and a video generation module 760. The lines with arrows in fig. 7 indicate the calling relationships between the modules; the matching audio processing module 750 can call the other five modules. The six modules in fig. 7 are described below:
the video parsing module 710 is mainly used for parsing the content of the video to be matched and extracting all image frame information of the video to be matched.
The matching rule management module 720 is mainly used for managing and maintaining image classification information and image tag information.
The image recognition module 730 mainly recognizes image information in the image frame to obtain tag information of the image frame.
The image similarity calculation module 740 compares the contents of an ordered image list to calculate the similarity between images.
The matching audio processing module 750 obtains the category of the video and the frequency with which image frame contents switch, based on the results of the image recognition module and the image similarity calculation module, and matches related audio contents according to these two pieces of information.
The video generation module 760 generates a new video based on the existing video and the matched audio.
Exemplary embodiments of the present disclosure also provide a video and audio matching apparatus. As shown in fig. 8, the video and audio matching apparatus 800 may include:
the image change degree determining module 810 is configured to determine a change degree between different image frames in the video to be matched, so as to obtain an image change degree of the video to be matched;
a video tag determination module 820, configured to determine a tag of a video to be matched by identifying a tag of an image frame in the video to be matched;
the target audio determining module 830 is configured to match the image variation of the video to be matched with the rhythm intensity of each audio in the audio library, match the tag of the video to be matched with the tag of each audio, and determine one or more target matching audios that match the video to be matched.
In an alternative embodiment, determining the degree of change between different image frames in the video to be matched may include: determining the similarity between different image frames in a video to be matched; and determining the change degree between different image frames according to the similarity between different image frames.
In an optional implementation, the apparatus may further include an adjacent image frame change degree determining module, configured to determine the degree of change between two adjacent image frames in the video to be matched.
In an alternative embodiment, the adjacent image frame variation degree determination module is configured to: and traversing the video to be matched, and determining the degree of change between each image frame and the previous image frame.
In an alternative embodiment, the image change degree determining module 810 is configured to: and averaging the change degree between each image frame and the previous image frame to obtain the image change degree of the video to be matched.
In an alternative embodiment, the target audio determining module 830 includes: the pre-matching audio determining module is used for matching the image change degree of the video to be matched with the rhythm intensity of each audio in the audio library and determining pre-matching audio from the audio library; and the first target audio determining submodule is used for matching the tags of the video to be matched with the tags of the pre-matched audios and determining the target matched audio from the pre-matched audios.
In an alternative embodiment, the pre-match audio determination module includes: the audio rhythm acquisition module is used for acquiring the rhythm intensity of each audio in the audio library; the first matching degree determining module is used for determining the first matching degree of each audio and the video to be matched according to the difference between the rhythm intensity of each audio and the image change degree of the video to be matched; and the first pre-matching audio determining submodule is used for determining the pre-matching audio according to the first matching degree of each audio and the video to be matched.
In an alternative embodiment, the first pre-match audio determination sub-module is configured to: and selecting one or more audios with the highest first matching degree as pre-matching audios.
In an optional implementation manner, selecting one or more audios with the highest first matching degree includes: selecting a first predetermined number of audios with the highest first matching degree.
In an alternative embodiment, the first target audio determination sub-module comprises: the tag matching list generating module is used for comparing a tag of any pre-matching audio with a tag of a video to be matched for any pre-matching audio, and forming a tag matching list of any pre-matching audio by using the same tag; the second matching degree determining module is used for determining the second matching degree of any pre-matched audio and the video to be matched according to the tag matching list; and the second target audio determining submodule is used for determining the target matching audio according to the second matching degree of each pre-matching audio and the video to be matched.
In an optional implementation, the second matching degree determining module is configured to: and determining a second matching degree of any pre-matched audio and the video to be matched according to the general value of the category to which each label belongs in the label matching list.
In an alternative embodiment, the second target audio determination submodule is configured to: and selecting one or more pre-matched audios with the highest second matching degree as target matched audios.
In an optional implementation manner, the selecting of one or more pre-matching audios with the highest second matching degree includes a second pre-matching audio module configured to: select a second predetermined number of pre-matching audios with the highest second matching degree.
In an alternative embodiment, when the number of pre-matched audios is less than a second predetermined number, the second pre-matched audio module is further configured to: determining all the pre-matched audios as target matched audios; and determining target matching audio serving as supplement from hot audio except the pre-matching audio, and enabling the number of the target matching audio after the supplement to be a second preset number.
In an optional embodiment, the method further comprises: a video generation module configured to: and displaying the information of the target matching audio, so that the user selects a final matching audio.
The specific details of each part in the video and audio matching apparatus 800 are described in detail in the method part embodiment, and details that are not disclosed may refer to the method part embodiment, and thus are not described again.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing an electronic device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the electronic device. The program product may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The exemplary embodiment of the present disclosure also provides an electronic device capable of implementing the above method. An electronic device 900 according to this exemplary embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 9, electronic device 900 may take the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: at least one processing unit 910, at least one memory unit 920, a bus 930 that connects the various system components (including the memory unit 920 and the processing unit 910), and a display unit 940.
The storage unit 920 stores program code, which may be executed by the processing unit 910, so that the processing unit 910 performs the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary method" section of this specification. For example, processing unit 910 may perform any one or more of the method steps of fig. 1-6.
The storage unit 920 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)921 and/or a cache memory unit 922, and may further include a read only memory unit (ROM) 923.
Storage unit 920 may also include a program/utility 924 having a set (at least one) of program modules 925, such program modules 925 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," module "or" system. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (18)

1. A method for matching video and audio, comprising:
determining the variation degree between different image frames in a video to be matched to obtain the image variation degree of the video to be matched;
determining the label of the video to be matched by identifying the label of the image frame in the video to be matched;
matching the image variation degree of the video to be matched with the rhythm intensity of each audio in an audio library, matching the label of the video to be matched with the label of each audio, and determining one or more target matching audios matched with the video to be matched.
2. The method of claim 1, wherein the determining a degree of variation between different image frames in the video to be matched comprises:
determining the similarity between different image frames in the video to be matched;
and determining the degree of change between the different image frames according to the similarity between the different image frames.
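By way of a non-limiting illustration only, the similarity-then-change computation of claim 2 could be sketched as follows. The claim fixes no particular similarity measure; a normalized grayscale-histogram intersection over frames given as flat lists of pixel values is assumed here purely for concreteness:

```python
# Hypothetical sketch of claim 2: derive the degree of change between two
# image frames from their similarity.  Histogram intersection is an assumed
# similarity measure, not one specified by the claim.

def frame_similarity(frame_a, frame_b, bins=16):
    """Similarity in [0, 1] between two frames given as flat lists of
    grayscale pixel values in [0, 255]."""
    def histogram(frame):
        hist = [0] * bins
        for px in frame:
            hist[min(px * bins // 256, bins - 1)] += 1
        total = len(frame)
        return [h / total for h in hist]
    ha, hb = histogram(frame_a), histogram(frame_b)
    # Histogram intersection equals 1.0 for identical distributions.
    return sum(min(a, b) for a, b in zip(ha, hb))

def change_degree(frame_a, frame_b):
    """The degree of change is taken as the complement of similarity."""
    return 1.0 - frame_similarity(frame_a, frame_b)
```

Any monotone-decreasing mapping from similarity to change degree would satisfy the claim equally well; the complement is merely the simplest choice.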
3. The method of claim 1, wherein the determining a degree of variation between different image frames in the video to be matched comprises:
and determining the variation degree between two adjacent image frames in the video to be matched.
4. The method according to claim 3, wherein the determining the degree of change between two adjacent image frames in the video to be matched comprises:
and traversing the video to be matched, and determining the degree of change between each image frame and the previous image frame.
5. The method according to claim 4, wherein the determining a degree of change between different image frames in the video to be matched to obtain an image degree of change of the video to be matched further comprises:
and averaging the change degree between each image frame and the previous image frame to obtain the image change degree of the video to be matched.
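The traversal and averaging of claims 3 to 5 can be sketched as follows, again as a non-limiting illustration. The per-pair change measure here (mean absolute pixel difference, normalized to [0, 1]) is an assumption; the claims leave it unspecified:

```python
# Hypothetical sketch of claims 3-5: traverse the video, compute the change
# between each frame and the previous one, and average the results to obtain
# the image change degree of the whole video.

def pair_change(prev_frame, frame):
    """Normalized mean absolute difference between two equal-length flat
    lists of grayscale pixel values in [0, 255]."""
    diff = sum(abs(a - b) for a, b in zip(prev_frame, frame))
    return diff / (255 * len(frame))

def image_change_degree(frames):
    """Average degree of change between adjacent frames over the video."""
    if len(frames) < 2:
        return 0.0  # a single frame exhibits no change
    changes = [pair_change(frames[i - 1], frames[i])
               for i in range(1, len(frames))]
    return sum(changes) / len(changes)
```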
6. The method of claim 1, wherein the matching the image variation degree of the video to be matched with the rhythm intensity of each audio in an audio library, and the matching the label of the video to be matched with the label of each audio, and determining one or more target matching audios matching the video to be matched comprises:
matching the image variation degree of the video to be matched with the rhythm intensity of each audio in an audio library, and determining pre-matched audio from the audio library;
matching the labels of the videos to be matched with the labels of the pre-matched audios, and determining the target matched audio from the pre-matched audios.
7. The method according to claim 6, wherein the matching the image variation degree of the video to be matched with the rhythm intensity of each audio in an audio library, and the determining the pre-matching audio from the audio library comprises:
acquiring the rhythm intensity of each audio in the audio library;
determining a first matching degree of each audio and the video to be matched according to the difference between the rhythm intensity of each audio and the image variation degree of the video to be matched;
and determining the pre-matching audio according to the first matching degree of each audio and the video to be matched.
8. The method according to claim 7, wherein the determining the pre-matching audio according to the first matching degree of the respective audio and the video to be matched comprises:
and selecting one or more audios with the highest first matching degree as the pre-matching audios.
9. The method according to claim 8, wherein said selecting one or more of the audios with the highest first matching degree comprises:
and selecting a first preset number of audios with the highest first matching degree.
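The first-stage matching of claims 7 to 9 could be realized as below; this is an illustrative sketch only. The claims require only that the first matching degree grow as the difference between rhythm intensity and image variation degree shrinks, so the concrete scoring formula is an assumption:

```python
# Hypothetical sketch of claims 7-9: score each audio in the library by how
# close its rhythm intensity is to the video's image change degree, then
# keep the first preset number of best-scoring audios as pre-matched audios.

def first_matching_degree(rhythm_intensity, image_change):
    # Assumed formula: smaller difference -> higher degree, bounded in (0, 1].
    return 1.0 / (1.0 + abs(rhythm_intensity - image_change))

def select_pre_matched(audio_library, image_change, first_preset_number):
    """audio_library: list of (audio_id, rhythm_intensity) pairs."""
    scored = [(first_matching_degree(r, image_change), audio_id)
              for audio_id, r in audio_library]
    scored.sort(reverse=True)  # highest first matching degree first
    return [audio_id for _, audio_id in scored[:first_preset_number]]
```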
10. The method according to claim 6, wherein the matching the labels of the videos to be matched with the labels of the pre-matched audios, and the determining the target matched audio from the pre-matched audios comprises:
for any pre-matched audio, comparing the labels of the pre-matched audio with the labels of the video to be matched, and forming a label matching list of the pre-matched audio from the labels they have in common;
determining a second matching degree of the pre-matched audio and the video to be matched according to the label matching list;
and determining the target matching audio according to the second matching degree of each pre-matched audio and the video to be matched.
11. The method according to claim 10, wherein the determining a second matching degree of the pre-matched audio and the video to be matched according to the label matching list comprises:
and determining the second matching degree of the pre-matched audio and the video to be matched according to the general value of the category to which each label in the label matching list belongs.
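The label-based second-stage scoring of claims 10 and 11 might look like the following non-limiting sketch. The label categories and their per-category "general values" below are purely illustrative; the claims name no concrete categories or weights:

```python
# Hypothetical sketch of claims 10-11: intersect the labels of a pre-matched
# audio with those of the video, then score the resulting label matching
# list by summing an assumed per-category "general value".

CATEGORY_GENERAL_VALUE = {  # assumed illustrative weights per category
    "theme": 3.0,
    "mood": 2.0,
    "scene": 1.0,
}

def second_matching_degree(audio_labels, video_labels, label_category):
    """label_category maps each label to the name of its category."""
    matching_list = set(audio_labels) & set(video_labels)  # shared labels
    return sum(CATEGORY_GENERAL_VALUE.get(label_category[lbl], 0.0)
               for lbl in matching_list)
```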
12. The method according to claim 10, wherein the determining the target matching audio according to the second matching degree of each pre-matching audio and the video to be matched comprises:
and selecting one or more pre-matched audios with the highest second matching degree as the target matched audio.
13. The method according to claim 12, wherein said selecting one or more of the pre-matched audios with the highest second matching degree comprises:
and selecting a second preset number of the pre-matched audios with the highest second matching degree.
14. The method of claim 13, wherein when the number of the pre-matched audios is less than the second preset number, the method further comprises:
determining all the pre-matched audios as the target matched audio;
and determining supplementary target matching audios from hot audios other than the pre-matched audios, so that the number of the target matching audios after supplementation equals the second preset number.
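The final selection and hot-audio supplementation of claims 12 to 14 can be sketched as follows, as a non-limiting illustration; the source and ordering of the hot-audio list are assumptions not fixed by the claims:

```python
# Hypothetical sketch of claims 12-14: keep the second preset number of
# pre-matched audios with the highest second matching degree; when there
# are too few pre-matched audios, keep them all and top up from a
# popularity-ordered hot-audio list until the preset number is reached.

def select_target_audios(pre_matched, hot_audios, second_preset_number):
    """pre_matched: list of (second_matching_degree, audio_id) pairs;
    hot_audios: audio ids assumed ordered by popularity."""
    ranked = sorted(pre_matched, reverse=True)
    targets = [audio_id for _, audio_id in ranked[:second_preset_number]]
    if len(targets) < second_preset_number:
        # Supplement from hot audios not already selected (claim 14).
        for audio_id in hot_audios:
            if len(targets) == second_preset_number:
                break
            if audio_id not in targets:
                targets.append(audio_id)
    return targets
```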
15. The method of claim 1, further comprising:
displaying information of the target matching audio, so that a user can select a final matching audio therefrom.
16. A video and audio matching apparatus, comprising:
the image change degree determining module is used for determining the change degree between different image frames in the video to be matched to obtain the image change degree of the video to be matched;
the video tag determining module is used for determining tags of the videos to be matched by identifying tags of the image frames in the videos to be matched;
and the target audio determining module is used for matching the image change degree of the video to be matched with the rhythm intensity of each audio in an audio library, matching the label of the video to be matched with the label of each audio and determining one or more target matching audios matched with the video to be matched.
17. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 15.
18. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1 to 15 via execution of the executable instructions.
CN202011085648.5A 2020-10-12 2020-10-12 Video and audio matching method and device, storage medium and electronic equipment Active CN112291612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011085648.5A CN112291612B (en) 2020-10-12 2020-10-12 Video and audio matching method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011085648.5A CN112291612B (en) 2020-10-12 2020-10-12 Video and audio matching method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112291612A true CN112291612A (en) 2021-01-29
CN112291612B CN112291612B (en) 2023-05-02

Family

ID=74496580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011085648.5A Active CN112291612B (en) 2020-10-12 2020-10-12 Video and audio matching method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112291612B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113488083A (en) * 2021-08-23 2021-10-08 北京字节跳动网络技术有限公司 Data matching method, device, medium and electronic equipment
CN114390342A (en) * 2021-12-10 2022-04-22 阿里巴巴(中国)有限公司 Video dubbing method, device, equipment and medium
CN114630140A (en) * 2022-03-17 2022-06-14 阿里巴巴(中国)有限公司 Information setting method and device based on audio data
WO2023001115A1 (en) * 2021-07-23 2023-01-26 花瓣云科技有限公司 Video generation method, electronic device and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967706A (en) * 2017-11-27 2018-04-27 腾讯音乐娱乐科技(深圳)有限公司 Processing method, device and the computer-readable recording medium of multi-medium data
CN110392297A (en) * 2018-04-18 2019-10-29 腾讯科技(深圳)有限公司 Method for processing video frequency and equipment, storage medium, terminal
CN110602550A (en) * 2019-08-09 2019-12-20 咪咕动漫有限公司 Video processing method, electronic equipment and storage medium
CN110677711A (en) * 2019-10-17 2020-01-10 北京字节跳动网络技术有限公司 Video dubbing method and device, electronic equipment and computer readable medium
CN111198958A (en) * 2018-11-19 2020-05-26 Tcl集团股份有限公司 Method, device and terminal for matching background music

Also Published As

Publication number Publication date
CN112291612B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN108776676B (en) Information recommendation method and device, computer readable medium and electronic device
CN107832434B (en) Method and device for generating multimedia play list based on voice interaction
CN112291612B (en) Video and audio matching method and device, storage medium and electronic equipment
US11048752B2 (en) Estimating social interest in time-based media
US11381861B2 (en) Method and device for pushing a barrage, and electronic device
US9148619B2 (en) Music soundtrack recommendation engine for videos
CN108307240B (en) Video recommendation method and device
WO2020107624A1 (en) Information pushing method and apparatus, electronic device and computer-readable storage medium
CN113301360B (en) Information prompting method, computing device and storage medium
CN109086822B (en) Anchor user classification method, device, equipment and storage medium
CN111259192A (en) Audio recommendation method and device
CN109165316A (en) A kind of method for processing video frequency, video index method, device and terminal device
CN112241327A (en) Shared information processing method and device, storage medium and electronic equipment
CN112182281B (en) Audio recommendation method, device and storage medium
CN111144936B (en) Similar crowd expansion method and device based on user labels
CN112269943B (en) Information recommendation system and method
CN109960745B (en) Video classification processing method and device, storage medium and electronic equipment
CN112036987A (en) Method and device for determining recommended commodities
CN111309200A (en) Method, device, equipment and storage medium for determining extended reading content
CN110209878B (en) Video processing method and device, computer readable medium and electronic equipment
CN113705683A (en) Recommendation model training method and device, electronic equipment and storage medium
CN112749327A (en) Content pushing method and device
CN112035740A (en) Project use duration prediction method, device, equipment and storage medium
CN111694982A (en) Song recommendation method and system
US11620828B1 (en) Methods and devices for editing and generating videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant