CN111198958A - Method, device and terminal for matching background music - Google Patents


Info

Publication number
CN111198958A
Authority
CN
China
Prior art keywords
image
information
label
determining
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811375951.1A
Other languages
Chinese (zh)
Inventor
吴洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Corp
TCL Research America Inc
Original Assignee
TCL Research America Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Research America Inc filed Critical TCL Research America Inc
Priority to CN201811375951.1A priority Critical patent/CN111198958A/en
Publication of CN111198958A publication Critical patent/CN111198958A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention is suitable for the technical field of multimedia, and provides a method, a device and a terminal for matching background music, wherein the method comprises the following steps: acquiring voice information and image information of a target video to be processed; determining a target category label to which the target video belongs based on the voice information and the image information; and determining the target music matched with the target category label from a candidate music library. In the embodiment of the invention, because feature tags are extracted from both the voice part and the image part, the generated video category tag covers the video content more comprehensively and reflects it more faithfully, the matched background music fits the video closely, the user does not need to make repeated selections, and the efficiency of setting background music is improved.

Description

Method, device and terminal for matching background music
Technical Field
The invention belongs to the technical field of multimedia, and particularly relates to a method, a device and a terminal for matching background music.
Background
With the development of deep learning, more and more intelligent devices are used in daily life, intelligence penetrates ever deeper into people's lives, and people hope to make life simpler and more efficient with the help of intelligent technology. Many people today shoot portrait photos, wedding photos, and the like. When such photographs are made into a video, background music usually needs to be added. Because different videos call for different music, selecting suitable music always takes a lot of time when adding background music. However, people care only about the result after the music and the content are combined and do not want to waste time choosing the music.
The prior art provides methods for matching background music to a video, but these methods perform only simple literal matching: the background music returned often fails to meet the user's expectation, the matching degree is low, the user still has to make repeated selections, time is wasted, and efficiency is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a terminal for matching background music, so as to solve the prior-art problems that the matched background music has a low matching degree, the user has to select background music multiple times, time is wasted, and efficiency is low.
In a first aspect, an embodiment of the present invention provides a method for matching background music, where the method includes:
acquiring voice information and image information of a target video to be processed;
determining a target category label to which the target video belongs based on the voice information and the image information;
and determining the target music matched with the target category label from the candidate music library.
In a second aspect, an embodiment of the present invention provides an apparatus for matching background music, including:
the first acquisition unit is used for acquiring voice information and image information of a target video to be processed;
a second obtaining unit, configured to determine, based on the voice feature information and the image feature information, a target category tag to which the target video belongs;
and the determining unit is used for determining the target music matched with the target category label from the candidate music library.
In a third aspect, an embodiment of the present invention provides another terminal for matching background music, including: memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method of matching background music of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the steps of the method for matching background music of the first aspect.
The embodiment of the invention acquires the voice information and the image information of the target video to be processed; determines, based on the voice information and the image information, the target category label to which the target video belongs; and determines, from the candidate music library, the target music matched with the target category label. In the embodiment of the invention, because feature tags are extracted from both the voice part and the image part, the generated video category tag covers the video content more comprehensively and reflects it more faithfully, the matched background music fits the video closely, the user does not need to make repeated selections, and the efficiency of setting background music is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of an implementation of a method for matching background music according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method implementation of matching background music according to another embodiment of the invention;
fig. 3 is a flowchart of a detailed process of S202 in a method for matching background music according to an embodiment of the present invention;
fig. 4 is a flowchart of a detailed process of S203 in a method for matching background music according to an embodiment of the present invention;
fig. 5 is a flowchart of a detailed process of S2031 in the method for matching background music according to the embodiment of the present invention;
FIG. 6 is a block diagram of an apparatus for matching background music according to an embodiment of the present invention;
fig. 7 is a schematic block diagram of a terminal for matching background music according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description is given by way of specific examples. Referring to fig. 1, fig. 1 is a schematic flow chart of a method for matching background music according to an embodiment of the present invention. The execution subject of the method for matching background music in this embodiment is a terminal, and the terminal includes but is not limited to a mobile terminal such as a smart phone, a tablet computer, or a personal digital assistant (PDA). The method of matching background music as shown in the figure may include:
s101: and acquiring voice information and image information of the target video to be processed.
When detecting an instruction for setting background music for a video, the terminal acquires a target video to be processed and extracts voice information and image information of the target video to be processed. The target video to be processed is a video for which background music needs to be set.
The voice information may include text information and intonation information. The textual information identifies conversational content contained in the target video. The intonation information identifies the emotion of the speaker in the target video. The text information is obtained by converting the voice information into a text and performing text analysis on the text.
The image information is obtained after the image of the target video to be processed is subjected to framing processing. The image information may include any one of or any combination of at least two of environment information, brightness information, color information, character limb information, and face information.
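For illustration only (not part of the claimed solution), a minimal Python sketch of this acquisition step follows; it assumes OpenCV (cv2) for framing the video and the external ffmpeg tool for separating the audio track, neither of which is prescribed by this embodiment.

import subprocess
import cv2  # OpenCV, assumed available for frame extraction

def acquire_voice_and_image_info(video_path, audio_path="speech.wav", frame_step=30):
    # Separate the audio track (voice information) with the external ffmpeg tool.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )
    # Frame the video (image information): keep one frame every `frame_step` frames.
    frames = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % frame_step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return audio_path, frames

The returned audio file and frame list then stand in for the voice information and image information processed in the following steps.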
S102: and determining a target category label to which the target video belongs based on the voice information and the image information.
After the terminal acquires the voice information and the image information, the voice information and the image information are sorted and analyzed, and the target category label to which the target video belongs is determined based on the result of sorting and analysis. The sorting and analysis may consist of extracting keywords from the voice information, extracting keywords from the image information, and determining the category label of the target video based on both sets of keywords. The category label of the target video may have multiple attributes, for example: target video category label = ['happy', 'sandy beach', 'playing']. The method of sorting and analysis and the method of obtaining the target category label are not limited to the above, and other methods may also be adopted; no limitation is imposed here.
S103: and determining the target music matched with the target category label from the candidate music library.
In one embodiment, a music library is preset in the terminal, various types of music are stored in the music library, and each piece of music is provided with one or more corresponding labels.
When determining the target music matched with the target category label from the candidate music library, the terminal may preset a relation between category labels and music labels and determine the target music according to this preset relation.
In another embodiment, the target music matched with the target category label may be determined by matching the target category label of the target video against the music labels and calculating the similarity between the two groups of labels to obtain a matching result; the music whose label has a higher matching degree with the target category label matches the target video to be processed better.
As for the method of calculating the similarity between the target category label and the music label, the Tanimoto score may be adopted: the target category label of the video to be processed and each music label are taken as two groups and compared pairwise, and the similarity results are output in descending order, which is the matching result. The Tanimoto score can measure the degree of similarity between two groups (it is commonly used, for example, to measure the similarity between users), and the principle of the algorithm is as follows:
Suppose A = [1,0,1,1,0] and B = [1,1,1,1,0], and let C = A & B = [1,1,1,0]. Then the similarity between A and B is:
T = len(C) / (len(A) + len(B) − len(C)) = 4 / (5 + 5 − 4) ≈ 0.66667, i.e., the matching degree between A and B is 0.66667.
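For illustration only, a minimal Python sketch of this matching step follows; it treats each label group as a set, scores candidate music with the Tanimoto formula above, and returns the candidates in descending order of similarity. The library contents and tags in the usage example are assumptions, not data from this embodiment.

def tanimoto(tags_a, tags_b):
    # T = |A ∩ B| / (|A| + |B| - |A ∩ B|)
    a, b = set(tags_a), set(tags_b)
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0

def match_music(video_tags, music_library):
    # music_library: {music_id: [music labels]}. The sorted output, highest
    # similarity first, corresponds to the matching result described above.
    scores = {mid: tanimoto(video_tags, tags) for mid, tags in music_library.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative usage with assumed video tags and library entries.
video_tags = ["happy", "sandy beach", "warm tone"]
library = {
    "song_1": ["happy", "warm tone", "light"],
    "song_2": ["sad", "cool tone"],
}
print(match_music(video_tags, library))  # song_1 ranks first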
According to the above scheme, the voice information and the image information of the target video to be processed are acquired; the target category label to which the target video belongs is determined based on the voice information and the image information; and the target music matched with the target category label is determined from the candidate music library. In the embodiment of the invention, because feature tags are extracted from both the voice part and the image part, the generated video category tag covers the video content more comprehensively and reflects it more faithfully, the matched background music fits the video closely, the user does not need to make repeated selections, and the efficiency of setting background music is improved.
Referring to fig. 2, fig. 2 is a schematic flow chart of a method for matching background music according to another embodiment of the present invention. In this embodiment, the execution subject of the method for matching background music is a terminal, and the terminal includes but is not limited to a mobile terminal such as a smart phone, a tablet computer, or a personal digital assistant (PDA). The method of matching background music as shown in the figure may include:
s201: and acquiring voice information and image information of the target video to be processed.
In this embodiment, S201 is the same as S101 in the previous embodiment, and please refer to the related description of S101 in the previous embodiment, which is not repeated herein.
S202: and determining the voice category label of the voice information according to the voice information.
After the terminal acquires the voice information, the voice information is sorted and analyzed, and the voice category label of the voice information is determined based on the result of sorting and analysis. The sorting and analysis may consist of converting the voice into text content, extracting keywords from the text content to determine the keywords of the voice information, and determining the voice category label based on those keywords. The voice category label may have multiple attributes, for example: voice category label = ['text content label', 'sound emotion label']. The method of sorting and analysis and the method of obtaining the voice category label are not limited to the above, and other methods may also be adopted; no limitation is imposed here.
Alternatively, S202 may include S2021-S2022, as shown in FIG. 3. The method comprises the following specific steps:
s2021: and analyzing the voice information, and determining a text content label and a sound emotion label of the voice information.
After the terminal acquires the voice information, the voice information is sorted and analyzed, and a text content label and a voice emotion label of the voice information are determined based on the result of sorting and analysis.
The terminal converts the voice content into text format through a voice recognition function to obtain the text information, and then analyzes the text information; this process mainly analyzes the content and the emotion of the text. The text information is segmented into words and preprocessed to obtain the word segmentation result, the most frequent words are then counted, and emotional words are attached to them to obtain the text content label. Specifically, the content of the current conversation may first be analyzed to determine which scenario it belongs to, such as daily conversation, bungee jumping, or negotiation, so as to obtain the context of the conversation; then the emotional words used by the speaker are analyzed, for example "the weather is nice today and I am in a good mood", from which it is judged that the person's current emotion is happy. The emotional words are given a slightly higher weight than the other words. For example, if the text is mainly about learning and, after analysis, the emotional word "happy" occurs many times, the text content label is: text content label = ['happy learning'].
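For illustration only, a rough Python sketch of this text-label step follows; it uses a naive whitespace tokenizer and a hand-made emotion lexicon purely as assumptions (for Chinese dialogue a word-segmentation tool such as jieba would replace the naive split), and none of the word lists or weights below are taken from this embodiment.

from collections import Counter

EMOTION_WORDS = {"happy", "sad", "angry", "excited"}                  # illustrative emotion lexicon
STOP_WORDS = {"the", "a", "is", "and", "i", "to", "of", "am", "be"}   # illustrative stop words

def text_content_label(text, emotion_weight=2):
    # Word segmentation and preprocessing (naive split; a real segmenter would go here).
    words = [w.lower().strip(".,!?") for w in text.split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    counts = Counter(words)
    # Emotional words are weighted a little higher than the other words.
    for w in counts:
        if w in EMOTION_WORDS:
            counts[w] *= emotion_weight
    emotion = next((w for w, _ in counts.most_common() if w in EMOTION_WORDS), "")
    topic = next((w for w, _ in counts.most_common() if w not in EMOTION_WORDS), "")
    # e.g. a text mainly about learning with frequent "happy" yields "happy learning".
    return f"{emotion} {topic}".strip()

print(text_content_label("I am happy, happy to be learning, learning every day"))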
The acquisition process of the sound emotion label is as follows: after the terminal acquires the voice information, the voice information is sorted and analyzed, it is judged whether the speaker's current emotion is calm, angry, panicked, surprised, or another emotion, and the sound emotion label is determined based on the result of sorting and analysis. The voice information can be analyzed in terms of pitch, speaking pauses, the duration of different pitches, the pitch variation per unit time, and the like. For example, a flat pitch indicates calmness; heavy stress indicates anger; a rising tone indicates surprise; and a falling tone at the end of a sentence indicates an exclamation, a request, or the like.
A set of voice tags is preset in the terminal, for example: voice tags = ['calm', 'anger', 'surprise', 'exclamation/request'], and different intonations correspond to different voice tags. For example, suppose the preset voice tag corresponding to a rising tone is 'surprise'. To determine the sound emotion label of voice 1, voice 1 is analyzed, all tags occurring in voice 1 are obtained, and the tag with the highest occurrence frequency is found by statistics; if the rising tone occurs most frequently in voice 1, the sound emotion label of voice 1 is 'surprise', recorded as voice 1 = ['surprise'].
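For illustration only, a possible Python sketch of this intonation analysis follows; it assumes the librosa package for fundamental-frequency (pitch) estimation, and the thresholds and the mapping from pitch behaviour to voice tags are illustrative assumptions rather than values from this embodiment.

import numpy as np
import librosa  # assumed audio-analysis package

def sound_emotion_label(audio_path):
    y, sr = librosa.load(audio_path, sr=16000)
    # Estimate the pitch contour; unvoiced frames come back as NaN.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
    f0 = f0[~np.isnan(f0)]
    if f0.size < 8:
        return "calm"
    variation = np.std(f0) / np.mean(f0)                      # pitch variation per unit time
    quarter = len(f0) // 4
    trend = f0[-quarter:].mean() - f0[:quarter].mean()        # trend toward the sentence end
    # Illustrative mapping: flat pitch -> calm, rising tail -> surprise,
    # falling tail -> exclamation/request, strong variation otherwise -> anger.
    if variation < 0.05:
        return "calm"
    if trend > 20:
        return "surprise"
    if trend < -20:
        return "exclamation/request"
    return "anger"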
S2022: and integrating the text content label and the sound emotion label to obtain the voice category label.
The terminal converts the text content label and the sound emotion label into character strings and adds both character strings into the voice category label array, obtaining the voice category label, which may be recorded as: voice category label = ['text content label', 'sound emotion label'].
In this embodiment, the voice information is analyzed to determine a text content tag and a voice emotion tag of the voice information, and the text content tag and the voice emotion tag are integrated to obtain the voice category tag. The method for acquiring the voice category labels is refined, the text content labels and the voice emotion labels are obtained by analyzing the text information and the audio information in the voice information, the labels are more representative and more comprehensive, and the accuracy in acquiring the matched music is further improved.
S203: and determining an image category label of the image information according to the image information.
After the terminal acquires the image information, the image information is sorted and analyzed, and the image category label of the image information is determined based on the result of sorting and analysis. The sorting and analysis may be performed by identifying the persons, representative buildings, environment, and the like in the image, determining the keywords of the image information, and determining the image category label based on those keywords. The image category label may have a plurality of attributes, for example: image category label = ['image environment label', 'image tone label', 'image shooting subject label']. The method of sorting and analysis and the method of obtaining the image category label are not limited to the above, and other methods may also be adopted; no limitation is imposed here.
Further, the image information includes shooting environment information, brightness information, color information, and person characteristic information of the shot object. As shown in fig. 4, S203 may include S2031 to S2034:
s2031: and determining an image environment label of the image information according to the shooting environment information.
After the terminal acquires the image information, it acquires the shooting environment information contained in the image information; the shooting environment information is the specific scene in the image and may include streets, landmark buildings, animals, plants, the natural geographic environment, and the like. The shooting environment information is identified and analyzed to decide whether the current environment is indoor or outdoor and, if outdoor, whether the scene is a street, a park, a desert, or the like, and the image environment label of the image information is determined based on the identification and analysis result. For example, if the current scene is analyzed to be a seaside beach, then: image environment label = ['seaside beach'].
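For illustration only, a small Python sketch of this step follows; the scene classifier is a hypothetical callable supplied by the caller (for example a wrapped pretrained scene-recognition model), and the majority vote over frames is likewise an assumption rather than a requirement of this embodiment.

from collections import Counter

def image_environment_label(frames, scene_classifier):
    # scene_classifier: hypothetical callable mapping one frame to a scene name.
    votes = Counter(scene_classifier(frame) for frame in frames)
    scene, _ = votes.most_common(1)[0]
    # e.g. most frames classified as "seaside beach" -> ['seaside beach']
    return [scene]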
S2032: and determining an image tone label of the image information according to the brightness information and the color information.
After acquiring the image information, the terminal acquires the brightness information and the color information contained in it, analyzes them, and thereby analyzes the tone of the image, i.e., whether the current image is bright, warm, gloomy, or dark; the image tone label of the image information is determined based on the analysis result. For example, an image without strong color contrast whose hue tends toward orange or yellow generally has a soft, warm style, and after analysis it would be given a warm image tone label.
Further, for S2032, it may include: acquiring tristimulus values of all pixel points contained in each frame of image in the target video; extracting the tristimulus values with the highest occurrence frequency from the tristimulus values of all the pixel points; and determining an image tone label according to the tristimulus values with the highest occurrence frequency.
According to color properties, an image is divided into three tones, namely a cool tone, a warm tone, and a middle tone, and these tones generally express the emotional tone of the image. Therefore, a tone table is established from the tristimulus (three-primary-color) values, which are divided into cool, warm, and middle tones according to tristimulus thresholds. The tristimulus values of the pixel points in each frame of the target video are analyzed, the most frequent tristimulus value is extracted and compared with the tristimulus values corresponding to each tone in the tone table to determine which tone it belongs to, and the image tone label of the current picture is determined based on the comparison result. By extracting the tristimulus values of the image pixel points, the tone of the whole image can be obtained accurately, so the image tone label is more accurate and fits the target video better.
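For illustration only, a minimal Python sketch of this tone-label step follows; it counts the most frequent tristimulus (RGB) value over the extracted frames and compares it against an assumed tone table, where the channel thresholds, the pixel subsampling, and the BGR channel order (as produced by OpenCV) are all illustrative assumptions.

from collections import Counter
import numpy as np

def image_tone_label(frames):
    # frames: list of H x W x 3 uint8 arrays taken from the target video (BGR order assumed).
    counter = Counter()
    for frame in frames:
        pixels = frame.reshape(-1, 3)[::101]          # subsample pixels for speed
        counter.update(map(tuple, pixels))
    most_common_pixel, _ = counter.most_common(1)[0]  # tristimulus value with highest frequency
    b, g, r = (int(v) for v in most_common_pixel)
    # Illustrative tone table: red/yellow dominance -> warm, blue dominance -> cool,
    # everything else -> middle tone.
    if r > b + 30:
        return "warm tone"
    if b > r + 30:
        return "cool tone"
    return "middle tone"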
S2033: determining an image shooting subject label of the image information according to the person feature information; wherein the image capturing subject label is used for identifying the behavior and emotion of a person.
After the terminal acquires the image information, the character characteristic information contained in the image information is acquired, the character characteristic information is analyzed, the behavior and the action of the character in the image are determined, the emotion of the character is determined based on the behavior and the action of the character, and therefore the shooting subject label of the whole image is determined. If the emotion of the person is happy, the shooting subject label of the whole image is happy.
If a plurality of persons exist in the picture, the limb information and the expression information of each person are extracted and weighted-summed to obtain the overall limb information and expression information of the picture. Weighted summation means assigning different weight coefficients to different variables and then computing the final output, where the sum of all weight coefficients is 1; here the output is the image shooting subject information and the variables are the limb information and the expression information. For example, if many persons in the image show clear facial expressions, the face information is considered more important and is given a larger coefficient. Suppose five or six persons in an image are smiling around two persons, one standing and one kneeling and both smiling happily; in this image the facial expressions of the persons carry more information than their limb movements, so when obtaining the image shooting subject information the face information takes a larger proportion than the limb information, for example: image shooting subject information = 0.3 × limb information + 0.7 × expression information, where 0.3 and 0.7 are weight coefficients determined according to the image information. The face information obtained from the image is 'happy', the limb information obtained from the image is 'standing' and 'kneeling', and the scene is accordingly given a wedding or proposal label.
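For illustration only, a small Python sketch of this weighted summation follows; the per-label scores and the 0.3/0.7 weights merely mirror the example above and are not values defined by this embodiment.

def image_shooting_subject_label(limb_scores, expression_scores,
                                 limb_weight=0.3, expression_weight=0.7):
    # limb_scores / expression_scores: {candidate label: score in [0, 1]} aggregated
    # over all persons detected in the frame; the two weights sum to 1.
    labels = set(limb_scores) | set(expression_scores)
    fused = {
        label: limb_weight * limb_scores.get(label, 0.0)
               + expression_weight * expression_scores.get(label, 0.0)
        for label in labels
    }
    return max(fused, key=fused.get)

# Illustrative usage: expressions dominated by "happy", limbs by "standing"/"kneeling".
print(image_shooting_subject_label({"standing": 0.6, "kneeling": 0.4},
                                   {"happy": 0.9, "calm": 0.1}))   # -> "happy"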
Preferably, for S2033, the person feature information includes body extremity information and facial feature information, and as shown in fig. 5, may include S20331 to S20333:
s20331: and determining the image character limb label according to the human limb information in the character characteristic information.
After the terminal acquires the character characteristic information, the human body limb information contained in the character characteristic information is acquired, and the character limb label is determined based on the human body limb information. The body limb information is the movement of the figure in the image, such as lying, sitting, lying prone, standing and the like.
S20332: and determining image character expression labels according to the facial feature information in the character feature information.
After the terminal acquires the character feature information, it acquires the facial feature information contained in it and determines the character expression label based on the facial feature information. The facial feature information covers the person's facial expression and the muscle movements around the mouth and eyes; through analysis, the person's emotion is judged to be happy, sad, angry, moved, afraid, or the like.
S20333: and integrating the image character limb label and the image character expression label to obtain the image shooting subject label.
The terminal converts the image character limb label and the image character expression label into character strings, adds both character strings into the image shooting subject label array, and obtains the image shooting subject label, which is recorded as: image shooting subject label = ['image character limb label', 'image character expression label'].
In the above scheme, the image character limb label is determined from the human body limb information in the character feature information, the image character expression label is determined from the facial feature information in the character feature information, and the two are integrated to obtain the image shooting subject label. Since the image shooting subject label is determined from the two angles of the character's limbs and the character's expression, the label is representative and fits the target video well, which further improves the accuracy of the finally matched music.
S2034: and integrating the image environment label, the image tone label and the image shooting subject label to obtain the image category label.
The terminal converts the image environment label, the image tone label, and the image shooting subject label into character strings, adds these character strings into the image category label array, and obtains the image category label, which may be recorded as: image category label = ['image environment label', 'image tone label', 'image shooting subject label'].
According to the embodiment of the invention, the acquired image type label is refined, the information is acquired from the aspects of image environment, image tone and image shooting subject and is converted into the label, the attaching degree of the label and the target video is improved, the style and content of the video can be more comprehensively and accurately embodied by the label, and the matched music also has higher matching degree with the target video.
S204: and integrating the voice category label and the image category label to obtain a target category label to which the target video belongs.
The terminal converts the voice category label and the image category label into character strings, adds both character strings into the target category label array, and obtains the target category label to which the target video belongs, which may be recorded as: target category label = ['voice category label', 'image category label'].
For example, if voice category label = ['text content label', 'sound emotion label'] and image category label = ['image environment label', 'image tone label', 'image shooting subject label'], then after integration a target video category label with five attributes is obtained: target video category label = ['text content label', 'sound emotion label', 'image environment label', 'image tone label', 'image shooting subject label'].
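For illustration only, the integration described in S2022, S2034, and S204 can be sketched in Python as simple list concatenation; the concrete label values below are assumptions used only to show the resulting five-attribute structure.

def integrate_labels(*label_groups):
    # Concatenate the label strings of each group into one target category label array.
    target = []
    for group in label_groups:
        target.extend(group)
    return target

voice_label = ["happy learning", "surprise"]                      # text content + sound emotion
image_label = ["seaside beach", "warm tone", "happy"]             # environment + tone + subject
target_video_label = integrate_labels(voice_label, image_label)   # five-attribute target label
print(target_video_label)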
S205: and determining the target music matched with the target category label from the candidate music library.
In this embodiment, S205 is the same as S103 in the previous embodiment, and please refer to the related description of S103 in the previous embodiment, which is not repeated herein.
Optionally, when there are at least two pieces of target music, the background music can be replaced if the user is not satisfied with the highest-matching music as background music; in this case, S206 to S207 may further follow S205. The specific steps are as follows:
s206: and playing the target music with the highest matching degree with the target video.
The music with the highest matching degree is the music which is automatically selected by the terminal for the user and is most matched with the target video, and the target music with the highest matching degree with the target video is played, so that the user can experience the matched background music.
S207: and if an instruction for indicating that the user changes the background music is received, popping up a target music list, wherein the target music list is used for the user to select the background music.
The terminal detects whether an instruction indicating that the user wants to change the background music is received, and pops up the target music list if such an instruction is received. For example, the terminal may provide a button: if the user is not satisfied with the highest-matching target music automatically played by the system, the user can click the button, whereupon a list of target music pops up; the music in the list is the music whose matching degree after matching exceeds a preset threshold, and the user can select music in the popped-up list as the background music. In this way, when the user is not satisfied with the highest-matching music as background music, the background music can be replaced, and because the replacement list contains only music with a high matching degree to the target video, the user saves a great deal of time when selecting music and efficiency is improved.
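For illustration only, a minimal Python sketch of S206 to S207 follows; it assumes the descending matching result from the Tanimoto step, and play_music, pop_up_list, and user_wants_change are hypothetical callbacks standing in for the terminal's player and user interface.

def set_background_music(ranked_matches, play_music, pop_up_list,
                         user_wants_change, threshold=0.5):
    # ranked_matches: [(music_id, score), ...] sorted by descending matching degree.
    best_id, best_score = ranked_matches[0]
    play_music(best_id)  # play the target music with the highest matching degree (S206)
    if user_wants_change():
        # Only music whose matching degree exceeds the preset threshold is listed (S207).
        candidates = [mid for mid, score in ranked_matches if score > threshold]
        return pop_up_list(candidates)  # the user selects the background music here
    return best_id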
According to the above scheme, the voice information and the image information of the target video to be processed are acquired; the voice category label of the voice information is determined according to the voice information; the image category label of the image information is determined according to the image information; the voice category label and the image category label are integrated to obtain the target category label to which the target video belongs; and the target music matched with the target category label is determined from the candidate music library. In the embodiment of the invention, because feature tags are extracted from both the voice part and the image part, the generated video category tag covers the video content more comprehensively and reflects it more faithfully, the matched background music fits the video closely, the user does not need to make repeated selections, and the efficiency of setting background music is improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Referring to fig. 6, fig. 6 is a schematic diagram of an apparatus for matching background music according to an embodiment of the present invention. The units included are used to perform the steps in the embodiments corresponding to fig. 1-5. Please refer to the related description of the embodiments in fig. 1 to 5. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 6, the apparatus for matching background music includes:
a first obtaining unit 610, configured to obtain voice information and image information of a target video to be processed;
a second obtaining unit 620, configured to determine, based on the voice feature information and the image feature information, a target category tag to which the target video belongs;
a determining unit 630, configured to determine the target music matching the target category label from the candidate music library.
Further, the second obtaining unit 620 may include:
the first determining unit is used for determining a voice category label of the voice information according to the voice information;
a second determination unit that determines an image type label of the image information from the image information;
and the first integration unit is used for integrating the voice category label and the image category label to obtain a target category label to which the target video belongs.
Further, the first determining unit is specifically configured to:
analyzing the voice information, and determining a text content label and a sound emotion label of the voice information;
and integrating the text content label and the sound emotion label to obtain the voice category label.
Further, the image information includes shooting environment information, brightness information, color information, and person characteristic information of the shot object, and the second determining unit includes:
a third determination unit configured to determine an image environment tag of the image information according to the shooting environment information;
a fourth determination unit configured to determine an image tone label of the image information based on the luminance information and the color information;
a fifth determining unit configured to determine an image capturing subject label of the image information based on the personal feature information; wherein the image capturing subject label is used for identifying the behavior and emotion of a person;
and the second integration unit is used for integrating the image environment label, the image tone label and the image shooting subject label to obtain the image category label.
Further, the second determining unit is specifically configured to:
determine an image environment label of the image information according to the shooting environment information;
determine an image tone label of the image information according to the brightness information and the color information;
determine an image shooting subject label of the image information according to the person feature information, wherein the image shooting subject label is used for identifying the behavior and emotion of a person; and
integrate the image environment label, the image tone label and the image shooting subject label to obtain the image category label.
Further, the fourth determining unit is specifically configured to:
acquiring tristimulus values of all pixel points contained in each frame of image in the target video;
extracting the tristimulus values with the highest occurrence frequency from the tristimulus values of all the pixel points;
and determining an image tone label according to the tristimulus values with the highest occurrence frequency.
Further, the person feature information includes body limb information and facial feature information, and the fifth determining unit is specifically configured to:
determining an image character limb label according to the human limb information in the character characteristic information;
determining image character expression labels according to the facial feature information in the character feature information;
and integrating the image character limb label and the image character expression label to obtain the image shooting subject label.
Further, the apparatus further comprises:
the playing unit is used for playing the target music with the highest matching degree with the target video;
and the processing unit is used for popping up a target music list if an instruction for indicating that the user changes the background music is received, wherein the target music list is used for the user to select the background music.
Fig. 7 is a schematic diagram of a terminal for matching background music according to an embodiment of the present invention. As shown in fig. 7, the terminal 7 for matching background music of this embodiment includes: a processor 710, a memory 720, and a computer program 730 stored in the memory 720 and executable on the processor 710, for example a program for matching background music. The processor 710, when executing the computer program 730, implements the steps in the above-described embodiments of the method for matching background music, such as steps S101 to S103 shown in fig. 1. Alternatively, the processor 710, when executing the computer program 730, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the units 610 to 630 shown in fig. 6.
Illustratively, the computer program 730 may be partitioned into one or more modules/units that are stored in the memory 720 and executed by the processor 710 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 730 in the terminal 7 matching the background music. For example, the computer program 730 may be divided into a first acquiring unit, a second acquiring unit, and a determining unit, and the specific functions of each unit are as follows:
the first acquisition unit is used for acquiring voice information and image information of a target video to be processed;
a second obtaining unit, configured to determine, based on the voice feature information and the image feature information, a target category tag to which the target video belongs;
and the determining unit is used for determining the target music matched with the target category label from the candidate music library.
The terminal 7 matching the background music may be a computing device such as a desktop computer, a notebook, a palm computer, or a cloud server. The terminal matching the background music may include, but is not limited to, a processor 710 and a memory 720. It will be appreciated by those skilled in the art that fig. 7 is only an example of the terminal 7 matching the background music and does not constitute a limitation on it; the terminal may comprise more or fewer components than those shown, or combine some components, or have different components; for example, the terminal matching the background music may further comprise an input-output device, a network access device, a bus, and the like.
The processor 710 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 720 may be an internal storage unit of the terminal 7 matching the background music, such as a hard disk or a memory of the terminal 7 matching the background music. The memory 720 may also be an external storage device of the terminal 7 matching the background music, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), and the like, which are equipped on the terminal 7 matching the background music. Further, the memory 720 may also include both an internal storage unit of the terminal 7 matching the background music and an external storage device. The memory 720 is used for storing the computer program and other programs and data required by the terminal matching the background music. The memory 720 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method of matching background music, comprising:
acquiring voice information and image information of a target video to be processed;
determining a target category label to which the target video belongs based on the voice information and the image information;
and determining the target music matched with the target category label from the candidate music library.
2. The method of matching background music according to claim 1, wherein said determining a target category label to which the target video belongs based on the voice information and the image information comprises:
determining a voice category label of the voice information according to the voice information;
determining an image category label of the image information according to the image information;
and integrating the voice category label and the image category label to obtain a target category label to which the target video belongs.
3. The method of matching background music as claimed in claim 2, wherein said determining a speech class label of said speech information based on said speech information comprises:
analyzing the voice information, and determining a text content label and a sound emotion label of the voice information;
and integrating the text content label and the sound emotion label to obtain the voice category label.
4. The method of matching background music of claim 2, wherein the image information comprises: shooting environment information, brightness information, color information and character characteristic information of a shot object;
the determining an image category label of the image information according to the image information includes:
determining an image environment label of the image information according to the shooting environment information;
determining an image tone label of the image information according to the brightness information and the color information;
determining an image shooting subject label of the image information according to the person feature information; wherein the image capturing subject label is used for identifying the behavior and emotion of a person;
and integrating the image environment label, the image tone label and the image shooting subject label to obtain the image category label.
5. The method for matching background music according to claim 4, wherein the determining the image tone label corresponding to the image information according to the brightness information and the color information comprises:
acquiring tristimulus values of all pixel points contained in each frame of image in the target video;
extracting the tristimulus values with the highest occurrence frequency from the tristimulus values of all the pixel points;
and determining an image tone label according to the tristimulus values with the highest occurrence frequency.
6. The method of matching background music of claim 4, wherein said personal characteristic information includes body extremity information and facial characteristic information;
the determining of the image capturing subject label of the image information according to the person feature information includes:
determining an image character limb label according to the human limb information in the character characteristic information;
determining image character expression labels according to the facial feature information in the character feature information;
and integrating the image character limb label and the image character expression label to obtain the image shooting subject label.
7. The method for matching background music according to any one of claims 1 to 6, wherein the number of the target music is at least two, and after determining the target music matching the target category label from the candidate music library, the method further comprises:
playing target music with the highest matching degree with the target video;
and if an instruction for indicating that the user changes the background music is received, popping up a target music list, wherein the target music list is used for the user to select the background music.
8. An apparatus for matching background music, comprising:
the first acquisition unit is used for acquiring voice information and image information of a target video to be processed;
a second obtaining unit, configured to determine, based on the voice information and the image information, a target category tag to which the target video belongs;
and the determining unit is used for determining the target music matched with the target category label from the candidate music library.
9. A terminal for matching background music, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201811375951.1A 2018-11-19 2018-11-19 Method, device and terminal for matching background music Pending CN111198958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811375951.1A CN111198958A (en) 2018-11-19 2018-11-19 Method, device and terminal for matching background music

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811375951.1A CN111198958A (en) 2018-11-19 2018-11-19 Method, device and terminal for matching background music

Publications (1)

Publication Number Publication Date
CN111198958A true CN111198958A (en) 2020-05-26

Family

ID=70744206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811375951.1A Pending CN111198958A (en) 2018-11-19 2018-11-19 Method, device and terminal for matching background music

Country Status (1)

Country Link
CN (1) CN111198958A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103795897A (en) * 2014-01-21 2014-05-14 深圳市中兴移动通信有限公司 Method and device for automatically generating background music
CN108197185A (en) * 2017-12-26 2018-06-22 努比亚技术有限公司 A kind of music recommends method, terminal and computer readable storage medium
CN108764010A (en) * 2018-03-23 2018-11-06 姜涵予 Emotional state determines method and device

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111800650A (en) * 2020-06-05 2020-10-20 腾讯科技(深圳)有限公司 Video dubbing method and device, electronic equipment and computer readable medium
WO2021258866A1 (en) * 2020-06-23 2021-12-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and system for generating a background music for a video
WO2021258797A1 (en) * 2020-06-24 2021-12-30 华为技术有限公司 Image information input method, electronic device, and computer readable storage medium
CN111881315A (en) * 2020-06-24 2020-11-03 华为技术有限公司 Image information input method, electronic device, and computer-readable storage medium
CN111918094A (en) * 2020-06-29 2020-11-10 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and storage medium
CN111970579A (en) * 2020-08-14 2020-11-20 苏州思萃人工智能研究所有限公司 Video music adaptation method and system based on AI video understanding
CN112214636A (en) * 2020-09-21 2021-01-12 华为技术有限公司 Audio file recommendation method and device, electronic equipment and readable storage medium
CN112291612A (en) * 2020-10-12 2021-01-29 北京沃东天骏信息技术有限公司 Video and audio matching method and device, storage medium and electronic equipment
CN113289338A (en) * 2021-04-28 2021-08-24 网易(杭州)网络有限公司 Game skill sound effect processing method and device and electronic device
CN113289338B (en) * 2021-04-28 2024-07-19 网易(杭州)网络有限公司 Game skill sound effect processing method and device and electronic device
CN113613061A (en) * 2021-07-06 2021-11-05 北京达佳互联信息技术有限公司 Checkpoint template generation method, checkpoint template generation device, checkpoint template generation equipment and storage medium
CN113573143A (en) * 2021-07-21 2021-10-29 维沃移动通信有限公司 Audio playing method and electronic equipment
CN113573143B (en) * 2021-07-21 2023-09-19 维沃移动通信有限公司 Audio playing method and electronic equipment
WO2023009057A1 (en) * 2021-07-26 2023-02-02 脸萌有限公司 Music screening method and apparatus, and device, storage medium and program product
WO2023015862A1 (en) * 2021-08-09 2023-02-16 北京达佳互联信息技术有限公司 Image-based multimedia data synthesis method and apparatus
CN113792178A (en) * 2021-08-31 2021-12-14 北京达佳互联信息技术有限公司 Song generation method and device, electronic equipment and storage medium
CN115878835A (en) * 2021-09-26 2023-03-31 天翼爱音乐文化科技有限公司 Cartoon background music matching method and device and storage medium
CN115878835B (en) * 2021-09-26 2024-06-11 天翼爱音乐文化科技有限公司 Cartoon background music matching method, device and storage medium
CN113901263A (en) * 2021-09-30 2022-01-07 宿迁硅基智能科技有限公司 Label generating method and device for video material
CN114390342A (en) * 2021-12-10 2022-04-22 阿里巴巴(中国)有限公司 Video dubbing method, device, equipment and medium
CN114390342B (en) * 2021-12-10 2023-08-29 阿里巴巴(中国)有限公司 Video music distribution method, device, equipment and medium
CN114512113A (en) * 2022-04-11 2022-05-17 科大讯飞(苏州)科技有限公司 Audio synthesis method and related method and equipment
WO2023197749A1 (en) * 2022-04-15 2023-10-19 腾讯科技(深圳)有限公司 Background music insertion time point determining method and apparatus, device, and storage medium
CN115114475A (en) * 2022-08-29 2022-09-27 成都索贝数码科技股份有限公司 Audio retrieval method for matching short video sounds with music live original soundtracks
CN115114475B (en) * 2022-08-29 2022-11-29 成都索贝数码科技股份有限公司 Audio retrieval method for matching short video sounds with live soundtracks of music
CN117457021A (en) * 2023-10-26 2024-01-26 北京汇畅数宇科技发展有限公司 Video detection method based on video action
CN117457021B (en) * 2023-10-26 2024-09-20 北京汇畅数宇科技发展有限公司 Video detection method based on video action

Similar Documents

Publication Publication Date Title
CN111198958A (en) Method, device and terminal for matching background music
US10692480B2 (en) System and method of reading environment sound enhancement based on image processing and semantic analysis
Dhall et al. Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark
US20210012777A1 (en) Context acquiring method and device based on voice interaction
CN106897372B (en) Voice query method and device
CN113395578B (en) Method, device, equipment and storage medium for extracting video theme text
CN107333071A (en) Video processing method and device, electronic equipment and storage medium
CN113380271B (en) Emotion recognition method, system, device and medium
CN114821740B (en) Emotion recognition method and device based on multi-mode information fusion and electronic equipment
CN112784696A (en) Lip language identification method, device, equipment and storage medium based on image identification
CN107911643A (en) Show the method and apparatus of scene special effect in a kind of video communication
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
CN114268747A (en) Interview service processing method based on virtual digital people and related device
US20090055336A1 (en) System and method for classifying multimedia data
CN113923521B (en) Video scripting method
CN110781327B (en) Image searching method and device, terminal equipment and storage medium
CN117152308B (en) Virtual person action expression optimization method and system
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN117540007A (en) Multi-mode emotion analysis method, system and equipment based on similar mode completion
CN116721449A (en) Training method of video recognition model, video recognition method, device and equipment
CN116010545A (en) Data processing method, device and equipment
CN110795581B (en) Image searching method and device, terminal equipment and storage medium
CN114170997A (en) Pronunciation skill detection method, pronunciation skill detection device, storage medium and electronic equipment
CN113536009A (en) Data description method and device, computer readable medium and electronic device
CN111931510B (en) Intention recognition method and device based on neural network and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200526