CN107368609B

CN107368609B - Method, apparatus and computer-readable storage medium for obtaining a multimedia file

Info

Publication number: CN107368609B
Application number: CN201710679015.9A
Authority: CN
Inventors: 张超钢; 黄美红; 陈文锋
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2017-08-10
Filing date: 2017-08-10
Publication date: 2018-09-04
Anticipated expiration: 2037-08-10
Also published as: CN107368609A

Abstract

The invention discloses a method, an apparatus and a computer-readable storage medium for obtaining a multimedia file, and belongs to the field of network communication technology. The method includes: extracting a reference note sequence from an acquired voice signal, the reference note sequence including multiple notes; for any multimedia file in a multimedia file library, when the note sequence of the multimedia file has a repetitive structure, obtaining a benchmark note subsequence of the multimedia file, where the benchmark note subsequence includes at least one note and contains fewer notes than the multimedia file; determining the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence of the multimedia file; and, according to the matching degree between the voice signal and the multimedia file, obtaining from the multimedia file library a target multimedia file whose matching degree meets a preset condition. The invention improves the efficiency of obtaining multimedia files.

Description

Method, apparatus and computer-readable storage medium for obtaining a multimedia file

Technical field

The present invention relates to the field of network communication technology, and in particular to a method, an apparatus and a computer-readable storage medium for obtaining a multimedia file.

Background technology

Currently, most terminals support music software, and most music software provides a song-recognition function that identifies a song by listening to it. When a user does not know the title of a song, the user can hum the melody of the song to the terminal, and the terminal uses the song-recognition function to find the song corresponding to the melody on a multimedia server.

When the terminal searches the multimedia server for the song corresponding to the melody, the terminal acquires the voice signal input by the user and sends the voice signal to the multimedia server. The multimedia server receives the voice signal, extracts the pitch sequence of the voice signal, calculates the matching degree between that pitch sequence and the pitch sequence of each song in the library, selects the song with the highest matching degree from the library according to those matching degrees, and sends the selected song to the terminal.

In the course of implementing the present invention, the inventors found that the prior art has at least the following problem:

Because a song generally lasts about 4 minutes, the pitch sequence of a song includes more than 100 pitches. Calculating the matching degree between the pitch sequence of the voice signal and the pitch sequence of each song in the library is therefore time-consuming for the multimedia server, which makes it inefficient for the terminal to obtain the song.

Summary of the invention

To solve the problem in the prior art, the present invention provides a method, an apparatus and a computer-readable storage medium for obtaining a multimedia file. The technical solutions are as follows:

In one aspect, the present invention provides a method for obtaining a multimedia file, the method including:

extracting a reference note sequence from an acquired voice signal, the reference note sequence including multiple notes;

for any multimedia file in a multimedia file library, when the note sequence of the multimedia file has a repetitive structure, obtaining a benchmark note subsequence of the multimedia file, the benchmark note subsequence including at least one note, and the number of notes in the benchmark note subsequence being less than the number of notes in the multimedia file;

determining the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence of the multimedia file; and

obtaining, from the multimedia file library, a target multimedia file whose matching degree meets a preset condition, according to the matching degree between the voice signal and the multimedia file.

In one possible implementation, before the obtaining of the benchmark note subsequence of the multimedia file, the method further includes:

dividing the note sequence of the multimedia file into multiple note subsequences, each note subsequence including at least one note;

determining the multiplicity (degree of repetition) between the note subsequences based on a preset multiplicity algorithm; and

if the multiplicity between the note subsequences is greater than a preset multiplicity, determining that the note sequence of the multimedia file has a repetitive structure.

In one possible implementation, the determining of the multiplicity between the note subsequences based on a preset multiplicity algorithm includes:

determining at least one similarity matrix between the note subsequences based on a similarity matrix algorithm, determining a characteristic value of each similarity matrix, and determining the multiplicity between the note subsequences according to the characteristic value of each similarity matrix; or

determining at least one cross-correlation degree between the note subsequences based on a cross-correlation algorithm, and determining the multiplicity between the note subsequences according to each cross-correlation degree; or

determining at least one edit distance between the note subsequences based on an edit distance algorithm, and determining the multiplicity between the note subsequences according to each edit distance; or

determining at least one EMD distance between the note subsequences based on an EMD distance algorithm, and determining the multiplicity between the note subsequences according to each EMD distance.

In one possible implementation, the obtaining of the benchmark note subsequence of the multimedia file includes:

randomly selecting one note subsequence from the multiple note subsequences as the benchmark note subsequence of the multimedia file; or

selecting, from the multiple note subsequences, the note subsequence that contains the most notes as the benchmark note subsequence of the multimedia file; or

selecting, from the multiple note subsequences, the note subsequence that contains the fewest notes as the benchmark note subsequence of the multimedia file.

In one possible implementation, the intersection between two adjacent note subsequences includes a preset number of notes, the preset number being an integer greater than or equal to 0 and less than a specified value, where the specified value is the quotient of the number of notes in the multimedia file and the number of note subsequences into which it is divided.

In one possible implementation, a note includes a pitch and/or a duration, and the pitch is the absolute pitch of the note or the relative pitch between two adjacent notes.

In another aspect, the present invention provides an apparatus for obtaining a multimedia file, the apparatus including:

an extraction module, configured to extract a reference note sequence from an acquired voice signal, the reference note sequence including multiple notes;

a first acquisition module, configured to, for any multimedia file in a multimedia file library, obtain a benchmark note subsequence of the multimedia file when the note sequence of the multimedia file has a repetitive structure, the benchmark note subsequence including at least one note, and the number of notes in the benchmark note subsequence being less than the number of notes in the multimedia file;

a determining module, configured to determine the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence of the multimedia file; and

a second acquisition module, configured to obtain, from the multimedia file library, a target multimedia file whose matching degree meets a preset condition, according to the matching degree between the voice signal and the multimedia file.

In one possible implementation, the apparatus further includes:

a division module, configured to divide the note sequence of the multimedia file into multiple note subsequences, each note subsequence including at least one note;

the determining module is further configured to determine the multiplicity between the note subsequences based on a preset multiplicity algorithm; and

the determining module is further configured to determine that the note sequence of the multimedia file has a repetitive structure if the multiplicity between the note subsequences is greater than a preset multiplicity.

In one possible implementation, the determining module is further configured to determine at least one similarity matrix between the note subsequences based on a similarity matrix algorithm, determine a characteristic value of each similarity matrix, and determine the multiplicity between the note subsequences according to the characteristic value of each similarity matrix; or

the determining module is further configured to determine at least one cross-correlation degree between the note subsequences based on a cross-correlation algorithm, and determine the multiplicity between the note subsequences according to each cross-correlation degree; or

the determining module is further configured to determine at least one edit distance between the note subsequences based on an edit distance algorithm, and determine the multiplicity between the note subsequences according to each edit distance; or

the determining module is further configured to determine at least one EMD distance between the note subsequences based on an EMD distance algorithm, and determine the multiplicity between the note subsequences according to each EMD distance.

In one possible implementation, the first acquisition module is further configured to randomly select one note subsequence from the multiple note subsequences as the benchmark note subsequence of the multimedia file; or

the first acquisition module is further configured to select, from the multiple note subsequences, the note subsequence that contains the most notes as the benchmark note subsequence of the multimedia file; or

the first acquisition module is further configured to select, from the multiple note subsequences, the note subsequence that contains the fewest notes as the benchmark note subsequence of the multimedia file.

In another aspect, the present invention provides an apparatus for obtaining a multimedia file, the apparatus including a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the method of any one of the implementations of the first aspect.

In another aspect, the present invention provides a computer-readable storage medium, characterized in that at least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the method of any one of the implementations of the first aspect.

The technical solutions provided by the embodiments of the present invention bring the following beneficial effects. For a multimedia file whose note sequence has a repetitive structure, a benchmark note subsequence of the multimedia file is obtained; the matching degree between the voice signal and the multimedia file is determined according to the reference note sequence of the voice signal and the benchmark note subsequence of the multimedia file; and, based on the matching degree, a target multimedia file whose matching degree meets a preset condition is obtained from the multimedia file library. Because the benchmark note subsequence of the multimedia file contains fewer notes than the multimedia file itself, determining the matching degree from the reference note sequence and the benchmark note subsequence reduces the calculation time and improves the efficiency of obtaining multimedia files.

Description of the drawings

Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present invention;

Fig. 2 is a flowchart of a method for obtaining a multimedia file provided by an embodiment of the present invention;

Fig. 3 is a flowchart of a method for obtaining a multimedia file provided by an embodiment of the present invention;

Fig. 4 is a flowchart of a method for obtaining a multimedia file provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an apparatus for obtaining a multimedia file provided by an embodiment of the present invention;

Fig. 6 is a block diagram of a multimedia server provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

In the prior art, when a multimedia server recommends a multimedia file to a user based on an acquired voice signal, the multimedia server calculates the matching degree between the pitch sequence of the acquired voice signal and the pitch sequence of each multimedia file in the multimedia file library, and recommends a multimedia file to the user based on those matching degrees. However, because the pitch sequence of a multimedia file includes many pitches, calculating the matching degree between the pitch sequence of the acquired voice signal and the pitch sequence of each multimedia file is time-consuming for the multimedia server, so the efficiency of obtaining multimedia files is low.

To improve the efficiency of obtaining multimedia files, in the embodiments of the present invention, for a multimedia file whose note sequence has a repetitive structure, a partial note sequence is extracted from the multimedia file. For ease of description, this partial note sequence is called the benchmark note subsequence. The matching degree between the benchmark note subsequence and the reference note sequence of the acquired voice signal is calculated directly. Because the benchmark note subsequence contains fewer notes than the multimedia file, the calculation time is reduced and the efficiency of obtaining multimedia files is improved.

Fig. 1 shows an implementation environment provided by an embodiment of the present invention. Referring to Fig. 1, the implementation environment includes a terminal 101 and a multimedia server 102. The terminal 101 and the multimedia server 102 are connected through a communication network.

An application associated with the multimedia server 102 runs on the terminal 101, and the terminal 101 can interact with the multimedia server 102 through that application. For example, the terminal 101 logs in to the application based on a user identifier, or logs in to the application directly, and thereby interacts with the multimedia server 102. The application may be any of various applications such as an audio application or a video application. The user identifier is a user account, a telephone number or the like, which is not limited in the embodiments of the present invention.

The terminal 101 may be a mobile phone, a PAD (portable android device, that is, a tablet computer), a computer, or the like. The multimedia server 102 may be a single multimedia server, a multimedia server cluster composed of several multimedia servers, or a cloud-computing multimedia server center, which is not limited in the embodiments of the present invention. The multimedia server 102 may be a video server or an audio server.

An embodiment of the present invention provides a method for obtaining a multimedia file. The method is applied to a multimedia server. Referring to Fig. 2, the method includes:

Step 201: Extract a reference note sequence from the acquired voice signal, the reference note sequence including multiple notes.

Step 202: For any multimedia file in the multimedia file library, when the note sequence of the multimedia file has a repetitive structure, obtain a benchmark note subsequence of the multimedia file. The benchmark note subsequence includes at least one note, and the number of notes in the benchmark note subsequence is less than the number of notes in the multimedia file.

Step 203: Determine the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence of the multimedia file.

Step 204: According to the matching degree between the voice signal and the multimedia file, obtain from the multimedia file library a target multimedia file whose matching degree meets a preset condition.

In one possible implementation, before the obtaining of the benchmark note subsequence of the multimedia file, the method further includes:

dividing the note sequence of the multimedia file into multiple note subsequences, each note subsequence including at least one note;

determining the multiplicity (degree of repetition) between the note subsequences based on a preset multiplicity algorithm; and

if the multiplicity between the note subsequences is greater than a preset multiplicity, determining that the note sequence of the multimedia file has a repetitive structure.

In one possible implementation, the determining of the multiplicity between the note subsequences based on a preset multiplicity algorithm includes:

determining at least one similarity matrix between the note subsequences based on a similarity matrix algorithm, determining a characteristic value of each similarity matrix, and determining the multiplicity between the note subsequences according to the characteristic value of each similarity matrix; or

determining at least one cross-correlation degree between the note subsequences based on a cross-correlation algorithm, and determining the multiplicity between the note subsequences according to each cross-correlation degree; or

determining at least one edit distance between the note subsequences based on an edit distance algorithm, and determining the multiplicity between the note subsequences according to each edit distance; or

determining at least one EMD distance between the note subsequences based on an EMD distance algorithm, and determining the multiplicity between the note subsequences according to each EMD distance.

In one possible implementation, the obtaining of the benchmark note subsequence of the multimedia file includes:

randomly selecting one note subsequence from the multiple note subsequences as the benchmark note subsequence of the multimedia file; or

selecting, from the multiple note subsequences, the note subsequence that contains the most notes as the benchmark note subsequence of the multimedia file; or

selecting, from the multiple note subsequences, the note subsequence that contains the fewest notes as the benchmark note subsequence of the multimedia file.
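The three selection options can be sketched as follows. This is only an illustration: the function name `pick_benchmark` and the strategy labels are hypothetical, and the text does not say how ties between equally long subsequences are resolved (here the first one wins).

```python
import random

def pick_benchmark(subsequences, strategy="random", rng=random):
    """Choose one note subsequence as the benchmark (hypothetical helper)."""
    if strategy == "random":
        return rng.choice(subsequences)
    if strategy == "longest":
        # Subsequence containing the most notes.
        return max(subsequences, key=len)
    if strategy == "shortest":
        # Subsequence containing the fewest notes.
        return min(subsequences, key=len)
    raise ValueError(f"unknown strategy: {strategy}")

subsequences = [[60, 62, 64], [65, 67], [60, 62, 64, 65]]
longest = pick_benchmark(subsequences, "longest")
shortest = pick_benchmark(subsequences, "shortest")
```

The shortest subsequence gives the least matching work per file, while the longest keeps more of the melody available for comparison; the text leaves that trade-off open.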

In one possible implementation, the intersection between two adjacent note subsequences includes a preset number of notes, the preset number being an integer greater than or equal to 0 and less than a specified value, where the specified value is the quotient of the number of notes in the multimedia file and the number of note subsequences into which it is divided.

In one possible implementation, a note includes a pitch and/or a duration, and the pitch is the absolute pitch of the note or the relative pitch between two adjacent notes.
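The difference between absolute and relative pitch can be illustrated with a short sketch. It assumes pitches are integer semitone numbers (MIDI-style), which the text does not specify; `to_relative_pitch` is a hypothetical helper name.

```python
def to_relative_pitch(absolute_pitches):
    """Return the pitch difference between each note and the note before it."""
    return [b - a for a, b in zip(absolute_pitches, absolute_pitches[1:])]

melody = [60, 62, 64, 62, 60]
# The same melody hummed five semitones higher.
transposed = [p + 5 for p in melody]

relative = to_relative_pitch(melody)
relative_transposed = to_relative_pitch(transposed)
```

Both calls return the same list, which shows why relative pitch can be convenient for hummed input: the result does not depend on the key in which the user happens to hum.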

In the embodiments of the present invention, for a multimedia file whose note sequence has a repetitive structure, a benchmark note subsequence of the multimedia file is obtained; the matching degree between the voice signal and the multimedia file is determined according to the reference note sequence of the voice signal and the benchmark note subsequence of the multimedia file; and, based on the matching degree, a target multimedia file whose matching degree meets a preset condition is obtained from the multimedia file library. Because the benchmark note subsequence of the multimedia file contains fewer notes than the multimedia file itself, determining the matching degree from the reference note sequence and the benchmark note subsequence reduces the calculation time and improves the efficiency of obtaining multimedia files.
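Steps 201 to 204 can be sketched end to end as below. The sketch rests on several assumptions: the library layout (a title mapped to a note sequence and an optional benchmark) is hypothetical, the matching degree is stood in for by `difflib.SequenceMatcher.ratio()` because the text does not fix a matching measure at this point, and the preset condition is taken to be "highest matching degree".

```python
from difflib import SequenceMatcher

def matching_degree(reference, candidate):
    """Stand-in similarity in [0, 1]; not the measure used by the source."""
    return SequenceMatcher(None, reference, candidate, autojunk=False).ratio()

def find_target(reference, library):
    """library maps a title to (note_sequence, benchmark_or_None)."""
    best_title, best_score = None, -1.0
    for title, (notes, benchmark) in library.items():
        # Use the shorter benchmark subsequence when the file was found
        # to have a repetitive structure; otherwise use the full sequence.
        candidate = benchmark if benchmark is not None else notes
        score = matching_degree(reference, candidate)
        if score > best_score:
            best_title, best_score = title, score
    return best_title, best_score

library = {
    "song_a": ([60, 62, 64, 65] * 4, [60, 62, 64, 65]),
    "song_b": ([55, 57, 59, 60, 62, 64, 66, 67], None),
}
title, score = find_target([60, 62, 64, 65], library)
```

For `song_a` the comparison runs against 4 notes instead of 16, which is the source of the time saving described above.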

Before obtaining a multimedia file, the multimedia server needs to determine whether the note sequence of each multimedia file in the multimedia file library has a repetitive structure. If the note sequence of a multimedia file has a repetitive structure, the multimedia file is obtained according to the method provided by the embodiments of the present invention. Referring to Fig. 3, the method includes:

Step 301: For any multimedia file in the multimedia file library, the multimedia server divides the note sequence of the multimedia file into multiple note subsequences, each note subsequence including at least one note.

A note sequence includes pitches and/or durations. A pitch may be an absolute pitch or the relative pitch between two adjacent pitches. Correspondingly, each note subsequence includes at least one pitch and/or the duration of each pitch. The number of note subsequences into which the multimedia server divides the note sequence of a multimedia file may be any value greater than 2. The note subsequences may contain the same number of notes or different numbers of notes. In addition, the intersection between two adjacent note subsequences may contain a preset number of notes, where the preset number is an integer greater than or equal to 0 and less than a specified value, and the specified value is the quotient of the number of notes in the multimedia file and the number of note subsequences into which it is divided. The union of the note subsequences equals the note sequence of the multimedia file.

It should be noted that, to improve accuracy, the intersection between two adjacent note subsequences is not an empty set; that is, two adjacent note subsequences include several identical notes. For example, if the note sequence of a multimedia file includes N pitches, the note sequence is $M = [m_1, m_2, m_3, \ldots, m_{N-1}, m_N]$. The multimedia server divides the note sequence into two note subsequences X1 and X2, with $X1 = [m_1, m_2, m_3, \ldots, m_{N/2+K-1}, m_{N/2+K}]$ and $X2 = [m_{N/2-K}, m_{N/2-K+1}, m_{N/2-K+2}, \ldots, m_{N-1}, m_N]$. The two note subsequences include K identical notes, and the value range of K is [0, N/2).
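A division with overlap can be sketched as follows. The function `split_notes` and its indexing are assumptions made for illustration: each piece starts at a multiple of the base length and extends `overlap` notes into the next piece, which is one way (not necessarily the way used in the example above) to make adjacent pieces share a preset number of notes while their union remains the whole sequence.

```python
def split_notes(notes, parts, overlap):
    """Split notes into `parts` pieces; adjacent pieces share `overlap` notes."""
    base = len(notes) // parts  # the "specified value": notes per piece
    if parts < 2 or base == 0 or not 0 <= overlap < base:
        raise ValueError("overlap must satisfy 0 <= overlap < len(notes) // parts")
    pieces = []
    for i in range(parts):
        start = i * base
        # The last piece runs to the end so that no note is dropped.
        end = len(notes) if i == parts - 1 else (i + 1) * base + overlap
        pieces.append(notes[start:end])
    return pieces

notes = list(range(10))
two_pieces = split_notes(notes, parts=2, overlap=2)
three_pieces = split_notes(notes, parts=3, overlap=1)
```

The check `overlap < base` mirrors the stated constraint that the preset number is less than the quotient of the note count and the number of subsequences.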

Step 302: The multimedia server determines the multiplicity (degree of repetition) between the note subsequences based on a preset multiplicity algorithm.

The preset multiplicity algorithm may be a similarity matrix algorithm, a cross-correlation algorithm, an edit distance algorithm, an EMD (earth mover's distance) algorithm, or the like. When the preset multiplicity algorithm is the similarity matrix algorithm, this step may be implemented in the first way below. When it is the cross-correlation algorithm, this step may be implemented in the second way below. When it is the edit distance algorithm, this step may be implemented in the third way below. When it is the EMD distance algorithm, this step may be implemented in the fourth way below.

For the first implementation, this step may be carried out through the following steps (1) to (3):

(1): The multimedia server determines at least one similarity matrix between the note subsequences based on the similarity matrix algorithm.

The multimedia server determines at least one group of note subsequences, each group including two note subsequences, and calculates the similarity matrix between the two note subsequences of each group with the similarity matrix algorithm, obtaining at least one similarity matrix.

For one group of note subsequences, the multimedia server calculates the similarity matrix between the note subsequences of the group by the following formula one.

Formula one：

Here, $X_m$ and $X_n$ are the two note subsequences included in one group of note subsequences, $x_{mi}$ is the i-th note in note subsequence $X_m$, $x_{nj}$ is the j-th note in note subsequence $X_n$, and $c_{mn}[i][j]$ is the similarity matrix between note subsequences $X_m$ and $X_n$.

When determining the at least one group of note subsequences, the multimedia server may take two adjacent note subsequences as one group, or take any two note subsequences as one group.
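The two grouping options can be sketched as below; `adjacent_pairs` and `all_pairs` are hypothetical helper names, and the subsequences are represented by labels to keep the example short.

```python
from itertools import combinations

def adjacent_pairs(subsequences):
    """Group each note subsequence with the one that follows it."""
    return list(zip(subsequences, subsequences[1:]))

def all_pairs(subsequences):
    """Group every two note subsequences, regardless of position."""
    return list(combinations(subsequences, 2))

subsequences = ["X1", "X2", "X3"]
neighbours = adjacent_pairs(subsequences)
every_pair = all_pairs(subsequences)
```

With P subsequences, adjacent grouping yields P - 1 groups and full grouping yields P(P - 1)/2 groups, so the second option costs more comparisons as P grows.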

For example, the multimedia server divides a multimedia file into two note subsequences, note subsequence X1 and note subsequence X2, both of which include only pitches, where note subsequence X1 = [52 53 54 55 56 57 58] and note subsequence X2 = [50 51 52 53 54 55 50 57 58].

Correspondingly, the similarity matrix algorithm is

Based on this similarity matrix algorithm, the similarity matrix between note subsequences X1 and X2 is obtained as:

(2): The multimedia server determines the characteristic value of each similarity matrix.

In this step, for each similarity matrix, the multimedia server may take the length of the longest note repeated fragment as the characteristic value of the similarity matrix. The multimedia server may also take the sum of the lengths of the note repeated fragments as the characteristic value of the similarity matrix. The multimedia server may also compute, for each of multiple diagonals, the sum of the lengths of the note repeated fragments on that diagonal, and take the largest of these sums as the characteristic value of the similarity matrix.

It should be noted that the number of consecutive non-zero values on a diagonal of the similarity matrix is the length of a note repeated fragment.

For example, when the multimedia server takes the length of the longest note repeated fragment as the characteristic value of the similarity matrix, the note repeated fragments in the above similarity matrix are [1 2 3 4], [1 2] and [1 1]. The lengths of the note repeated fragments are 4, 2 and 2 respectively, so the multimedia server determines that the characteristic value of the similarity matrix is 4.

For another example, when the multimedia server takes the sum of the lengths of the note repeated fragments as the characteristic value of the similarity matrix, the sum of the lengths of the note repeated fragments of the above similarity matrix is 4+2+2=8, so the multimedia server determines that the characteristic value of the similarity matrix is 8.

For another example, when the multimedia server takes the largest per-diagonal sum of the lengths of the note repeated fragments as the characteristic value of the similarity matrix, the sums of the lengths of the note repeated fragments on the two diagonals above are 4+2=6 and 2 respectively, so the multimedia server determines that the characteristic value of the similarity matrix is 6.
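The three characteristic values can be sketched in code. Because the formula itself is not reproduced in this text, the sketch assumes a run-length recurrence, c[i][j] = c[i-1][j-1] + 1 when the two notes are equal and 0 otherwise, which is consistent with fragments such as [1 2 3 4] but is not confirmed by the source. Applied to X1 and X2 as printed in the example, this assumed recurrence produces runs of length 4 and 2 on a single diagonal, so the totals it yields need not equal the figures in the worked example.

```python
def similarity_matrix(x, y):
    """Assumed recurrence: c[i][j] = c[i-1][j-1] + 1 if x[i] == y[j], else 0."""
    c = [[0] * len(y) for _ in x]
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            if a == b:
                c[i][j] = (c[i - 1][j - 1] if i and j else 0) + 1
    return c

def diagonal_runs(c):
    """Map each diagonal offset (j - i) to the lengths of its non-zero runs."""
    rows, cols = len(c), len(c[0])
    runs = {}
    for offset in range(-(rows - 1), cols):
        i = max(0, -offset)
        j = i + offset
        lengths, length = [], 0
        while i < rows and j < cols:
            if c[i][j]:
                length += 1
            elif length:
                lengths.append(length)
                length = 0
            i += 1
            j += 1
        if length:
            lengths.append(length)
        if lengths:
            runs[offset] = lengths
    return runs

x1 = [52, 53, 54, 55, 56, 57, 58]
x2 = [50, 51, 52, 53, 54, 55, 50, 57, 58]
runs = diagonal_runs(similarity_matrix(x1, x2))

longest_fragment = max(max(lengths) for lengths in runs.values())
total_length = sum(sum(lengths) for lengths in runs.values())
best_diagonal = max(sum(lengths) for lengths in runs.values())
```

Here `longest_fragment`, `total_length` and `best_diagonal` correspond to the three options for the characteristic value described above.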

(3): The multimedia server determines the multiplicity between the note subsequences according to the characteristic value of each similarity matrix.

The multimedia server selects the minimum characteristic value from the characteristic values of the similarity matrices and takes the minimum characteristic value as the multiplicity between the note subsequences.

It should be noted that the multimedia server may also select the maximum characteristic value from the characteristic values of the similarity matrices and take the maximum characteristic value as the multiplicity between the note subsequences. Alternatively, the multimedia server performs a weighting operation on the characteristic values of the similarity matrices to obtain the multiplicity between the note subsequences.
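The three ways of reducing the characteristic values to one multiplicity, followed by the threshold comparison, can be sketched as below. The weighting operation is assumed here to be a normalised weighted average; the text does not define it. The function names and the sample values are hypothetical.

```python
def aggregate(values, mode="min", weights=None):
    """Reduce per-group characteristic values to a single multiplicity."""
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "weighted":
        # Assumption: a normalised weighted average; equal weights by default.
        weights = weights or [1.0] * len(values)
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown mode: {mode}")

def has_repetitive_structure(values, preset_multiplicity, mode="min"):
    """The note sequence is repetitive if the multiplicity exceeds the preset."""
    return aggregate(values, mode) > preset_multiplicity

values = [4, 6, 8]
weighted = aggregate(values, "weighted", weights=[1, 1, 2])
```

Taking the minimum is the strictest choice, since every group of subsequences must then repeat strongly before the file is treated as having a repetitive structure.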

For the second implementation, this step may be:

The multimedia server determines at least one cross-correlation degree between the note subsequences based on the cross-correlation algorithm, and determines the multiplicity between the note subsequences according to each cross-correlation degree.

The multimedia server determines at least one group of note subsequences, each group including two note subsequences, and calculates the cross-correlation degree between the two note subsequences of each group with the cross-correlation algorithm, obtaining at least one cross-correlation degree. It then selects the minimum cross-correlation degree from the cross-correlation degrees and takes the minimum cross-correlation degree as the multiplicity between the note subsequences.

It should be noted that the multimedia server may also select the maximum cross-correlation degree from the multiple cross-correlation degrees and take the maximum cross-correlation degree as the multiplicity between the note subsequences. Alternatively, the multimedia server performs a weighting operation on the cross-correlation degrees to obtain the multiplicity between the note subsequences.

For one group of note subsequences, the multimedia server calculates the cross-correlation degree between the note subsequences of the group by the following formula two.

Formula two：

Here, $X_m$ and $X_n$ are the two note subsequences included in one group of note subsequences, $x_m(j)$ is the j-th note in note subsequence $X_m$, $y_n(j-i)$ is the (j-i)-th note in note subsequence $X_n$, and $c_{mn}(i, j)$ is the cross-correlation degree between note subsequences $X_m$ and $X_n$.

Likewise, when determining the at least one group of note subsequences, the multimedia server may take two adjacent note subsequences as one group, or take any two note subsequences as one group.
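A cross-correlation between two note subsequences can be sketched as follows. Since formula two is not reproduced in this text, the sketch assumes the common form in which the correlation at lag i is the sum of x(j)·y(j−i) over the indices where both notes exist. Mean removal, normalisation and taking the peak over all lags are additions made for illustration, not steps stated in the source.

```python
import math

def cross_correlation(x, y, lag):
    """Sum of x[j] * y[j - lag] over the indices where both notes exist."""
    return sum(x[j] * y[j - lag] for j in range(len(x)) if 0 <= j - lag < len(y))

def peak_normalized_correlation(x, y):
    """Largest normalised correlation over all lags (assumed post-processing)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    norm = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    if norm == 0:
        return 0.0  # a constant sequence carries no melodic contour
    lags = range(-(len(y) - 1), len(x))
    return max(cross_correlation(xc, yc, lag) for lag in lags) / norm

same = peak_normalized_correlation([60, 62, 64, 62], [60, 62, 64, 62])
different = peak_normalized_correlation([60, 62, 64, 62], [60, 60, 61, 59])
```

Identical subsequences give a peak of 1, and the normalisation keeps every result at or below 1, which makes values from different groups comparable.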

For the third realization method, this step can be：

Multimedia server is based on editing distance algorithm, determine at least one editor between each note subsequence away from From determining the multiplicity between each note subsequence according to each editing distance.

Multimedia server determines that at least one set of note subsequence, every group of note subsequence include two note subsequences, By editing distance algorithm, the editing distance between every group of note subsequence is calculated, obtains at least one editing distance；From each Smallest edit distance is selected in editing distance, which is determined as the multiplicity between each note subsequence.

It should be noted that multimedia server can also select maximum editing distance from multiple editing distances, by this Maximum editing distance is determined as the multiplicity between each note subsequence.Alternatively, multimedia server is to each editing distance It is weighted, obtains the multiplicity between each note subsequence.

Wherein, for one group of note subsequences, the multimedia server calculates the edit distance between the two note subsequences of the group by the following formula three.

Formula three：

Wherein, X_m and X_n are the two note subsequences included in one group of note subsequences, c_mn[i][j] is the edit distance between the two note subsequences X_m and X_n, i is the number of notes in note subsequence X_m, and j is the number of notes in note subsequence X_n. a, b and c are weighting coefficients. a, b and c can be set and changed as needed; the embodiments of the present invention do not specifically limit a, b and c, and the magnitude relationship among a, b and c can also be set arbitrarily. To improve accuracy, a > b and c > b are generally taken, while the magnitude relationship between a and c is not limited.
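The weighted edit distance can be sketched with the usual dynamic-programming table. The sketch below assumes three per-operation costs named delete_cost, insert_cost and substitute_cost; these names are hypothetical, and how the coefficients a, b and c of formula three map onto them is not stated in this text, so the mapping is left to the caller.

```python
def weighted_edit_distance(x_m, x_n, delete_cost=1.0, insert_cost=1.0,
                           substitute_cost=1.0):
    """Weighted Levenshtein distance between two note subsequences.

    Assumption: the three costs stand in for the weighting coefficients
    of formula three; their correspondence to a, b and c is not known.
    """
    rows, cols = len(x_m) + 1, len(x_n) + 1
    # dist[i][j] is the distance between the first i notes of x_m
    # and the first j notes of x_n.
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost
    for j in range(1, cols):
        dist[0][j] = j * insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = x_m[i - 1] == x_n[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost,
                dist[i][j - 1] + insert_cost,
                dist[i - 1][j - 1] + (0.0 if same else substitute_cost),
            )
    return dist[-1][-1]
```

With all three costs equal to 1 this reduces to the ordinary edit distance; unequal costs let one kind of mismatch count more than another.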

For the fourth implementation, this step can be:

The multimedia server determines, based on an EMD (Earth Mover's Distance) algorithm, at least one EMD distance between the note subsequences, and determines the multiplicity between the note subsequences according to each EMD distance.

The multimedia server determines at least one group of note subsequences, each group including two note subsequences; calculates the EMD distance between the two note subsequences of each group by the EMD algorithm, obtaining at least one EMD distance; and selects the minimum EMD distance from the EMD distances and determines it as the multiplicity between the note subsequences.

It should be noted that the multimedia server can also select the maximum EMD distance from the multiple EMD distances and determine that maximum as the multiplicity between the note subsequences. Alternatively, the multimedia server performs a weighted operation on the EMD distances to obtain the multiplicity between the note subsequences.
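One simple way to picture an EMD distance between two note subsequences is to compare their pitch histograms. The sketch below assumes that each note is an integer pitch and that each subsequence is turned into a normalised histogram; for one-dimensional histograms of equal total mass, the earth mover's distance equals the accumulated absolute difference of the running sums. The embodiment does not say how the EMD is set up, so this construction and the function name are assumptions.

```python
from collections import Counter


def pitch_emd(x_m, x_n):
    """1-D earth mover's distance between normalised pitch histograms.

    Assumption: notes are integer pitches; the histogram construction is
    illustrative and is not taken from the embodiment.
    """
    if not x_m or not x_n:
        raise ValueError("both subsequences must contain at least one note")
    hist_m, hist_n = Counter(x_m), Counter(x_n)
    pitches = sorted(set(hist_m) | set(hist_n))
    emd, carried = 0.0, 0.0
    for current, nxt in zip(pitches, pitches[1:]):
        # Mass that still has to be moved past the current pitch.
        carried += hist_m[current] / len(x_m) - hist_n[current] / len(x_n)
        emd += abs(carried) * (nxt - current)
    return emd
```

A histogram ignores note order, so this variant only captures how similar the pitch content of two subsequences is.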

It should be noted that the default multiplicity algorithm can also be a longest common subsequence algorithm, dynamic time warping, Earth Mover's Distance, or the like. Also, when determining the multiplicity between the note subsequences, the multimedia server can combine one or more of the above four implementations. When several of the above four implementations are combined, the multiplicities obtained by the individual implementations are weighted to obtain the multiplicity between the note subsequences.

For example, when the multimedia server combines the first implementation and the second implementation to determine the multiplicity between the note subsequences, the multimedia server determines, based on the similarity matrix algorithm, the similarity matrix between the note subsequences and determines the eigenvalue of the similarity matrix; the multimedia server also determines, based on the cross-correlation algorithm, the cross-correlation degree between the note subsequences; it then performs a weighted operation on the eigenvalue of the similarity matrix and the cross-correlation degree to obtain the multiplicity between the note subsequences.
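The weighted combination of several multiplicities can be sketched as a weighted average. The text only says that the values are weighted; dividing by the sum of the weights is an assumption made here so that the result stays on the same scale as the inputs, and the function name is hypothetical.

```python
def combine_multiplicities(scores, weights):
    """Weighted average of multiplicities produced by different algorithms.

    Assumption: normalising by the total weight is an illustrative choice.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight
```

For instance, the first entry could be a value derived from the eigenvalue of the similarity matrix and the second a cross-correlation degree.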

Step 303: The multimedia server determines whether the multiplicity between the note subsequences is greater than the default multiplicity; if the multiplicity is greater than the default multiplicity, it determines that the note sequence of the multimedia file has a repetitive structure.

If the multiplicity is not greater than the default multiplicity, it determines that the note sequence of the multimedia file does not have a repetitive structure. The default multiplicity can be set and changed as needed, and the embodiments of the present invention do not specifically limit it; for example, the default multiplicity can be 8, 5, or the like.

Step 304: The multimedia server selects one note subsequence from the multiple note subsequences of the multimedia file as the benchmark note subsequence of the multimedia file.

In this step, to improve the efficiency of determining the benchmark note subsequence, the multimedia server can randomly select one note subsequence from the multiple note subsequences of the multimedia file as the benchmark note subsequence of the multimedia file.

To improve the accuracy of subsequently obtaining multimedia files, the multimedia server can select, from the multiple note subsequences, the note subsequence that includes the most notes as the benchmark note subsequence of the multimedia file.

To improve the efficiency of subsequently obtaining multimedia files, the multimedia server can select, from the multiple note subsequences, the note subsequence that includes the fewest notes as the benchmark note subsequence of the multimedia file.
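Steps 303 and 304 can be sketched as a threshold test followed by a selection strategy. The function names and the strategy labels "random", "longest" and "shortest" are hypothetical; they simply mirror the three options described above.

```python
import random


def has_repetitive_structure(multiplicity, default_multiplicity):
    """Step 303: repetitive only if strictly greater than the threshold."""
    return multiplicity > default_multiplicity


def select_benchmark(subsequences, strategy="random", rng=None):
    """Step 304: pick one note subsequence as the benchmark.

    rng is an optional random.Random instance for reproducible choices.
    """
    if not subsequences:
        raise ValueError("at least one note subsequence is required")
    if strategy == "random":
        return (rng or random).choice(subsequences)
    if strategy == "longest":
        return max(subsequences, key=len)
    if strategy == "shortest":
        return min(subsequences, key=len)
    raise ValueError("unknown strategy: " + strategy)
```

The longest subsequence keeps more notes available for matching, while the shortest one means fewer notes to compare per file.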

Step 305: The multimedia server binds the correspondence between the identifier of the multimedia file and the benchmark note subsequence of the multimedia file.

The multimedia server binds this correspondence so that, when it subsequently retrieves multimedia files, it can obtain the benchmark note subsequence of a multimedia file from the correspondence between multimedia file identifiers and benchmark note subsequences and perform retrieval based on the benchmark note subsequence.

It should be noted that the multimedia server determines the benchmark note subsequence of each multimedia file in the multimedia file library through the above steps 301-304. For a multimedia file whose note sequence does not have a repetitive structure, the multimedia server binds the correspondence between the identifier of the multimedia file and the note sequence of the multimedia file.
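The binding of identifiers to sequences can be pictured as building a lookup table ahead of retrieval. In the sketch below, the library is assumed to be a dictionary from file identifier to note sequence, and `analyse` is a hypothetical callable that returns the benchmark note subsequence for a file with a repetitive structure and None otherwise.

```python
def build_index(library, analyse):
    """Bind each file identifier to the sequence used for later matching.

    Files with a repetitive structure are bound to their benchmark note
    subsequence; all other files are bound to their full note sequence.
    """
    index = {}
    for file_id, notes in library.items():
        benchmark = analyse(notes)
        index[file_id] = benchmark if benchmark is not None else notes
    return index
```

Because the table is built before any request arrives, a retrieval only needs to read the stored sequence for each identifier.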

In the embodiments of the present invention, before obtaining a multimedia file, the multimedia server determines whether the note sequence of each multimedia file has a repetitive structure. If it does, the multimedia server binds the correspondence between the identifier of the multimedia file and the benchmark note subsequence of the multimedia file, so that, when it subsequently retrieves multimedia files, it can obtain the benchmark note subsequence from that correspondence and perform retrieval based on the benchmark note subsequence, thereby improving the efficiency of subsequently obtaining multimedia files.

An embodiment of the present invention provides a method for obtaining a multimedia file, which is applied between a terminal and a multimedia server. Referring to Fig. 4, the method includes:

Step 401: The terminal obtains a collected voice signal and sends an acquisition request carrying the voice signal to the multimedia server.

The current interface of the terminal includes a recognition button for identifying a song by listening to it. When a user wants to search for a multimedia file, the user can click the recognition button; when the terminal detects that the recognition button is triggered, the terminal collects a voice signal input by the user or played by another device and sends an acquisition request carrying the voice signal to the multimedia server.

Step 402: The multimedia server receives the acquisition request sent by the terminal and extracts the reference note sequence of the voice signal.

Wherein, the reference note sequence includes multiple notes. A note may include only a pitch, only a duration, or both a pitch and a duration. The pitch can be the absolute pitch of the note, or the relative pitch between two adjacent notes.
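The note representation described above can be sketched as a small record type. The sketch assumes integer pitch numbers (for example MIDI-style values) and durations in seconds; neither unit is specified in the text, and the names Note and relative_pitches are hypothetical.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int        # absolute pitch; the numbering scheme is an assumption
    duration: float   # note length; the unit is an assumption


def relative_pitches(notes):
    """Pitch difference between each pair of adjacent notes."""
    return [b.pitch - a.pitch for a, b in zip(notes, notes[1:])]
```

Relative pitches do not change when every note is shifted by the same amount, which is useful when a user hums a melody in a different key from the recording.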

Step 403: For a multimedia file with a repetitive structure in the multimedia file library, the multimedia server obtains the benchmark note subsequence of the multimedia file.

A multimedia identifier library is stored in the multimedia server; it includes the identifiers of the multimedia files whose note sequences have a repetitive structure. The multimedia server obtains the identifier of such a multimedia file from the multimedia identifier library and, according to that identifier, obtains the benchmark note subsequence of the multimedia file from the correspondence between multimedia file identifiers and benchmark note subsequences.

For a multimedia file in the multimedia file library whose note sequence does not have a repetitive structure, the multimedia server obtains the note sequence of the multimedia file, according to the identifier of the multimedia file, from the correspondence between multimedia file identifiers and note sequences. It should be noted that a benchmark note subsequence includes at least one note, and the number of notes included in the benchmark note subsequence of a multimedia file is less than the number of notes included in the note sequence of that multimedia file. For example, if the note sequence of a multimedia file includes 8 notes, the benchmark note subsequence of the multimedia file may include only 4 or 5 notes.

Step 404: The multimedia server determines the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence of the multimedia file.

The multimedia server calculates the matching degree between the reference note sequence and the benchmark note subsequence of the multimedia file by any existing algorithm for calculating the matching degree between note sequences. For example, the number of identical notes included in two note sequences is taken as the matching degree between the two note sequences. In that case, this step can be:

The multimedia server determines the number of identical notes included in the reference note sequence and in the benchmark note subsequence of the multimedia file, and determines that number as the matching degree between the reference note sequence and the multimedia file.

For a multimedia file in the multimedia file library whose note sequence does not have a repetitive structure, the multimedia server determines the matching degree between the voice signal and the multimedia file according to the reference note sequence and the note sequence of the multimedia file.
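The example matching degree, the number of identical notes in two note sequences, can be sketched as a multiset intersection. Counting each shared note as often as it occurs in both sequences is one reading of the text and is an assumption; the function name is hypothetical.

```python
from collections import Counter


def matching_degree(reference, candidate):
    """Number of notes the two sequences have in common, counted as a multiset."""
    shared = Counter(reference) & Counter(candidate)
    return sum(shared.values())
```

Here `candidate` is the benchmark note subsequence for a file with a repetitive structure and the full note sequence otherwise, so shorter candidates mean less work per file.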

Step 405: The multimedia server selects, from the multimedia file library, the destination multimedia file whose matching degree meets a preset condition, according to the matching degree between the voice signal and each multimedia file in the multimedia file library.

The preset condition can be that the matching degree is the largest, or that the matching degree is greater than a preset matching degree. The preset matching degree can be set and changed as needed, and the embodiments of the present invention do not specifically limit it. For example, the preset matching degree can be 10, 20, or the like.

For example, when the preset condition is that the matching degree is the largest, this step can be:

The multimedia server selects, from the multimedia file library, a preset number of destination multimedia files with the largest matching degrees, according to the matching degree between the voice signal and each multimedia file in the multimedia file library.

The preset number can be set and changed as needed, and the embodiments of the present invention do not specifically limit it. For example, the preset number can be 3, 5, or the like.

For another example, when the preset condition is that the matching degree is greater than the preset matching degree, this step can be:

The multimedia server selects, from the multimedia file library, the destination multimedia files whose matching degree is greater than the preset matching degree, according to the matching degree between the voice signal and each multimedia file in the multimedia file library.
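The two preset conditions of step 405 can be sketched as two small selection functions. The sketch assumes the matching degrees are held in a dictionary from file identifier to matching degree; the function names are hypothetical.

```python
def select_top(matches, preset_number):
    """Identifiers of the preset number of files with the largest matching degree."""
    ranked = sorted(matches.items(), key=lambda item: item[1], reverse=True)
    return [file_id for file_id, _ in ranked[:preset_number]]


def select_above(matches, preset_matching_degree):
    """Identifiers of all files whose matching degree exceeds the threshold."""
    return [file_id for file_id, degree in matches.items()
            if degree > preset_matching_degree]
```

An empty result from `select_above` corresponds to the case in which no destination multimedia file matches the voice signal.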

Step 406: The multimedia server sends the destination multimedia file to the terminal.

The acquisition request sent by the terminal to the multimedia server carries the terminal identifier of the terminal; the multimedia server obtains the terminal identifier from the acquisition request and sends the destination multimedia file to the terminal according to the terminal identifier.

In a possible implementation, to reduce the network resource consumption of the terminal, the multimedia server may not send the destination multimedia file itself to the terminal but only its identifier, and send the destination multimedia file to the terminal only upon receiving a download request or a playing request sent by the terminal.

The terminal identifier and the identifier of the destination multimedia file can be set and changed as needed, and the embodiments of the present invention do not specifically limit them. For example, the terminal identifier can be the phone number of the terminal or the user identifier logged in to the application. The identifier of the destination multimedia file can be the title, the number, or the like of the destination multimedia file.

It should be noted that if the multimedia file library contains no destination multimedia file matching the voice signal, the multimedia server sends a failure indication to the terminal, the failure indication indicating that recognition failed. The terminal receives the failure indication sent by the multimedia server and displays it. After receiving the failure indication, the terminal can also collect a voice signal again and send another acquisition request, carrying the newly collected voice signal, to the multimedia server. The multimedia server receives the acquisition request and, based on the newly collected voice signal, obtains the destination multimedia file matching that voice signal through the above steps.

Step 407: The terminal receives the destination multimedia file sent by the multimedia server.

The terminal receives the destination multimedia file sent by the multimedia server, stores the destination multimedia file, and displays the identifier of the destination multimedia file. The user can click the destination multimedia file to trigger the terminal to play it; when the terminal detects that the destination multimedia file is triggered, it obtains the stored destination multimedia file and plays it.

It should be noted that if the multimedia server only sends the identifier of the destination multimedia file to the terminal in step 406, this step can be:

The terminal receives the identifier of the destination multimedia file sent by the multimedia server and displays the identifier. The user can click the identifier of the destination multimedia file to trigger the terminal to play the destination multimedia file; when the terminal detects that the identifier is triggered, it sends a playing request carrying the identifier of the destination multimedia file to the multimedia server.

The multimedia server receives the playing request sent by the terminal, obtains the destination multimedia file according to its identifier, and sends the destination multimedia file to the terminal; the terminal receives the destination multimedia file sent by the multimedia server and plays it.

In addition, the method for obtaining a multimedia file provided by the embodiments of the present invention can also be applied in a terminal alone. In that case, the multimedia file library includes the multiple multimedia files that have been downloaded locally to the terminal, and the above steps 301-305 are executed by the terminal. After collecting the voice signal, the terminal does not need to send an acquisition request to a multimedia server but directly extracts the reference note sequence of the voice signal. For a multimedia file with a repetitive structure in the multimedia file library, the terminal obtains the benchmark note subsequence of the multimedia file and determines the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence. According to the matching degree between the voice signal and each multimedia file in the multimedia file library, the terminal selects from the multimedia file library the destination multimedia file whose matching degree meets the preset condition, and displays the obtained destination multimedia file.

An embodiment of the present invention provides a device for obtaining a multimedia file. Referring to Fig. 5, the device includes:

an extraction module 501, configured to extract the reference note sequence of a collected voice signal, the reference note sequence including multiple notes;

a first acquisition module 502, configured to, for any multimedia file in the multimedia file library, obtain the benchmark note subsequence of the multimedia file when the note sequence of the multimedia file has a repetitive structure, where the benchmark note subsequence includes at least one note and the number of notes it includes is less than the number of notes included in the multimedia file;

a determining module 503, configured to determine the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence of the multimedia file;

a second acquisition module 504, configured to obtain, from the multimedia file library, the destination multimedia file whose matching degree meets the preset condition, according to the matching degree between the voice signal and the multimedia file.

In a possible implementation, the device further includes:

a division module, configured to divide the note sequence of the multimedia file into multiple note subsequences, each note subsequence including at least one note;

the determining module 503 is further configured to determine the multiplicity between the note subsequences based on the default multiplicity algorithm;

the determining module 503 is further configured to determine that the note sequence of the multimedia file has a repetitive structure if the multiplicity between the note subsequences is greater than the default multiplicity.

In a possible implementation, the determining module 503 is further configured to determine, based on a similarity matrix algorithm, at least one similarity matrix between the note subsequences, determine the eigenvalue of each similarity matrix, and determine the multiplicity between the note subsequences according to the eigenvalue of each similarity matrix; or,

the determining module 503 is further configured to determine, based on a cross-correlation algorithm, at least one cross-correlation degree between the note subsequences, and determine the multiplicity between the note subsequences according to each cross-correlation degree; or,

the determining module 503 is further configured to determine, based on an edit distance algorithm, at least one edit distance between the note subsequences, and determine the multiplicity between the note subsequences according to each edit distance; or,

the determining module 503 is further configured to determine, based on an EMD algorithm, at least one EMD distance between the note subsequences, and determine the multiplicity between the note subsequences according to each EMD distance.

In a possible implementation, the first acquisition module 502 is further configured to randomly select one note subsequence from the multiple note subsequences as the benchmark note subsequence of the multimedia file; or,

the first acquisition module 502 is further configured to select, from the multiple note subsequences, the note subsequence that includes the most notes as the benchmark note subsequence of the multimedia file; or,

the first acquisition module 502 is further configured to select, from the multiple note subsequences, the note subsequence that includes the fewest notes as the benchmark note subsequence of the multimedia file.

In a possible implementation, the intersection between two adjacent note subsequences includes a preset number of notes, the preset number being an integer greater than or equal to 0 and less than a specified value, where the specified value is the quotient of the number of notes included in the multimedia file and the number of note subsequences obtained by the division.
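The overlap constraint can be illustrated with a simple division routine. The sketch assumes the specified value is the integer quotient of the note count and the subsequence count, and that each subsequence except the last extends by the preset number of notes into the next one; both points and the function name are assumptions, since the text only bounds the size of the intersection.

```python
def divide_with_overlap(notes, count, overlap=0):
    """Split a note sequence into `count` subsequences sharing `overlap` notes.

    Assumption: the specified value is len(notes) // count, and the last
    subsequence absorbs any remainder.
    """
    if count < 1 or count > len(notes):
        raise ValueError("count must be between 1 and the number of notes")
    base = len(notes) // count
    if not 0 <= overlap < base:
        raise ValueError("overlap must be >= 0 and < len(notes) // count")
    pieces = []
    for i in range(count):
        start = i * base
        end = len(notes) if i == count - 1 else start + base + overlap
        pieces.append(notes[start:end])
    return pieces
```

With an overlap of 0 the subsequences are disjoint; a positive overlap lets a repeated phrase that straddles a boundary appear in both neighbours.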

In a possible implementation, the note includes a pitch and/or a duration, and the pitch is the absolute pitch of the note or the relative pitch between two adjacent notes.

It should be noted that when the device for obtaining a multimedia file provided by the above embodiment obtains a multimedia file, the division into the above functional modules is only used as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the device for obtaining a multimedia file provided by the above embodiment and the method embodiment for obtaining a multimedia file belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not repeated here.

Fig. 6 is a block diagram of a multimedia server provided by an embodiment of the present invention. Referring to Fig. 6, the multimedia server 600 includes a processing component 622, which further includes one or more processors, and memory resources represented by a memory 632 for storing instructions executable by the processing component 622, such as application programs. The application programs stored in the memory 632 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 622 is configured to execute the instructions so as to perform the above method for obtaining a multimedia file.

The multimedia server 600 can also include a power supply component 626 configured to perform power management of the multimedia server 600, a wired or wireless network interface 650 configured to connect the multimedia server 600 to a network, and an input/output (I/O) interface 658. The multimedia server 600 can operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

In an exemplary embodiment, a computer readable storage medium including instructions is also provided, for example a memory including instructions, and the instructions can be executed by a processor in a terminal to complete the method for obtaining a multimedia file in the above embodiments. For example, the computer readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be completed by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims

1. A method for obtaining a multimedia file, characterized in that the method includes:

for any multimedia file in a multimedia file library, dividing the note sequence of the multimedia file into multiple note subsequences, each note subsequence including at least one note, where the intersection between two adjacent note subsequences includes a preset number of notes, the preset number being an integer greater than or equal to 0 and less than a specified value, and the specified value being the quotient of the number of notes included in the multimedia file and the number of note subsequences obtained by the division;

if the multiplicity between the note subsequences is greater than a default multiplicity, determining that the note sequence of the multimedia file has a repetitive structure;

when the note sequence of the multimedia file has a repetitive structure, obtaining the benchmark note subsequence of the multimedia file, where the benchmark note subsequence includes at least one note and the number of notes it includes is less than the number of notes included in the multimedia file;

determining the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence of the multimedia file;

obtaining, from the multimedia file library, the destination multimedia file whose matching degree meets a preset condition, according to the matching degree between the voice signal and the multimedia file.

2. The method according to claim 1, characterized in that determining the multiplicity between the note subsequences based on the default multiplicity algorithm includes:

determining, based on a similarity matrix algorithm, at least one similarity matrix between the note subsequences, determining the eigenvalue of each similarity matrix, and determining the multiplicity between the note subsequences according to the eigenvalue of each similarity matrix; or,

determining, based on a cross-correlation algorithm, at least one cross-correlation degree between the note subsequences, and determining the multiplicity between the note subsequences according to each cross-correlation degree; or,

determining, based on an edit distance algorithm, at least one edit distance between the note subsequences, and determining the multiplicity between the note subsequences according to each edit distance; or,

determining, based on an EMD algorithm, at least one EMD distance between the note subsequences, and determining the multiplicity between the note subsequences according to each EMD distance.

3. The method according to claim 1, characterized in that obtaining the benchmark note subsequence of the multimedia file includes:

randomly selecting one note subsequence from the multiple note subsequences as the benchmark note subsequence of the multimedia file; or,

selecting, from the multiple note subsequences, the note subsequence that includes the most notes as the benchmark note subsequence of the multimedia file; or,

selecting, from the multiple note subsequences, the note subsequence that includes the fewest notes as the benchmark note subsequence of the multimedia file.

4. The method according to any one of claims 1-3, characterized in that the note includes a pitch and/or a duration, and the pitch is the absolute pitch of the note or the relative pitch between two adjacent notes.

5. A device for obtaining a multimedia file, characterized in that the device includes:

an extraction module, configured to extract the reference note sequence of a collected voice signal, the reference note sequence including multiple notes;

a division module, configured to, for any multimedia file in a multimedia file library, divide the note sequence of the multimedia file into multiple note subsequences, each note subsequence including at least one note, where the intersection between two adjacent note subsequences includes a preset number of notes, the preset number being an integer greater than or equal to 0 and less than a specified value, and the specified value being the quotient of the number of notes included in the multimedia file and the number of note subsequences obtained by the division;

a determining module, configured to determine the multiplicity between the note subsequences based on a default multiplicity algorithm;

the determining module being further configured to determine that the note sequence of the multimedia file has a repetitive structure if the multiplicity between the note subsequences is greater than a default multiplicity;

a first acquisition module, configured to obtain the benchmark note subsequence of the multimedia file when the note sequence of the multimedia file has a repetitive structure, where the benchmark note subsequence includes at least one note and the number of notes it includes is less than the number of notes included in the multimedia file;

the determining module being further configured to determine the matching degree between the voice signal and the multimedia file according to the reference note sequence and the benchmark note subsequence of the multimedia file;

a second acquisition module, configured to obtain, from the multimedia file library, the destination multimedia file whose matching degree meets a preset condition, according to the matching degree between the voice signal and the multimedia file.

6. The device according to claim 5, characterized in that:

the determining module is further configured to determine, based on a similarity matrix algorithm, at least one similarity matrix between the note subsequences, determine the eigenvalue of each similarity matrix, and determine the multiplicity between the note subsequences according to the eigenvalue of each similarity matrix; or,

the determining module is further configured to determine, based on a cross-correlation algorithm, at least one cross-correlation degree between the note subsequences, and determine the multiplicity between the note subsequences according to each cross-correlation degree; or,

the determining module is further configured to determine, based on an edit distance algorithm, at least one edit distance between the note subsequences, and determine the multiplicity between the note subsequences according to each edit distance; or,

the determining module is further configured to determine, based on an EMD algorithm, at least one EMD distance between the note subsequences, and determine the multiplicity between the note subsequences according to each EMD distance.

7. The device according to claim 5, characterized in that:

the first acquisition module is further configured to randomly select one note subsequence from the multiple note subsequences as the benchmark note subsequence of the multimedia file; or,

the first acquisition module is further configured to select, from the multiple note subsequences, the note subsequence that includes the most notes as the benchmark note subsequence of the multimedia file; or,

the first acquisition module is further configured to select, from the multiple note subsequences, the note subsequence that includes the fewest notes as the benchmark note subsequence of the multimedia file.

8. The device according to any one of claims 5-7, characterized in that the note includes a pitch and/or a duration, and the pitch is the absolute pitch of the note or the relative pitch between two adjacent notes.

9. A device for obtaining a multimedia file, characterized in that the device includes a processor and a memory, at least one instruction being stored in the memory, the instruction being loaded and executed by the processor to implement the method according to any one of claims 1 to 4.

10. A computer readable storage medium, characterized in that at least one instruction is stored in the computer readable storage medium, the instruction being loaded and executed by a processor to implement the method according to any one of claims 1 to 4.