CN102521281B

CN102521281B - Humming computer music searching method based on longest matching subsequence algorithm

Info

Publication number: CN102521281B
Application number: CN 201110382159
Authority: CN
Inventors: 王醒策; 陈卓然; 周明全; 武仲科
Original assignee: Beijing Normal University
Current assignee: Beijing Normal University
Priority date: 2011-11-25
Filing date: 2011-11-25
Publication date: 2013-10-23
Anticipated expiration: 2031-11-25
Also published as: CN102521281A

Abstract

The invention discloses a query-by-humming computer music retrieval method based on the longest matched subsequence algorithm. The method comprises the following steps: (1) fundamental frequency extraction, (2) construction of a music feature database, (3) feature representation, and (4) retrieval matching. The method increases the overall speed of similarity calculation and improves the retrieval efficiency of the search engine. It provides an accurate music retrieval platform for karaoke, content-based web search engines, and multifunctional smart mobile terminals, and can be widely applied in fields such as plug-ins for web search engines. The provided methods for music feature extraction, music feature representation, and accurate similarity calculation support accurate computation in a query-by-humming system, make music retrieval accurate, easy, and enjoyable, and have strong practical value and significance.

Description

A query-by-humming computer music retrieval method based on the longest matched subsequence algorithm

Technical field

The present invention relates to a query-by-humming computer music retrieval method based on the longest matched subsequence algorithm, and belongs to the field of computer application technology for content-based music information retrieval.

Background technology

In recent years, with the development of the Internet, the volume of audio data has grown geometrically. Traditional retrieval methods based on text labels cannot meet the retrieval needs of massive multimedia data, so content-based music information retrieval (Music Information Retrieval, MIR) has become a focus of research in fields such as signal processing, pattern recognition, and data mining. Research on content-based multimedia information retrieval has concentrated mainly on images and video, and techniques applied to audio retrieval remain rare both in China and abroad. As users' interest in classifying and retrieving online content grows, establishing a retrieval mechanism for audio data on the web has become essential. The key technical problems restricting content-based music retrieval are how to extract audio features so as to characterize music content and describe musical features, and which method to use for feature matching. Extracting and representing melody features is the basic step of content-based music retrieval: whether the melody features extracted from a music fragment express the semantic information of the music objectively and accurately determines whether the musical features are conveyed correctly, and directly affects whether subsequent matching and retrieval are effective. Whether the similarity algorithm for music fragments and the corresponding matching mechanism are accurate and agree with ordinary auditory and psychological perception is the key factor that determines the retrieval results. The extraction and representation of melody features and the calculation of similarity are therefore the most important steps affecting the performance of a query-by-humming or content-based music retrieval system.

For an acoustic signal, the perceived pitch is determined by its fundamental frequency (Fundamental Frequency) sequence. The purpose of pitch extraction is to convert the acoustic signal input by the user into a fundamental frequency sequence. Common feature extraction algorithms at present include the autocorrelation function algorithm (Autocorrelation), cepstral analysis (Cepstral Analysis), the cross-correlation function algorithm (CCF), the average magnitude difference function algorithm (AMDF), the normalized cross-correlation function algorithm (NCCF), and the integrated pitch tracker (Integrated Pitch Tracker). However, as related technologies have developed, the results of these algorithms no longer meet the requirements of many application scenarios, and they easily cause the feature representation to deviate from, or blur, the real semantic content of the music.

The common feature representation methods at present, and their shortcomings, are as follows:

1. The pitch contour representation cannot quantify changes in pitch, which easily blurs the relationship between the feature representation and the real semantic content of the music. As the song collection grows, cases in which the pitch contours are identical but the actual melodies differ greatly become very likely.

2. The MIDI note approximation method rounds the natural pitch of the user's humming to discrete MIDI note values, which makes the melody representation inaccurate. Fig. 1 shows the same melody expressed in C major and in A major: the MIDI pitch values of all corresponding notes in the two fragments are completely different, yet the auditory impression and musical perception they give are almost identical. A reasonable feature representation should treat these two fragments as having the same melody features, and in this respect the MIDI note approximation method is neither appropriate nor comprehensive.

3. Although the absolute pitch representation avoids the errors introduced by approximation, the overall vertical offset of pitch (Pitch Shiftiness) causes serious matching errors when the representation is combined with string comparison algorithms and some dynamic programming algorithms, so this feature representation is not suitable for general similarity calculation mechanisms.

4. The in-key scale degree representation avoids the effect of an overall pitch offset caused by humming in different keys. However, the method requires the tonic and the mode as additional information, and in query-by-humming scenarios these usually cannot be obtained directly. When the hummed fragment is short and carries too little information, the method may deviate considerably. Fig. 2 shows a melody fragment in C major that also fits G major. This is because the G major scale differs from the C major scale by only one altered note, F#. When neither this altered note nor its natural form appears in the fragment, the remaining notes fit both C major and G major. As a result, the method of labelling each note by its scale degree relative to the tonic (Degrees of the Scale from the Tonic) fails. Moreover, many musical styles frequently use compositional techniques that break a single key, such as modulation and out-of-key notes, and in these cases query by humming with this method produces large errors.

5. In the traditional triple melody representation, the interval attribute expresses the magnitude of the frequency change between adjacent notes, in hertz. The pitch unit used in music is the semitone. Semitones and hertz are positively correlated, but the relationship is logarithmic rather than linear, so the frequency difference between two notes separated by the same number of semitones differs from one pitch region to another. If the frequency difference between two adjacent notes is used as the interval measure, the same melody produces different interval sequences in different pitch regions, which seriously distorts the musical features. Fig. 4 gives an example: melody 1 and melody 2 have the same melody features but are hummed in different keys, and the frequency differences between their notes are entirely different, so the triple melody representation cannot express melody features objectively.
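The logarithmic relationship in item 5 can be checked with a short calculation. The sketch below is illustrative only and is not part of the patent; the function names are hypothetical, and the two base frequencies (220 Hz and 880 Hz, the notes A3 and A5 in standard tuning) are chosen purely as an example.

```python
import math

def semitone_up(freq_hz, semitones=1):
    """Frequency lying `semitones` above freq_hz in twelve-tone equal temperament."""
    return freq_hz * 2 ** (semitones / 12)

def interval_in_semitones(f_low, f_high):
    """Interval between two frequencies, measured in semitones."""
    return 12 * math.log2(f_high / f_low)

for base in (220.0, 880.0):  # A3 and A5, example values only
    upper = semitone_up(base)
    print(f"{base:.1f} Hz -> {upper:.2f} Hz: "
          f"difference {upper - base:.2f} Hz, "
          f"interval {interval_in_semitones(base, upper):.2f} semitones")
```

One semitone corresponds to roughly 13 Hz at 220 Hz but roughly 52 Hz at 880 Hz, while the interval measured in semitones is the same in both cases.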

The existing similarity calculation methods are as follows:

1. Edit distance algorithm

The edit distance between two strings is the minimum cost of the operations needed to transform string A into string B. The simple edit distance algorithm (Levenshtein Distance) applies only to character strings and cannot be used directly to calculate the similarity of melodies. An extended edit distance algorithm can calculate the distance between sequences of real numbers; its advantage is that it quantifies the cost of transforming one real-number sequence into the other, and so measures the similarity of the two melodies that the sequences represent. However, this extended algorithm is better suited to global comparison, and its performance drops markedly when the two input melody sequences do not correspond as wholes. For example, when a phrase hummed by the user is matched against the complete information of a piece of music, the algorithm counts the extra cost of a large number of inserted or deleted elements, which greatly reduces the melody similarity and causes the algorithm to fail. As shown by the dashed outline in Fig. 5, melody B is highly similar to part of melody A and can be regarded as a matching fragment, but because melody B is compared mechanically with the whole of melody A, its similarity is greatly reduced. This is the defect of the edit distance algorithm.

2. Longest common subsequence algorithm

The advantage of the longest common subsequence algorithm is that it can find mutually matching subsequences in two strings A and B, and can therefore be used to find matching fragments in two melodies. However, the algorithm does not take the cost of inserting and deleting elements into account. When a phrase hummed by the user is matched against the complete melody of a piece of music, the short input sequence can be stretched without limit, so that two entirely unrelated melodies are matched by forcibly stretching one of them. This kind of matching greatly distorts the features of the melody: even though the two melodies are matched after stretching, the method has in effect failed.

3. Dynamic time warping algorithm

The sound signal hummed by a user is highly variable: different pronunciation habits and different environments both cause the durations of sounds to vary. The dynamic time warping algorithm lengthens or shortens the sound signal until its length agrees with that of the reference pattern; in the process the time axis of the unknown signal is warped so that its features correspond to those of the reference pattern. Because the algorithm can stretch a sequence along the time axis so that similar contours align with each other, it is widely used in content-based music retrieval, signal processing, speech recognition, and related fields. The algorithm also has shortcomings: its time complexity is high, and when whole phrases of arbitrary length are matched and their note counts differ little, the matching results tend to discriminate poorly.

4. Hidden Markov model

The hidden Markov model (Hidden Markov Model, HMM) is a statistical analysis model that can be used for speaker-independent speech recognition. In query by humming, the hummed melody input by the user is itself a sound signal and can serve as the observation vector of the hidden Markov model, while the pitch feature sequences in the music feature database have probabilistic statistical properties and can serve as the hidden states of the model. In an implementation, the melody features of the different songs are modelled to form the search space, and the models are trained accordingly; during retrieval, the system returns the probability that the user's hummed signal matches each song model in the search space. A query-by-humming system based on hidden Markov models can return results with good precision for users of different singing ability. It also has an unavoidable shortcoming: a separate model must be trained for every record in the music feature database, so the training workload becomes very large as the feature database grows, and the hidden Markov model is therefore of limited practical use.

Summary of the invention

The object of the present invention is to provide a query-by-humming computer music retrieval method based on the longest matched subsequence algorithm, in which the computer actively recognizes changes in musical pitch and which overcomes the technical problems described above. The basic technical idea of the invention is as follows. On the basis of an analysis of existing music feature extraction and representation, the feature sequence is defined as the sequence of semitone intervals between adjacent notes. The RAPT algorithm is used to extract the fundamental frequency of the music. The technical effect is to avoid the feature extraction deviations caused by humming in different keys, which provides the precondition and basis for accurate extraction of melody features. For melody feature representation, twelve-tone equal temperament is taken as the tuning system, and the pitch contour sequence is converted by a logarithmic transformation into an interval sequence in semitones. This removes the effect on the melody features of different users humming in different keys, and at the same time normalizes the hummed features with those extracted from MIDI. Macroscopic melody contour modelling is performed with the MOMEL algorithm, and feature extraction and representation are realized with the logarithmic transformation based on twelve-tone equal temperament. As a result, a fundamental frequency sequence whose length was originally on the order of 10^3 is freed, without loss of melody features, from the fluctuations that lyrics and intonation impose on the macroscopic melody, and the resulting pitch contour sequence is reduced to a length on the order of 10, which provides important support for further increasing the matching speed of the whole system. For similarity calculation, a similarity mechanism based on the longest matched subsequence (Longest Matched Subsequence, LMS) is combined with traditional string matching methods, which effectively avoids the limitations of other related algorithms in application.

The main steps of the present invention are:

(1) Fundamental frequency extraction. The audio is processed as follows: the RAPT algorithm extracts the fundamental frequency; a low-pass filter and a high-pass filter regularize the fundamental frequency sequence; median filtering and linear smoothing smooth the sequence; and the MOMEL algorithm performs melody modelling. Together these steps convert the sound signal hummed by the user into a fundamental frequency contour sequence.

(2) Construction of the music feature database. The MIDI files of all songs in the database are preprocessed, and the MIDI pitch sequence of each is extracted and stored in a separate field of the music feature database. In the subsequent retrieval stage the MIDI file processing step is then skipped, and the pitch sequences are read directly from the feature database.

(3) Feature representation. The fundamental frequency contour sequence obtained from the audio processing module and the MIDI pitch sequences extracted from the music feature database are converted into a unified melodic interval sequence format, representing the melody features of the user's humming and of the database records respectively.

(4) Retrieval matching. The similarity between the melody feature sequence extracted from the user's hummed audio and every music feature sequence in the search space is calculated according to the longest matched subsequence (LMS) algorithm, and the matching results are ranked by similarity.

The advantages of the present invention are that it increases the overall speed of similarity calculation and improves the retrieval efficiency of the search engine, and that it provides an accurate music retrieval platform for karaoke, content-based web search engines, and multifunctional smart mobile terminals. It can be widely applied in fields such as plug-ins for web search engines. The methods provided by the invention for extracting musical features, representing musical features, and calculating similarity accurately support accurate computation in a query-by-humming system, make music retrieval accurate, easy, and enjoyable, and have considerable practical value and significance.

Description of drawings

Fig. 1 is a schematic diagram of the same melody expressed in C major and in G major;

Fig. 2 is a schematic diagram of a melody that fits both C major and G major;

Fig. 3 is a schematic diagram of the numerical relationship between semitones and hertz;

Fig. 4 is a schematic diagram of the frequency curves of the same melody in different keys;

Fig. 5 is a schematic diagram of matching a partial melody against a whole melody;

Fig. 6 is a schematic diagram of the overall audio feature extraction flow of the present invention;

Fig. 7 shows the interval curves of the same melody in different pitch regions;

Fig. 8 is a schematic diagram of melody modelling based on the MOMEL algorithm in the present invention;

Fig. 9 is a flow chart of similarity calculation based on the LMS algorithm in the present invention.

Embodiment

The present invention is described below with reference to the drawings and embodiments. The main steps of the present invention are:

(1) Fundamental frequency sequence extraction and processing

In content-based music information retrieval, the accuracy of feature extraction from the audio input is vital to the overall performance of the retrieval system. Ideal audio feature extraction expresses the melody in the user's audio query objectively and accurately. To improve retrieval rate and precision, the present invention proposes a melody feature extraction flow that combines several steps, including fundamental frequency extraction, frequency-domain filtering, median filtering, and melody modelling. The overall flow of fundamental frequency sequence extraction and processing is shown in Fig. 6:

1) The RAPT algorithm is applied to the input WAV file to extract the fundamental frequency, yielding the fundamental frequency sequence;

2) The original fundamental frequency sequence is passed through a high-pass filter and a low-pass filter to remove spikes and noise points and to smooth the fundamental frequency curve. The human vocal range generally lies between E2 (82 Hz) and C6 (1047 Hz). In accordance with this natural range, the high-pass threshold is set to 80 Hz and the low-pass threshold to 1100 Hz, so that fundamental frequency values outside these thresholds are removed;

3) Linear smoothing is applied to the fundamental frequency sequence as a linear filter, removing noise points and further smoothing the contour of the sequence. In an embodiment of the present invention, the filter window is set to 50 milliseconds;

4) Median filtering is applied to the resulting fundamental frequency sequence. It effectively removes noise points while preserving the step changes between continuous segments of the curve. In an embodiment of the present invention, the sampling rate of the fundamental frequency sequence after extraction is 100 points per second, and the median filter window is set to 77 milliseconds.
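Steps 2) to 4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: out-of-range values are replaced by 0.0 as an "unvoiced" marker, "linear smoothing" is approximated by a moving average, window lengths are rounded to an odd number of frames, and all function names are hypothetical. The RAPT extraction of step 1) is assumed to have already produced the list `f0`.

```python
from statistics import median

FRAME_RATE = 100                  # fundamental-frequency points per second (from the embodiment)
LOW_HZ, HIGH_HZ = 80.0, 1100.0    # thresholds from step 2)

def gate_range(f0, low=LOW_HZ, high=HIGH_HZ):
    """Replace values outside [low, high] with 0.0 (assumed unvoiced marker)."""
    return [f if low <= f <= high else 0.0 for f in f0]

def window_frames(window_ms, frame_rate=FRAME_RATE):
    """Convert a window length in ms to an odd number of frames (assumption)."""
    n = max(1, round(window_ms * frame_rate / 1000))
    return n if n % 2 == 1 else n + 1

def mean(values):
    return sum(values) / len(values)

def sliding(f0, size, reduce):
    """Apply `reduce` over a centred window, ignoring unvoiced (0.0) frames."""
    half = size // 2
    out = []
    for i, f in enumerate(f0):
        if f == 0.0:
            out.append(0.0)       # keep unvoiced frames unvoiced
            continue
        voiced = [v for v in f0[max(0, i - half): i + half + 1] if v > 0.0]
        out.append(reduce(voiced))
    return out

def clean_f0(f0):
    """Range gating, then 50 ms moving average, then 77 ms median filter."""
    gated = gate_range(f0)
    smoothed = sliding(gated, window_frames(50), mean)
    return sliding(smoothed, window_frames(77), median)
```

Calling `clean_f0` on a raw fundamental frequency list returns a list of the same length in which out-of-range frames are zeroed and the voiced frames are smoothed.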

(2) Musical feature representation

1) Feature representation of the fundamental frequency curve

With the semitone as the unit, the sequence of intervals between adjacent notes is taken as the melody feature. A melody fragment containing n notes is expressed as an interval sequence of n-1 real numbers. This feature representation has the following advantages: it expresses melody features quantitatively, so that the features of different melodies are distinguishable and subsequent similarity calculation gives meaningful results; it is insensitive to an overall pitch offset, so the same melody features are extracted whatever key the user hums in; and it is stable, remaining valid even when the melodic information is limited. For audio input by the user through humming, the interval is defined by formula (1):

\mathit{PitchInterval}_n = 12 \cdot \log_2 \left( \frac{freq_{n+1}}{freq_n} \right) \qquad (1)

According to this definition, the pitch frequency sequence Fx = (freq_1, freq_2, freq_3, ..., freq_n) is mapped to the interval sequence Pi = (pitch_interval_1, pitch_interval_2, pitch_interval_3, ..., pitch_interval_{n-1}).
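Formula (1) can be sketched in a few lines. The function name `pitch_intervals` and the sample frequencies are illustrative assumptions, not taken from the patent.

```python
import math

def pitch_intervals(freqs):
    """Map n fundamental-frequency values (Hz) to n-1 intervals in semitones, per formula (1)."""
    return [12 * math.log2(b / a) for a, b in zip(freqs, freqs[1:])]

# Example input: approximately C4, D4, E4, C4 (values rounded to 0.01 Hz)
hummed = [261.63, 293.66, 329.63, 261.63]
print([round(x, 2) for x in pitch_intervals(hummed)])
```

The result is a real-valued sequence close to (2, 2, -4); the small deviations from whole numbers come from the rounded input frequencies, which is why the later matching stage compares with a tolerance instead of exact equality.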

The MIDI files stored in the music feature database must use the same melody feature representation, so that the melody features extracted from the user input and from the database have the same form. For a MIDI file, the interval is defined by formula (2), where MIDI_note_{n+1} and MIDI_note_n denote pitch values in the MIDI file:

\mathit{PitchInterval}_n = \mathit{MIDI\_note}_{n+1} - \mathit{MIDI\_note}_n \qquad (2)

With the above transformation, the features of the same melody in different keys are normalized, and the effect of the humming key on melody feature extraction is eliminated. As shown in Fig. 7, the corresponding points of the two curves coincide exactly: the features of the same melody hummed in different keys have been extracted in normalized form. A similarity evaluation mechanism can then be designed on this common feature representation to match the query information against the feature information in the database. The processed fundamental frequency sequence undergoes melody modelling, which yields a melody skeleton made up of discrete points. The melody skeleton is converted by the logarithmic transformation, the intervals between adjacent notes of the input are extracted, and the result is used as the feature sequence of the input audio. The melody features finally extracted are passed to the matching module, where their similarity to the information in the music feature database is calculated to obtain the matching result.
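The normalization described above can be illustrated with a small sketch. The function names are hypothetical, the example melody is invented, and `midi_to_hz` uses the standard equal-temperament convention (MIDI note 69 = A4 = 440 Hz), which the patent text does not state explicitly.

```python
import math

def midi_intervals(notes):
    """Formula (2): differences between consecutive MIDI note numbers."""
    return [b - a for a, b in zip(notes, notes[1:])]

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (assumes A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def pitch_intervals(freqs):
    """Formula (1): semitone intervals between consecutive frequencies."""
    return [12 * math.log2(b / a) for a, b in zip(freqs, freqs[1:])]

melody_in_c = [60, 62, 64, 65, 67]            # C4 D4 E4 F4 G4
melody_in_g = [n + 7 for n in melody_in_c]    # same melody, seven semitones higher

print(midi_intervals(melody_in_c))
print(midi_intervals(melody_in_g))
print([round(x, 6) for x in pitch_intervals([midi_to_hz(n) for n in melody_in_g])])
```

All three printed sequences describe the same intervals, so a transposed humming of the melody and its database entry end up with the same feature sequence.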

2) Feature representation of the melody

The fundamental frequency contour curve obtained by fundamental frequency extraction and filtering can be decomposed into two mutually independent melody components: a macroscopic melody component and a microscopic melody component. They are defined as follows:

Macroscopic melody component: reflects the intonation in the voice information and is closely related to the overall pitch variation of the fundamental frequency.

Microscopic melody component: reflects the phoneme content of the voice information and affects local variations of the fundamental frequency curve.

Likewise, hummed input is a kind of voice signal and can be regarded as a combination of the two melody components. For a melody hummed by voice, the pitch variation depends only on the macroscopic melody component of the fundamental frequency curve, whereas phoneme information such as the syllables and lyrics being hummed is determined by the microscopic melody component. Using quadratic spline functions, the macroscopic melody of the fundamental frequency curve is obtained by interpolation. The resulting macroscopic melody takes the form of a sequence of discrete target points and represents the pitch melody features of the fundamental frequency sequence; hummed melody features are based on the pitch sequence and are independent of syllable and phoneme information. Processing the filtered fundamental frequency contour curve with the MOMEL algorithm therefore yields the macroscopic melody sequence of the curve, which serves as the basis for the subsequent melody feature representation.

Fig. 8 shows an example of processing by the MOMEL algorithm. Through melody modelling, the macroscopic melody (bottom) is successfully extracted from the fundamental frequency contour curve (top). However, the direct output of the MOMEL algorithm still has clear deficiencies for the subsequent melody feature representation. For example, in the last two segments of the fundamental frequency contour curve in Fig. 8, a segment representing a single pitch is marked with two target points of very similar value. To solve this kind of problem, the present invention sets a parameterized threshold to control the interval between adjacent notes. When the interval between two notes is below this threshold, the interval is either deleted or merged with another adjacent interval, depending on the specific case.
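One possible reading of the threshold rule is sketched below. The patent does not give the threshold value or the exact merge policy, so the default of 0.5 semitones, the choice to keep the earlier of two close points, and the function name are all assumptions made for illustration.

```python
import math

def merge_close_targets(targets_hz, min_interval=0.5):
    """Drop a target point that lies less than `min_interval` semitones
    from the previously kept point (assumed policy: keep the earlier point)."""
    merged = []
    for f in targets_hz:
        if merged and abs(12 * math.log2(f / merged[-1])) < min_interval:
            continue  # treated as the same note as the previous target
        merged.append(f)
    return merged

# The last two points (262 Hz and 264 Hz) are about 0.13 semitones apart
print(merge_close_targets([220.0, 247.0, 262.0, 264.0]))
```

Other policies, such as replacing the two close points by their average, would fit the description equally well; the sketch only shows where the parameterized threshold enters.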

(3) Similarity calculation: retrieval matching algorithm

(a) Longest matched subsequence algorithm (Longest Matched Subsequence, LMS)

The melody features obtained with the feature extraction method of the present invention form a sequence of real numbers, while the melody features stored in the music feature database are integer sequences. If the longest common subsequence algorithm were applied mechanically to calculate the similarity of the two sequences, many elements that should match would be missed.

The longest matched subsequence algorithm solves this problem of the longest common subsequence algorithm (LCS) in application. The longest matched subsequence algorithm is an improvement on the longest common subsequence algorithm; its output is two separate subsequences A' and B', which are subsequences of the input sequences A and B respectively.

The longest matched subsequence is defined as follows.

Given input sequences A = (a_1, a_2, a_3, ..., a_n) and B = (b_1, b_2, b_3, ..., b_m),

the algorithm produces subsequences A' = (a'_1, a'_2, a'_3, ..., a'_l) and B' = (b'_1, b'_2, b'_3, ..., b'_l).

The subsequences A' and B' satisfy the following conditions:

1) Each element of A' and B' has a matching element in the other subsequence. That is:

In the subsequences A' and B':

element a'_k of A' corresponds to element a_i of A;

element b'_k of B' corresponds to element b_j of B.

These satisfy LD(a_i, b_j) ≤ δ, where δ is the given maximum value of the local similarity.

2) A' and B' are each relatively continuous within their original sequences. That is:

In the subsequences A' and B':

element a'_k of A' corresponds to element a_i of A, and element a'_{k+1} of A' corresponds to element a_s of A;

element b'_l of B' corresponds to element b_j of B, and element b'_{l+1} of B' corresponds to element b_t of B.

These satisfy s - i ≤ L and t - j ≤ L, where L is the maximum number of inserted elements allowed in a subsequence.

3) A' and B' have the same length, and they are the longest among all subsequences of A and B, respectively, that satisfy conditions 1) and 2), that is, |A'| = |B'| = max{|A_k|, |B_l|}.

In the longest matched subsequence algorithm, the notion of element equality is replaced by the notion of matching. Unlike the longest common subsequence algorithm, A' and B' are not required to be exactly equal; they are the longest pair, with the highest similarity, among all subsequences of A and B.

(b) Local similarity calculation

As the basis of the longest matched subsequence algorithm, the calculation of local similarity is introduced below.

First, the edit distance algorithm for sequences of real numbers is defined:

Given input sequences of real numbers X = (x_1, x_2, x_3, ..., x_m) and Y = (y_1, y_2, y_3, ..., y_n).

The weight of transforming one element of a real-number sequence into another is defined by formula (3), where δ is the threshold for judging equality:

w(a, b) = \begin{cases} 0, & \text{if } |a - b| < \delta \\ |a - b|, & \text{if } |a - b| \geq \delta \end{cases} \qquad (3)

Initialize the edit distance matrix D_{m,n} with the following conditions:

d_{0,0} = 0;

d_{i,0} = d_{i-1,0} + w(x_i, 0), where 1 ≤ i ≤ m;

d_{0,j} = d_{0,j-1} + w(0, y_j), where 1 ≤ j ≤ n.

For the matrix cells with 1 ≤ i ≤ m and 1 ≤ j ≤ n, the edit distance matrix D_{m,n} is computed with the recurrence in formula (4), where W_{del}, W_{sub} and W_{ins} are the weights of the deletion, substitution and insertion operations respectively:

d_{i,j} = \min \begin{cases} d_{i-1,j} + W_{del} \cdot w(x_i, 0) \\ d_{i-1,j-1} + W_{sub} \cdot w(x_i, y_j) \\ d_{i,j-1} + W_{ins} \cdot w(0, y_j) \end{cases} \qquad (4)

Finally, the edit distance ED(X, Y) between the input sequences X and Y is read from the lower-right cell d_{m,n} of the matrix D_{m,n}, as shown in formula (5):

ED(X, Y) = d_{m,n} \qquad (5)
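Formulas (3) to (5) translate directly into a dynamic programming routine. The sketch below follows the recurrences as written; the default values for `delta` and the three operation weights are assumptions, since the patent text here does not fix them, and the function names are hypothetical.

```python
def w(a, b, delta):
    """Formula (3): zero cost when the two values are within delta, else their distance."""
    diff = abs(a - b)
    return 0.0 if diff < delta else diff

def edit_distance(x, y, delta=0.5, w_del=1.0, w_sub=1.0, w_ins=1.0):
    """Formulas (4) and (5): real-valued edit distance ED(X, Y)."""
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    # Initialization of the first column and first row
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w(x[i - 1], 0.0, delta)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w(0.0, y[j - 1], delta)
    # Recurrence of formula (4); x[i - 1] is x_i in the 1-based notation
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + w_del * w(x[i - 1], 0.0, delta),
                d[i - 1][j - 1] + w_sub * w(x[i - 1], y[j - 1], delta),
                d[i][j - 1] + w_ins * w(0.0, y[j - 1], delta),
            )
    return d[m][n]

print(edit_distance([2.0, 2.1, -4.0], [2, 2, -4]))   # near-identical interval sequences
print(edit_distance([2.0, -4.0], [2, 3, -4]))        # one extra element in Y
```

With these assumed defaults the first call yields zero, because every pair of aligned elements differs by less than `delta`, and the second call pays only for the inserted element.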

Next, the local similarity of elements is defined.

Given input melody feature sequences A = (a1, a2, a3, ..., an) and B = (b1, b2, b3, ..., bm).

Take one element from each of A and B to form a pair (a_i, b_j). For every such pair, the local similarity is defined as follows, where k is the local radius:

Define the local subsequences X = (a_{i-k}, ..., a_i, ..., a_{i+k}) and Y = (b_{j-k}, ..., b_j, ..., b_{j+k}).

The local similarity LD(a_i, b_j) around the pair (a_i, b_j) is then given by the edit distance ED(X, Y) between the local subsequences X and Y, as shown in formula (6):

LD(a_i, b_j) = ED(X, Y). (6)
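The local similarity of formula (6) can be sketched as the edit distance between two windows of radius k. The sketch below is illustrative only. The source does not say how windows are handled near the ends of a sequence, so clipping the window to the sequence bounds is an assumption, as are the function names, the 0-based indices and the unit operation weights.

```python
def weight(a, b, delta):
    """Formula (3): differences smaller than delta cost nothing."""
    diff = abs(a - b)
    return 0.0 if diff < delta else diff


def edit_distance(x, y, delta):
    """Formulas (4) and (5) with all operation weights fixed to 1 (assumption)."""
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + weight(x[i - 1], 0.0, delta)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + weight(0.0, y[j - 1], delta)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + weight(x[i - 1], 0.0, delta),
                d[i - 1][j - 1] + weight(x[i - 1], y[j - 1], delta),
                d[i][j - 1] + weight(0.0, y[j - 1], delta),
            )
    return d[m][n]


def local_similarity(a, b, i, j, k, delta):
    """LD(a_i, b_j) of formula (6) for 0-based positions i and j.

    Windows are clipped at the sequence boundaries (assumption).
    """
    x = a[max(0, i - k): i + k + 1]
    y = b[max(0, j - k): j + k + 1]
    return edit_distance(x, y, delta)


# Hypothetical interval sequences in semitones.
hummed = [2.0, -1.0, 3.0]
stored = [2.1, -0.9, 3.2]
ld_centre = local_similarity(hummed, stored, i=1, j=1, k=1, delta=0.5)
```

Because LD is a distance, a smaller value means the neighbourhoods of the two elements are more alike, which is why the matching condition later compares LD against an upper bound δ.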

(c) Computing the longest matched subsequence by dynamic programming

With the rule for computing the local similarity around elements a_i and b_j of the original sequences A and B established, dynamic programming (Dynamic Programming) can be used to compute the longest matched subsequences A' and B' that satisfy the definition.

First, the dynamic programming procedure of the longest common subsequence (LCS) algorithm is reviewed.

Given input sequences A = (a1, a2, a3, ..., am) and B = (b1, b2, b3, ..., bn).

Construct the LCS matrix C_{m,n} and initialize it under the following conditions:

c_{i,0} = 0 and c_{0,j} = 0, where 0 ≤ i ≤ m and 0 ≤ j ≤ n;

Compute the matrix with the recurrence in formula (7), where 1 ≤ i ≤ m and 1 ≤ j ≤ n:

c_{i,j} = \begin{cases} c_{i-1,j-1} + 1, & \text{if } a_i = b_j \\ \max(c_{i,j-1}, c_{i-1,j}), & \text{otherwise} \end{cases} \qquad (7)

Finally, the length of the longest common subsequence is given by c_{m,n}, the lower-right cell of C_{m,n}.
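The LCS recurrence of formula (7) is the standard one and can be sketched as below. The function name `lcs_length` and the sample sequences are illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, following formula (7)."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay at 0, which is the initialization condition.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[m][n]


# Integer interval sequences share the subsequence (2, -1, 3).
exact = lcs_length([2, 2, -1, 3], [2, -1, 3, 3])

# A hummed, real-valued sequence has no exactly equal elements,
# so plain LCS finds nothing even though the melodies are close.
hummed_vs_stored = lcs_length([2.1, -0.9], [2, -1])
```

The second call shows why exact equality is too strict for hummed input: elements that are nearly equal do not count, which is the gap the matching-based variant is meant to close.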

The longest matched subsequence (LMS) algorithm is computed as follows.

Given input sequences A = (a1, a2, a3, ..., am) and B = (b1, b2, b3, ..., bn).

Define the matrices C_{m,n}, R_{m,n} and S_{m,n} used to compute the longest matched subsequence as follows:

Integer matrix C_{m,n}: cell c_{i,j} stores the length of the longest matched subsequence between the prefixes (a1, a2, a3, ..., ai) and (b1, b2, b3, ..., bj);

Integer matrix R_{m,n}: cell r_{i,j} stores the number of discontinuous elements in these two subsequences;

Character matrix S_{m,n}: cell s_{i,j} stores the computation path through C_{m,n} and R_{m,n}, recording the direction from which each cell was computed so that the longest matched subsequence can be recovered.

Initialize the three matrices under the following conditions:

c_{i,0} = 0, c_{0,j} = 0, r_{0,j} = 0, r_{i,0} = 0, s_{0,j} = '_', s_{i,0} = '_', where 0 ≤ i ≤ m and 0 ≤ j ≤ n;

Compute the matrices C_{m,n}, R_{m,n} and S_{m,n} with the recurrences in formulas (8) to (10), where 1 ≤ i ≤ m and 1 ≤ j ≤ n:

c_{i,j} = \begin{cases}
c_{i-1,j-1} + 1, & \text{if } LD(a_i, b_j) \leq \delta \text{ and } r_{i-1,j-1} \leq L \\
1, & \text{if } LD(a_i, b_j) \leq \delta \text{ and } r_{i-1,j-1} > L \\
\max(c_{i,j-1}, c_{i-1,j}), & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1}, r_{i-1,j} \leq L \\
c_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \leq L < r_{i-1,j} \\
c_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \leq L < r_{i,j-1} \\
0, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j}, r_{i,j-1} > L
\end{cases} \qquad (8)

r_{i,j} = \begin{cases}
0, & \text{if } LD(a_i, b_j) \leq \delta \\
r_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} > c_{i-1,j} \\
r_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} < c_{i-1,j} \\
\min(r_{i,j-1}, r_{i-1,j}) + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} = c_{i-1,j} \\
r_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \leq L < r_{i-1,j} \\
r_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \leq L < r_{i,j-1} \\
0, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j}, r_{i,j-1} > L
\end{cases} \qquad (9)

s_{i,j} = \begin{cases}
\text{'S'}, & \text{if } LD(a_i, b_j) \leq \delta \text{ and } c_{i,j} = 1 \\
\text{'O'}, & \text{if } LD(a_i, b_j) \leq \delta \text{ and } c_{i,j} \neq 1 \\
\text{'R'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} > c_{i-1,j} \\
\text{'D'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} < c_{i-1,j} \\
\text{'R'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1} \leq r_{i-1,j} \leq L \text{ and } c_{i,j-1} = c_{i-1,j} \\
\text{'D'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i-1,j} \leq r_{i,j-1} \leq L \text{ and } c_{i,j-1} = c_{i-1,j} \\
\text{'R'}, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \leq L < r_{i-1,j} \\
\text{'D'}, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \leq L < r_{i,j-1}
\end{cases} \qquad (10)

Finally, the output of the longest matched subsequence algorithm of the present invention can be computed from the path recorded in the flag matrix S_{m,n} and the values stored in C_{m,n}. The length of the longest matched subsequence can be read directly from c_{m,n}.
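The recurrences (8) to (10) can be transcribed into a single table-filling pass, sketched below. This is a literal reading of the formulas as printed, not a verified reference implementation. Several points are assumptions: the names `lms_matrices`, `max_gap` (for L) and `local_dist`; the default local distance |a - b| standing in for LD; and leaving s at '_' when both neighbouring r values exceed L, a case that formula (10) does not cover. When the two 'R'/'D' tie cases of formula (10) both apply, the sketch picks 'R'.

```python
def lms_matrices(a, b, delta, max_gap, local_dist=lambda x, y: abs(x - y)):
    """Fill C, R and S following formulas (8) to (10) as printed.

    local_dist stands in for LD(a_i, b_j); the default is an assumption.
    """
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    r = [[0] * (n + 1) for _ in range(m + 1)]
    s = [["_"] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            left_c, up_c = c[i][j - 1], c[i - 1][j]
            left_r, up_r = r[i][j - 1], r[i - 1][j]

            if local_dist(a[i - 1], b[j - 1]) <= delta:
                # Matched pair: extend the diagonal run, or restart at 1
                # when the diagonal predecessor has too many gap elements.
                extend = r[i - 1][j - 1] <= max_gap
                c[i][j] = c[i - 1][j - 1] + 1 if extend else 1
                r[i][j] = 0
                s[i][j] = "S" if c[i][j] == 1 else "O"
            elif left_r <= max_gap and up_r <= max_gap:
                # No match, both neighbours usable: carry the better one.
                c[i][j] = max(left_c, up_c)
                if left_c > up_c:
                    r[i][j], s[i][j] = left_r + 1, "R"
                elif left_c < up_c:
                    r[i][j], s[i][j] = up_r + 1, "D"
                else:
                    r[i][j] = min(left_r, up_r) + 1
                    s[i][j] = "R" if left_r <= up_r else "D"
            elif left_r <= max_gap:
                # Only the left neighbour is usable; "+ 1" is taken
                # literally from formula (8).
                c[i][j] = left_c + 1
                r[i][j], s[i][j] = left_r + 1, "R"
            elif up_r <= max_gap:
                # Only the upper neighbour is usable; "+ 1" as printed.
                c[i][j] = up_c + 1
                r[i][j], s[i][j] = up_r + 1, "D"
            else:
                # Both neighbours exceed L: reset (s keeps '_', assumption).
                c[i][j], r[i][j] = 0, 0

    return c, r, s


# Hypothetical hummed intervals against stored integer intervals.
c, r, s = lms_matrices([1.0, 2.1, -0.9], [1, 2, -1], delta=0.25, max_gap=2)
lms_length = c[-1][-1]
```

For the sample input every hummed interval lies within 0.25 semitone of its stored counterpart, so the three elements match along the diagonal and the lower-right cell of C holds the full length.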

The specific flow of the longest matched subsequence algorithm for melody similarity of the present invention is shown in Figure 9:

1) Given the input melody feature sequences A and B, use the LMS algorithm to obtain the subsequences A' and B' of A and B that are most similar to each other, i.e. the longest matched subsequences;

2) From the lengths of the original feature sequences A and B and of the matched subsequences A' and B', compute the proportion of A and B that is matched;

3) Use the real-valued edit distance algorithm to compute the edit distance between A' and B';

4) For each group of feature sequences that takes part in matching during retrieval, sort by the matched proportion between A and B in descending order as the first key and by the edit distance between A' and B' in ascending order as the second key, producing a list in descending order of similarity.
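The two-key ordering in step 4 can be sketched as a single sort. The record type `Candidate`, its field names and the sample values are hypothetical. The matched proportion and the edit distance are taken as already computed, because the source does not give an explicit formula for the proportion.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One database entry scored against the hummed query (hypothetical fields)."""
    title: str
    match_ratio: float    # matched proportion between A and B, higher is better
    edit_distance: float  # edit distance between A' and B', lower is better


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by match ratio descending, then by edit distance ascending."""
    return sorted(candidates, key=lambda cand: (-cand.match_ratio, cand.edit_distance))


# Made-up scores, for illustration only.
scored = [
    Candidate("song x", 0.80, 1.2),
    Candidate("song y", 0.90, 3.0),
    Candidate("song z", 0.80, 0.4),
]
ranking = [cand.title for cand in rank_candidates(scored)]
```

Negating the first key lets one ascending sort express "descending, then ascending", so the edit distance only breaks ties between candidates with the same matched proportion.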

In the experimental verification of the embodiment, six test subjects were selected at random, and they supplied the queries to the system by humming. To ensure that each hummed query was valid and to keep the results from being affected by the subjects' subjective factors, a hummed query was counted as valid only when more than half of the other five subjects agreed that the hummed song was indeed the target song; otherwise it was not recorded as valid experimental data. The experiment produced 87 valid audio queries, and for most of them the desired song was returned at an acceptable rank. In 58.62% of the queries the target song was ranked first, which is the largest share among all ranks. In over 88.51% of the queries the target song was within the top five results, and in 95.40% of the queries it was within the top ten. Each individual query took between 150 ms and 550 ms to execute, with an average of 289.47 ms. Considering the hardware environment of the experiment and a music feature database on the order of one hundred songs, the method achieved satisfactory accuracy with an acceptable running time, and the overall results show that the feature extraction, feature representation and melody similarity computation proposed by the invention are effective.
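As a sanity check on the reported figures, the percentages are consistent, after rounding to two decimals, with whole-number hit counts out of the 87 valid queries. The counts 51, 77 and 83 below are inferred from the percentages and are not stated in the source.

```python
TOTAL_QUERIES = 87  # number of valid hummed queries reported in the text

# Hit counts inferred from the reported percentages (assumption, not source data).
inferred_hits = {"top 1": 51, "top 5": 77, "top 10": 83}


def hit_rate(hits, total=TOTAL_QUERIES):
    """Percentage of queries whose target song fell within the given rank."""
    return round(100.0 * hits / total, 2)


rates = {rank: hit_rate(hits) for rank, hits in inferred_hits.items()}
```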

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the scope disclosed by the present invention shall fall within the scope of protection of the claims of the present invention.

Claims

1. A humming computer music retrieval method based on the longest matched subsequence algorithm, characterized in that it comprises the following steps:

(1) fundamental frequency sequence extraction and processing;

2) the original fundamental frequency sequence is processed by a high-pass filter and a low-pass filter to remove burrs and noise points and smooth the fundamental frequency curve; the human vocal range generally lies between E2 (82 Hz) and C6 (1047 Hz), so, according to the natural human vocal range, the high-pass threshold is set to 80 Hz and the low-pass threshold to 1100 Hz in order to remove fundamental frequency values that lie outside these thresholds;

3) the fundamental frequency sequence is linearly filtered by linear smoothing, which removes noise points in the sequence and further smooths its contour;

4) the resulting fundamental frequency sequence is passed through a median filter, which effectively removes noise points in the sequence while preserving the step changes between continuous curve segments;

(2) musical feature representation;

1) feature representation of the fundamental frequency curve;

Taking the semitone as the unit, the sequence of intervals between each pair of adjacent notes is used as the melody feature, so that a melody fragment containing n natural notes is represented as an interval sequence of n-1 real numbers; this expresses the melody feature quantitatively, makes the features of different melodies distinguishable, and provides an effective basis for the subsequent similarity computation; it is insensitive to an overall shift in pitch, so the same melody feature is still extracted whatever key the user hums in; it is also stable; for audio information input by the user through humming, the interval is defined as shown in formula (1):

\text{Pitch Interval}_n = 12 \cdot \log_2 \left( \frac{freq_{n+1}}{freq_n} \right) \qquad (1)

According to the above definition, the pitch frequency sequence Fx = (freq_1, freq_2, freq_3, ..., freq_n) is mapped to the interval sequence Pi = (pitch_interval_1, pitch_interval_2, pitch_interval_3, ..., pitch_interval_{n-1});

For the MIDI file of storing in the musical features database, need to adopt same melody characteristics expression way, so that have identical form from user input with the melody characteristics that extracts from database side, to the MIDI file, interval is calculated definition as shown in Equation (2), wherein MIDI_note _N+1And MIDI_note _nRepresent the pitch value in the MIDI file:

Pitch Interval _n＝MIDI_note _n+1-MIDI_note _n (2)

Through the above conversion, the same melody feature in different keys is normalized and the effect of different humming keys on melody feature extraction is eliminated, so that the features of the same melody in different keys are successfully extracted in normalized form; a similarity evaluation mechanism is designed on the basis of this common feature representation to match the query against the feature information in the database; the processed fundamental frequency sequence undergoes melody modeling, yielding a melody skeleton made up of discrete points; the melody skeleton is logarithmically transformed to extract the intervals between adjacent notes of the input, and this interval sequence is used as the feature sequence of the input audio; the melody feature finally extracted is sent to the matching module, where its similarity to the information in the music feature database is computed to obtain the matching result;

2) feature representation of the melody;

The fundamental frequency contour obtained by fundamental frequency extraction and filtering is decomposed into a combination of two mutually independent melody components, a macroscopic melody component and a microscopic melody component, defined respectively as follows:

Macroscopic melody component: reflects the intonation in the speech information and is closely related to the overall pitch variation of the fundamental frequency;

Microscopic melody component: reflects the phoneme content of the speech information and affects the local variation of the fundamental frequency curve;

Likewise, hummed input, being a kind of speech information, is also regarded as a combination of the two melody components; for a melody hummed by voice, the pitch variation is related only to the macroscopic melody component of its fundamental frequency curve, whereas phoneme information such as the syllables and lyrics being hummed is determined by the microscopic melody component of the curve; using a quadratic spline function, the macroscopic melody of the fundamental frequency curve is obtained by interpolation; the resulting macroscopic melody takes the form of a sequence of discrete target points and represents the pitch melody feature corresponding to the fundamental frequency sequence; the hummed melody feature is based on the pitch sequence and is independent of syllable and phoneme information; therefore, the MOMEL algorithm is applied to the filtered fundamental frequency contour to obtain its macroscopic melody sequence, which serves as the basis for the subsequent melody feature representation;

The direct output of the MOMEL algorithm still has obvious shortcomings for the subsequent melody feature representation, so a parameterized threshold is additionally set to control the interval between adjacent notes: when the interval between two notes is below this threshold, the interval is deleted or merged with an adjacent interval, depending on the specific situation;

(3) similarity computation and retrieval matching algorithm;

(a) the longest matched subsequence algorithm (Longest Matched Subsequence, LMS);

The melody feature obtained by the feature extraction method is a sequence of real numbers, whereas the melody features stored in the music feature database are integer sequences; if the longest common subsequence algorithm were applied mechanically to compute the similarity of the two sequences, many elements that should match would be missed;

The longest matched subsequence algorithm solves the problems that the longest common subsequence (LCS) algorithm has in this application; as an improvement on the longest common subsequence algorithm, it outputs two independent subsequences A' and B', which are subsequences of the input sequences A and B respectively;

The longest matched subsequence is defined as follows:

Given input sequences A = (a1, a2, a3, ..., an) and B = (b1, b2, b3, ..., bm),

the subsequences A' = (a'1, a'2, a'3, ..., a'l) and B' = (b'1, b'2, b'3, ..., b'l) are produced;

The subsequences A' and B' satisfy the following conditions:

First, each element of A' and B' has a matching element in the other subsequence, satisfying the following condition:

In the subsequences A' and B':

element a'k of A' corresponds to element ai of A;

element b'k of B' corresponds to element bj of B;

and they satisfy LD(ai, bj) ≤ δ, where δ is the given maximum value of the local similarity;

Second, the subsequences A' and B' are each relatively continuous in their respective original sequences, that is, they satisfy the following condition:

In the subsequences A' and B':

element a'k of A' corresponds to element ai of A, and element a'k+1 of A' corresponds to element as of A;

and they satisfy s - i ≤ L and t - j ≤ L, where L is the maximum number of elements allowed to be inserted in the subsequence;

Third, the subsequences A' and B' have the same length, and A' and B' are the longest among all subsequences of A and B, respectively, that satisfy the first and second conditions, that is, |A'| = |B'| = max{|Ak|, |Bl|};

In the longest matched subsequence algorithm, the notion of element equality is replaced by the notion of matching; unlike in the longest common subsequence algorithm, A' and B' are not mechanically identical but are the longest pair with the highest similarity among all subsequences of A and B;

(b) local similarity computation;

As the basis of the longest matched subsequence algorithm, the local similarity is computed as follows;

Given input real-number sequences X = (x1, x2, x3, ..., xm) and Y = (y1, y2, y3, ..., yn);

The weight for transforming one element of a real-number sequence into another is defined by formula (3), where δ is the threshold below which two elements are judged equal:

w(a, b) = \begin{cases} 0, & \text{if } |a - b| < \delta \\ |a - b|, & \text{if } |a - b| \geq \delta \end{cases} \qquad (3)

d_{0,0} = 0;

d_{i,0} = d_{i-1,0} + w(x_i, 0), where 1 ≤ i ≤ m;

d_{0,j} = d_{0,j-1} + w(0, y_j), where 1 ≤ j ≤ n;

For the matrix cells with 1 ≤ i ≤ m and 1 ≤ j ≤ n, the edit distance matrix D_{m,n} is computed with the recurrence shown in formula (4), where W_{del}, W_{sub} and W_{ins} are the weights of the deletion, substitution and insertion operations, respectively:

d_{i,j} = \min \begin{cases} d_{i-1,j} + W_{del} \cdot w(x_i, 0) \\ d_{i-1,j-1} + W_{sub} \cdot w(x_i, y_j) \\ d_{i,j-1} + W_{ins} \cdot w(0, y_j) \end{cases} \qquad (4)

Finally, the edit distance ED(X, Y) between the input real-number sequences X and Y is read from d_{m,n}, the lower-right cell of D_{m,n}, as shown in formula (5):

ED(X, Y) = d_{m,n}, (5)

Next, the local similarity of elements is defined:

Given input melody feature sequences A = (a1, a2, a3, ..., an) and B = (b1, b2, b3, ..., bm);

Define the local subsequences X = (a_{i-k}, ..., a_i, ..., a_{i+k}) and Y = (b_{j-k}, ..., b_j, ..., b_{j+k});

The local similarity LD(a_i, b_j) around the pair (a_i, b_j) is then given by the edit distance ED(X, Y) between the local subsequences X and Y, as shown in formula (6):

LD(a_i, b_j) = ED(X, Y); (6)

(c) computing the longest matched subsequence by dynamic programming;

With the rule for computing the local similarity around elements a_i and b_j of the original sequences A and B established, dynamic programming (Dynamic Programming) is used to compute the longest matched subsequences A' and B' that satisfy the definition;

Given input melody feature sequences A = (a1, a2, a3, ..., am) and B = (b1, b2, b3, ..., bn);

c_{i,0} = 0 and c_{0,j} = 0, where 0 ≤ i ≤ m and 0 ≤ j ≤ n;

The matrix is computed with the recurrence in formula (7), where 1 ≤ i ≤ m and 1 ≤ j ≤ n:

c_{i,j} = \begin{cases} c_{i-1,j-1} + 1, & \text{if } a_i = b_j \\ \max(c_{i,j-1}, c_{i-1,j}), & \text{otherwise} \end{cases} \qquad (7)

Finally, the length of the longest common subsequence is given by c_{m,n}, the lower-right cell of C_{m,n};

The longest matched subsequence (LMS) algorithm is computed as follows:

Given input sequences A = (a1, a2, a3, ..., am) and B = (b1, b2, b3, ..., bn);

Integer matrix C_{m,n}: cell c_{i,j} stores the length of the longest matched subsequence between the prefixes (a1, a2, a3, ..., ai) and (b1, b2, b3, ..., bj);

Character matrix S_{m,n}: cell s_{i,j} stores the computation path through C_{m,n} and R_{m,n}, recording the direction from which each cell was computed so that the longest matched subsequence can be recovered;

The three matrices are initialized according to the following conditions:

The matrices C_{m,n}, R_{m,n} and S_{m,n} are computed with the recurrences in formulas (8) to (10), where 1 ≤ i ≤ m and 1 ≤ j ≤ n:

c_{i,j} = \begin{cases}
c_{i-1,j-1} + 1, & \text{if } LD(a_i, b_j) \leq \delta \text{ and } r_{i-1,j-1} \leq L \\
1, & \text{if } LD(a_i, b_j) \leq \delta \text{ and } r_{i-1,j-1} > L \\
\max(c_{i,j-1}, c_{i-1,j}), & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1}, r_{i-1,j} \leq L \\
c_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \leq L < r_{i-1,j} \\
c_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \leq L < r_{i,j-1} \\
0, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j}, r_{i,j-1} > L
\end{cases} \qquad (8)

r_{i,j} = \begin{cases}
0, & \text{if } LD(a_i, b_j) \leq \delta \\
r_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} > c_{i-1,j} \\
r_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} < c_{i-1,j} \\
\min(r_{i,j-1}, r_{i-1,j}) + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} = c_{i-1,j} \\
r_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \leq L < r_{i-1,j} \\
r_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \leq L < r_{i,j-1} \\
0, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j}, r_{i,j-1} > L
\end{cases} \qquad (9)

s_{i,j} = \begin{cases}
\text{'S'}, & \text{if } LD(a_i, b_j) \leq \delta \text{ and } c_{i,j} = 1 \\
\text{'O'}, & \text{if } LD(a_i, b_j) \leq \delta \text{ and } c_{i,j} \neq 1 \\
\text{'R'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} > c_{i-1,j} \\
\text{'D'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \leq L \text{ and } c_{i,j-1} < c_{i-1,j} \\
\text{'R'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1} \leq r_{i-1,j} \leq L \text{ and } c_{i,j-1} = c_{i-1,j} \\
\text{'D'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i-1,j} \leq r_{i,j-1} \leq L \text{ and } c_{i,j-1} = c_{i-1,j} \\
\text{'R'}, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \leq L < r_{i-1,j} \\
\text{'D'}, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \leq L < r_{i,j-1}
\end{cases} \qquad (10)

Finally, the output of the longest matched subsequence algorithm is computed from the path recorded in the flag matrix S_{m,n} and the values stored in C_{m,n}, and the length of the longest matched subsequence is read directly from c_{m,n}.

2. The humming computer music retrieval method based on the longest matched subsequence algorithm according to claim 1, characterized in that the specific steps of the longest matched subsequence algorithm are as follows:

(1) given the input melody feature sequences A and B, the LMS algorithm is used to obtain the subsequences A' and B' of A and B that are most similar to each other, i.e. the longest matched subsequences;

(2) from the lengths of the original feature sequences A and B and of the matched subsequences A' and B', the proportion of A and B that is matched is computed;

(3) the real-valued edit distance algorithm is used to compute the edit distance between A' and B';

(4) for each group of feature sequences that takes part in matching during retrieval, the groups are sorted by the matched proportion between A and B in descending order as the first key and by the edit distance between A' and B' in ascending order as the second key, producing a list in descending order of similarity.