CN107256710A - A kind of humming melody recognition methods based on dynamic time warp algorithm - Google Patents

A kind of humming melody recognition methods based on dynamic time warp algorithm Download PDF

Info

Publication number
CN107256710A
CN107256710A
Authority
CN
China
Prior art keywords
humming
melody
song
dynamic time
time warp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710648569.2A
Other languages
Chinese (zh)
Inventor
郑丽敏
程国栋
杨璐
田立军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN201710648569.2A priority Critical patent/CN107256710A/en
Publication of CN107256710A publication Critical patent/CN107256710A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/041 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal based on mfcc [mel-frequency spectral coefficients]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/066 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/071 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for rhythm pattern analysis or rhythm style recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a humming melody recognition method based on a dynamic time warping algorithm, which quickly and accurately recognizes the melody hummed by a user. The hummed audio signal is first pre-processed: the speech signal is denoised to remove background clutter, and the denoised signal is then pre-emphasized and split into windowed frames, yielding a high-quality speech signal. Endpoint detection and MFCC coefficient extraction are performed on the speech signal, extracting the parameters that represent its essential characteristics and producing the feature sequence of the hummed melody. This feature sequence is compared against the feature sequences of the songs in the library, and a portion of the songs is eliminated according to melody features and edit distance. Finally, an improved dynamic time warping algorithm accurately matches the feature sequences of the remaining songs against the target feature sequence, obtaining candidate songs and thereby recognizing the hummed melody.

Description

Humming melody recognition method based on a dynamic time warping algorithm
Technical field
The present invention relates to a humming melody recognition method based on a dynamic time warping algorithm, and belongs to the field of speech recognition.
Background art
In content-based music information retrieval research, humming melody recognition is the core algorithm and a major research focus; the performance of the recognition method directly affects retrieval efficiency and retrieval results. When the song title or singer is unknown, humming melody recognition allows the corresponding song to be retrieved conveniently and efficiently from a personal humming. It also has wide application in analyzing the tune of a song and in scoring a singer's performance. A humming melody recognition method must account for note insertions, note deletions, and pitch deviations in the hummed audio, and must also allow the user to hum an arbitrary segment of the song. An appropriate recognition strategy and matching method are therefore particularly important.
Summary of the invention
It is an object of the invention to provide a humming melody recognition method based on a dynamic time warping algorithm that can effectively identify a song from the audio hummed by a user. The present invention adopts the following technical scheme:
A humming melody recognition method based on a dynamic time warping algorithm, comprising the following steps:
(1) Speech pre-processing. The hummed audio is denoised, pre-emphasized, and split into windowed frames. These operations improve the quality of the speech signal, flatten its high-frequency part, and make the signal stationary overall, which simplifies subsequent analysis;
(2) Feature extraction. Endpoint detection and MFCC coefficient extraction are performed on the pre-processed speech signal, extracting the parameters that represent its essential characteristics and producing the feature sequence of the hummed melody;
(3) Fast screening based on melody features. The method uses the half-beat features of a song as its melody feature. The highest-pitch and lowest-pitch half-beats of a song are first computed from the rise and fall of its pitch; songs in the target music library whose half-beat features differ substantially from those of the query melody are deleted. This step quickly eliminates a portion of the songs and speeds up the melody recognition process;
(4) Fast screening based on edit distance. The feature sequence of the hummed melody is first converted into a character string according to its pitch differences; the edit distance between this string and each song that survived the melody-feature screening is then computed, and songs whose distance is too large are rejected;
(5) Accurate recognition based on an improved dynamic time warping algorithm. In humming recognition, the feature sequence of the hummed melody cannot simply be compared element by element with the target template sequence, because the user may make various humming mistakes, such as adding or deleting notes; time alignment against the template is therefore essential. The dynamic time warping algorithm finds the optimal matching path by repeatedly computing the distance between two vectors, yielding the warping path with the minimum accumulated distance between the two sequences and thus guaranteeing maximum acoustic similarity between them. However, dynamic time warping suffers from slow retrieval and large memory requirements. To address these problems, this method improves the algorithm in several respects: endpoint relaxation, cross-phrase retrieval, and the cost function. The improved dynamic time warping algorithm achieves accurate recognition of the hummed melody.
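As an illustration of the pre-processing in step (1), the following is a minimal Python/NumPy sketch of pre-emphasis and Hamming-windowed framing. The coefficient 0.97, the 25 ms frame length, and the 10 ms hop at 16 kHz are assumed values; the method itself does not fix them.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasise a mono signal and slice it into Hamming-windowed frames."""
    # Pre-emphasis y[n] = x[n] - alpha * x[n-1] flattens the spectral tilt,
    # boosting the high-frequency part of the signal.
    emphasised = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    window = np.hamming(frame_len)
    n_frames = 1 + (len(emphasised) - frame_len) // hop
    # Overlapping frames, each multiplied by the window to reduce edge effects.
    return np.stack([emphasised[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone at 16 kHz
frames = preprocess(tone)  # 98 frames of 400 samples each
```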
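The endpoint detection mentioned in step (2) can likewise be sketched with a simple short-time-energy threshold. The text does not specify the detector, so the 10%-of-peak threshold below is only an assumption for illustration.

```python
import numpy as np

def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of frames whose short-time energy
    exceeds `ratio` times the peak energy; (0, 0) if nothing is active."""
    energy = np.sum(frames ** 2, axis=1)          # short-time energy per frame
    active = np.where(energy > ratio * energy.max())[0]
    return (int(active[0]), int(active[-1])) if active.size else (0, 0)

demo = np.array([[0.01] * 4, [1.0] * 4, [1.0] * 4, [0.01] * 4])
start, end = detect_endpoints(demo)  # only the two loud middle frames are kept
```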
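Step (5) builds on classic dynamic time warping. As a baseline before the improvements described later, a textbook accumulated-distance computation looks like this (a sketch, not the improved variant):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW: minimum accumulated |x_i - y_j| over all warping paths."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # standard steps: diagonal match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```

A time-stretched copy of a sequence matches it at zero cost, e.g. `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0; this tempo invariance is exactly what makes DTW suitable for matching hummed melodies.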
Advantages of the present invention:
1. The invention introduces cross-phrase retrieval and tail endpoint relaxation, reducing the time required by the dynamic time warping algorithm by 20%.
2. The invention introduces a duration feature into the cost function of the dynamic time warping algorithm, improving the hit rate by 5% over the original.
3. The invention does not restrict the user's humming style, and it is more robust than the original algorithm when the rhythm is inaccurate.
4. The invention improves the traditional dynamic time warping algorithm with respect to cross-phrase retrieval and endpoint relaxation, and introduces duration features into the cost function, improving the overall performance and efficiency of humming melody recognition.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 illustrates the principle of the dynamic time warping algorithm used in the present invention.
Fig. 3 is a schematic diagram of the grid required for the calculation.
Fig. 4 shows the tail endpoint relaxation path of the present invention.
Detailed description of the embodiments
To further explain the technical means and effects adopted by the present invention to achieve its intended objects, its embodiments, structure, features, and effects are described in detail below with reference to the accompanying drawings and preferred embodiments.
As shown in Fig. 1, a humming melody recognition method based on a dynamic time warping algorithm comprises the following steps:
(1) Speech pre-processing. The hummed audio is denoised, pre-emphasized, and split into windowed frames. These operations improve the quality of the speech signal, flatten its high-frequency part, and make the signal stationary overall, which simplifies subsequent analysis;
(2) Feature extraction. Endpoint detection and MFCC coefficient extraction are performed on the pre-processed speech signal, extracting the parameters that represent its essential characteristics and producing the feature sequence of the hummed melody;
(3) Fast screening based on melody features. The method uses the half-beat features of a song as its melody feature. The highest-pitch and lowest-pitch half-beats of a song are first computed from the rise and fall of its pitch; songs in the target music library whose half-beat features differ substantially from those of the query melody are deleted. This step quickly eliminates a portion of the songs and speeds up the melody recognition process;
(4) Fast screening based on edit distance. The feature sequence of the hummed melody is first converted into a character string according to its pitch differences; the edit distance between this string and each song that survived the melody-feature screening is then computed, and songs whose distance is too large are rejected;
(5) Accurate recognition based on an improved dynamic time warping algorithm. In humming recognition, the feature sequence of the hummed melody cannot simply be compared element by element with the target template sequence, because the user may make various humming mistakes, such as adding or deleting notes; time alignment against the template is therefore essential. The dynamic time warping algorithm finds the optimal matching path by repeatedly computing the distance between two vectors, yielding the warping path with the minimum accumulated distance between the two sequences and thus guaranteeing maximum acoustic similarity between them. However, dynamic time warping suffers from slow retrieval and large memory requirements. To address these problems, this method improves the algorithm in several respects: endpoint relaxation, cross-phrase retrieval, and the cost function. The improved dynamic time warping algorithm achieves accurate recognition of the hummed melody.
The principle of the fast screening based on edit distance in step (4) above is as follows:
The feature sequence of the hummed melody is first converted, according to its pitch differences, into a string over the alphabet (E, U, X, S, D, B, T); the edit distance between this string and each song in the target music database that survived the melody-feature screening is then computed. Edit distance is a dynamic-programming method: the edit distance between feature sequences x and y is defined as the minimum number of basic operations required to transform x into y, where the basic operations are:
(a) replacing a character in x with the corresponding character in y;
(b) inserting a character of y into x, increasing the length of x by 1;
(c) deleting a character from x, decreasing the length of x by 1.
The specific calculation formulas are as follows:
ED(a(i), b(j)) = ED(a(i-1), b(j-1))   if a(i) = b(j)
ED(a(i), b(j)) = min(ED(a(i-1), b(j-1)) + 2, ED(a(i-1), b(j)) + 1, ED(a(i), b(j-1)) + 1)   if a(i) ≠ b(j)
where a and b are the two feature strings and ED is the matrix of accumulated edit distances between a and b.
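A minimal sketch of the weighted edit distance above, with the weights the formulas imply: insertion and deletion cost 1 each, substitution costs 2 (equivalent to one deletion plus one insertion):

```python
def edit_distance(a, b):
    """Weighted edit distance: insert/delete cost 1, substitution cost 2."""
    n, m = len(a), len(b)
    ED = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        ED[i][0] = i              # delete all of a[:i]
    for j in range(m + 1):
        ED[0][j] = j              # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                ED[i][j] = ED[i - 1][j - 1]
            else:
                ED[i][j] = min(ED[i - 1][j - 1] + 2,  # substitution
                               ED[i - 1][j] + 1,      # deletion
                               ED[i][j - 1] + 1)      # insertion
    return ED[n][m]
```

Over strings from the (E, U, X, S, D, B, T) alphabet, an extra hummed note shows up as a single insertion: `edit_distance("UUDS", "UUDDS")` is 1.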
The principle of the accurate recognition based on the improved dynamic time warping algorithm in step (5) above is as follows:
The dynamic time warping algorithm finds the optimal matching path by repeatedly computing the distance between two vectors, yielding the warping path with the minimum accumulated distance between the two sequences and thus guaranteeing maximum acoustic similarity between them. The principle of the algorithm is shown in Fig. 2. Because many grid points never need to be reached in an actual match, the rhombus in Fig. 2 constrains the admissible paths: matching distances for lattice points outside the rhombus need not be computed, and the corresponding data need not be stored, which reduces the required memory. At the same time, this further reduces the amount of computation, since the matching calculation at each lattice point uses only three grid cells of the previous column, as shown in Fig. 3. Suppose dynamic time warping is computed between two feature sequences, where the sequence to be matched is x = {x1, x2, ..., xn}, the template feature sequence is y = {y1, y2, ..., ym}, and d is the cost function. The specific calculation formulas are as follows:
D(i, j) = min(D(i-2, j-1) + d(i-2, j-1), D(i-1, j-1) + d(i-1, j-1), D(i-1, j-2) + d(i-1, j-2))
where the cost function d is computed as:
d(i-1, j-1) = abs(xi - yj)
d(i-2, j-1) = abs(xi-1 + xi - yj) + c1
d(i-1, j-2) = abs(yj-1 + yj - xi) + c2
Here D is the matrix of accumulated distances between x and y, and c1 and c2 are balance factors that balance the cost introduced by inserting or deleting notes.
After the two templates have been matched, the minimum value in the last column of the accumulated-distance matrix is traced back: from that minimum, the minimum of the three admissible predecessor points is found repeatedly until the starting match point is reached, producing the optimal matching path. The minimum value in the last column of the matrix is the distance between the two templates.
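The recursion above can be sketched directly: besides the diagonal match, a (2,1) step merges two query pitch differences against one template value (an inserted note in the query) and a (1,2) step does the reverse (a deleted note), each penalised by a balance factor. The defaults c1 = c2 = 1.0 are assumptions; the text leaves them as tunable parameters.

```python
import numpy as np

def dtw_improved(x, y, c1=1.0, c2=1.0):
    """Accumulated distance with the (1,1)/(2,1)/(1,2) step pattern."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = [D[i - 1, j - 1] + abs(x[i - 1] - y[j - 1])]
            if i >= 2:  # inserted note: two query differences sum to one template value
                cands.append(D[i - 2, j - 1] + abs(x[i - 2] + x[i - 1] - y[j - 1]) + c1)
            if j >= 2:  # deleted note: two template differences sum to one query value
                cands.append(D[i - 1, j - 2] + abs(y[j - 2] + y[j - 1] - x[i - 1]) + c2)
            D[i, j] = min(cands)
    return D[n, m]
```

Splitting one 3-semitone interval into 2 + 1 in the query costs only the c1 penalty: `dtw_improved([1, 2, 3], [3, 3])` evaluates to 1.0 with the assumed defaults.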
The traditional dynamic time warping algorithm matches two utterances head to head and tail to tail, but a hummed melody may contain inserted or missing notes, so a definite tail point cannot be determined. This method therefore proposes tail endpoint relaxation, whose principle is as follows:
Let P be the (unknown) head endpoint, let N be the number of features in the hummed melody's feature sequence, and let w1 and w2 be the numbers of inserted and deleted notes allowed for a sequence of length N, respectively. Once the head endpoint is fixed, the tail point lies in the interval [P+N-w2, P+N+w1]. A path restriction is added, the distances between the feature vectors at positions P+N-w2 through P+N+w1 are computed, and the minimum is selected and recorded. The tail endpoint relaxation path is shown in Fig. 4.
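A sketch of selecting the relaxed tail point: given the final row of the accumulated-distance matrix (indexed by template position), the best endpoint is searched only inside [P+N-w2, P+N+w1] instead of being forced to the last position. The indexing conventions here are assumptions for illustration.

```python
import numpy as np

def relaxed_tail(d_row, P, N, w1, w2):
    """Return (index, value) of the minimum accumulated distance inside the
    relaxed tail interval [P+N-w2, P+N+w1], clipped to the row bounds."""
    lo = max(0, P + N - w2)
    hi = min(len(d_row) - 1, P + N + w1)
    k = lo + int(np.argmin(d_row[lo:hi + 1]))
    return k, float(d_row[k])

row = np.array([9.0, 8.0, 7.0, 1.0, 5.0, 6.0, 9.0])
best = relaxed_tail(row, P=0, N=4, w1=1, w2=1)  # searches positions 3..5 only
```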
The dynamic time warping algorithm traditionally used for humming recognition considers only the pitch feature and ignores duration. To improve the recognition rate, and because the user's humming tempo may be faster or slower than the tempo of the song in the reference library, this method introduces duration into the calculation of the cost function.
Let the feature sequence of the hummed melody be X = {(tx1, ty1), (tx2, ty2), ..., (txm, tym)} and the feature sequence of a song in the target database be Y = {(rx1, ry1), (rx2, ry2), ..., (rxn, ryn)}, where txi and rxj are the corresponding pitch-difference sequences and tyi and ryj are the corresponding duration-ratio sequences. The improved cost function is computed as follows:
d(i-1, j-1) = u*abs(txi - rxj) + (1-u)*abs(tyi - ryj)*km
d(i-2, j-1) = u*abs(txi-1 + txi - rxj) + (1-u)*abs((tyi*tyi-1)/(1+tyi-1) - ryj)*km + c3
d(i-1, j-2) = u*abs(rxj-1 + rxj - txi) + (1-u)*abs((ryj*ryj-1)/(1+ryj-1) - tyi)*km + c4
where c3 and c4 are balance factors and u is an introduced weight. Experiments show that the pitch feature is more accurate than the duration feature, so u > 0.5 is set. km is the ratio of the average pitch difference to the average duration ratio.
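The three improved local costs can be written directly from the formulas above. The values u = 0.6 (any value above 0.5 satisfies the stated constraint), km = 1.0, and c3 = c4 = 1.0 are assumed for illustration only.

```python
def improved_costs(tx, ty, rx, ry, i, j, u=0.6, km=1.0, c3=1.0, c4=1.0):
    """Local costs mixing pitch difference (weight u) and duration ratio
    (weight 1-u, scaled by km) for the (1,1), (2,1) and (1,2) steps."""
    d11 = u * abs(tx[i] - rx[j]) + (1 - u) * abs(ty[i] - ry[j]) * km
    d21 = (u * abs(tx[i - 1] + tx[i] - rx[j])
           + (1 - u) * abs(ty[i] * ty[i - 1] / (1 + ty[i - 1]) - ry[j]) * km + c3)
    d12 = (u * abs(rx[j - 1] + rx[j] - tx[i])
           + (1 - u) * abs(ry[j] * ry[j - 1] / (1 + ry[j - 1]) - ty[i]) * km + c4)
    return d11, d21, d12

tx, ty = [0.0, 2.0, 1.0], [1.0, 1.0, 1.0]  # toy pitch-difference / duration-ratio data
costs = improved_costs(tx, ty, tx, ty, i=1, j=1)  # identical sequences: zero match cost
```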
The embodiments described above merely describe preferred embodiments of the present invention and do not limit its scope. Without departing from the spirit of the design of the present invention, all modifications and improvements made by those of ordinary skill in the art to the technical scheme of the present invention shall fall within the protection scope determined by the claims of the present invention.

Claims (3)

1. A humming melody recognition method based on a dynamic time warping algorithm, characterized by comprising the following steps:
(1) Speech pre-processing. The hummed audio is denoised, pre-emphasized, and split into windowed frames. These operations improve the quality of the speech signal, flatten its high-frequency part, and make the signal stationary overall, which simplifies subsequent analysis;
(2) Feature extraction. Endpoint detection and MFCC coefficient extraction are performed on the pre-processed speech signal, extracting the parameters that represent its essential characteristics and producing the feature sequence of the hummed melody;
(3) Fast screening based on melody features. The method uses the half-beat features of a song as its melody feature. The highest-pitch and lowest-pitch half-beats of a song are first computed from the rise and fall of its pitch; songs in the target music library whose half-beat features differ substantially from those of the query melody are deleted. This step quickly eliminates a portion of the songs and speeds up the melody recognition process;
(4) Fast screening based on edit distance. The feature sequence of the hummed melody is first converted into a character string according to its pitch differences; the edit distance between this string and each song that survived the melody-feature screening is then computed, and songs whose distance is too large are rejected;
(5) Accurate recognition based on an improved dynamic time warping algorithm. In humming recognition, the feature sequence of the hummed melody cannot simply be compared element by element with the target template sequence, because the user may make various humming mistakes, such as adding or deleting notes; time alignment against the template is therefore essential. The dynamic time warping algorithm finds the optimal matching path by repeatedly computing the distance between two vectors, yielding the warping path with the minimum accumulated distance between the two sequences and thus guaranteeing maximum acoustic similarity between them. However, dynamic time warping suffers from slow retrieval and large memory requirements. To address these problems, this method improves the algorithm in several respects: endpoint relaxation, cross-phrase retrieval, and the cost function. The improved dynamic time warping algorithm achieves accurate recognition of the hummed melody.
2. The humming melody recognition method based on a dynamic time warping algorithm according to claim 1, characterized in that the fast screening based on edit distance in step (4) follows this principle:
(2a) The feature sequence of the hummed melody is first converted, according to its pitch differences, into a string over the alphabet (E, U, X, S, D, B, T); the edit distance between this string and each song in the target music database that survived the melody-feature screening is then computed. Edit distance is a dynamic-programming method: the edit distance between feature sequences x and y is defined as the minimum number of basic operations required to transform x into y, where the basic operations are:
(a) replacing a character in x with the corresponding character in y;
(b) inserting a character of y into x, increasing the length of x by 1;
(c) deleting a character from x, decreasing the length of x by 1.
(2b) The specific calculation formulas are as follows:
ED(a(i), b(j)) = ED(a(i-1), b(j-1))   if a(i) = b(j)
ED(a(i), b(j)) = min(ED(a(i-1), b(j-1)) + 2, ED(a(i-1), b(j)) + 1, ED(a(i), b(j-1)) + 1)   if a(i) ≠ b(j)
where a and b are the two feature strings and ED is the matrix of accumulated edit distances between a and b.
3. The humming melody recognition method based on a dynamic time warping algorithm according to claim 1, characterized in that the accurate recognition based on the improved dynamic time warping algorithm in step (5) follows these principle steps:
(3a) The traditional dynamic time warping algorithm matches two utterances head to head and tail to tail, but a hummed melody may contain inserted or missing notes, so a definite tail point cannot be determined. This method therefore proposes tail endpoint relaxation, whose principle is as follows:
Let P be the (unknown) head endpoint, let N be the number of features in the hummed melody's feature sequence, and let w1 and w2 be the numbers of inserted and deleted notes allowed for a sequence of length N, respectively. Once the head endpoint is fixed, the tail point lies in the interval [P+N-w2, P+N+w1]. A path restriction is added, the distances between the feature vectors at positions P+N-w2 through P+N+w1 are computed, and the minimum is selected and recorded. The tail endpoint relaxation path is shown in Fig. 4.
(3b) The dynamic time warping algorithm traditionally used for humming recognition considers only the pitch feature and ignores duration. To improve the recognition rate, and because the user's humming tempo may be faster or slower than the tempo of the song in the reference library, this method introduces duration into the calculation of the cost function.
Let the feature sequence of the hummed melody be X = {(tx1, ty1), (tx2, ty2), ..., (txm, tym)} and the feature sequence of a song in the target database be Y = {(rx1, ry1), (rx2, ry2), ..., (rxn, ryn)}, where txi and rxj are the corresponding pitch-difference sequences and tyi and ryj are the corresponding duration-ratio sequences. The improved cost function is computed as follows:
d(i-1, j-1) = u*abs(txi - rxj) + (1-u)*abs(tyi - ryj)*km
d(i-2, j-1) = u*abs(txi-1 + txi - rxj) + (1-u)*abs((tyi*tyi-1)/(1+tyi-1) - ryj)*km + c3
d(i-1, j-2) = u*abs(rxj-1 + rxj - txi) + (1-u)*abs((ryj*ryj-1)/(1+ryj-1) - tyi)*km + c4
where c3 and c4 are balance factors and u is an introduced weight. Experiments show that the pitch feature is more accurate than the duration feature, so u > 0.5 is set. km is the ratio of the average pitch difference to the average duration ratio.
CN201710648569.2A 2017-08-01 2017-08-01 A kind of humming melody recognition methods based on dynamic time warp algorithm Pending CN107256710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710648569.2A CN107256710A (en) 2017-08-01 2017-08-01 A kind of humming melody recognition methods based on dynamic time warp algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710648569.2A CN107256710A (en) 2017-08-01 2017-08-01 A kind of humming melody recognition methods based on dynamic time warp algorithm

Publications (1)

Publication Number Publication Date
CN107256710A true CN107256710A (en) 2017-10-17

Family

ID=60025472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710648569.2A Pending CN107256710A (en) 2017-08-01 2017-08-01 A kind of humming melody recognition methods based on dynamic time warp algorithm

Country Status (1)

Country Link
CN (1) CN107256710A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320750A (en) * 2018-01-23 2018-07-24 东南大学—无锡集成电路技术研究所 A kind of implementation method based on modified dynamic time warping speech recognition algorithm
CN108428441A (en) * 2018-02-09 2018-08-21 咪咕音乐有限公司 Multimedia file producting method, electronic equipment and storage medium
CN108735231A (en) * 2018-04-27 2018-11-02 大连民族大学 Theme pitch sequence method of estimation
CN111368129A (en) * 2018-12-25 2020-07-03 天津大学青岛海洋技术研究院 Humming retrieval method based on deep neural network
CN113053337A (en) * 2021-03-26 2021-06-29 北京儒博科技有限公司 Intonation evaluation method, intonation evaluation device, intonation evaluation equipment and storage medium
CN113377994A (en) * 2021-07-08 2021-09-10 哈尔滨理工大学 Humming retrieval method based on melody feature clustering and optimization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1197952B1 (en) * 2000-10-18 2009-12-02 Thales Coding method of the prosody for a very low bit rate speech encoder
CN102053998A (en) * 2009-11-04 2011-05-11 周明全 Method and system device for retrieving songs based on voice modes
KR20130077064A (en) * 2011-12-29 2013-07-09 전자부품연구원 Fast music information retrieval system based on query by humming and method thereof
CN103366784A (en) * 2013-07-16 2013-10-23 湖南大学 Multimedia playing method and device with function of voice controlling and humming searching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIN LI et al.: "Improved Dynamic Time Warping Algorithm: the Research and Application of Query by Humming", 2010 Sixth International Conference on Natural Computation *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320750A (en) * 2018-01-23 2018-07-24 Southeast University-Wuxi Institute of Integrated Circuit Technology Implementation method of a speech recognition algorithm based on improved dynamic time warping
CN108428441A (en) * 2018-02-09 2018-08-21 MIGU Music Co., Ltd. Multimedia file production method, electronic equipment and storage medium
CN108428441B (en) * 2018-02-09 2021-08-06 MIGU Music Co., Ltd. Multimedia file generation method, electronic device and storage medium
CN108735231A (en) * 2018-04-27 2018-11-02 Dalian Minzu University Main melody pitch sequence estimation method
CN108735231B (en) * 2018-04-27 2021-11-12 Dalian Minzu University Method for estimating pitch sequence of main melody
CN111368129A (en) * 2018-12-25 2020-07-03 Tianjin University Qingdao Ocean Technology Research Institute Humming retrieval method based on deep neural network
CN113053337A (en) * 2021-03-26 2021-06-29 Beijing Roobo Technology Co., Ltd. Intonation evaluation method, device, equipment and storage medium
CN113377994A (en) * 2021-07-08 2021-09-10 Harbin University of Science and Technology Humming retrieval method based on melody feature clustering and optimization

Similar Documents

Publication Publication Date Title
CN107256710A (en) A humming melody recognition method based on the dynamic time warping algorithm
US8013229B2 (en) Automatic creation of thumbnails for music videos
US7342167B2 (en) Apparatus and method for generating an encoded rhythmic pattern
US7064262B2 (en) Method for converting a music signal into a note-based description and for referencing a music signal in a data bank
Kroher et al. Automatic transcription of flamenco singing from polyphonic music recordings
Vogl et al. Recurrent Neural Networks for Drum Transcription.
JP2010054802A (en) Method for extracting unit rhythms from a musical acoustic signal, method for estimating musical piece structure using it, and method for replacing percussion patterns in a musical acoustic signal
KR101520621B1 (en) Method and apparatus for query by singing/humming
Yoshii et al. Automatic Drum Sound Description for Real-World Music Using Template Adaptation and Matching Methods.
JP2012234202A (en) Rhythm structure extraction method, method for determining analogous relation between items of plural audio signal, and program
Lee et al. Automatic Chord Recognition from Audio Using a HMM with Supervised Learning.
CN109979488A (en) Speech-to-music-notation system based on stress analysis
WO2017154928A1 (en) Audio signal processing method and audio signal processing device
Eggink et al. Instrument recognition in accompanied sonatas and concertos
JP2002116754A (en) Tempo extraction device, tempo extraction method, tempo extraction program and recording medium
Wang et al. Robust and efficient joint alignment of multiple musical performances
CN110399522A (en) Music humming retrieval method and device based on LSTM and hierarchical matching
Atli et al. Audio feature extraction for exploring Turkish makam music
CN113192471B (en) Musical main melody track recognition method based on neural network
JP3934556B2 (en) Method and apparatus for extracting signal identifier, method and apparatus for creating database from signal identifier, and method and apparatus for referring to search time domain signal
CN112634841B (en) Automatic guitar music generation method based on speech recognition
JP4765971B2 (en) Mixed model generation apparatus, sound processing apparatus, and program
JP6056799B2 (en) Program, information processing apparatus, and data generation method
JP4305509B2 (en) Voice processing apparatus and program
JP2017161572A (en) Sound signal processing method and sound signal processing device
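Many of the documents above, like this application, use dynamic time warping (DTW) to match a hummed pitch sequence against stored melodies despite tempo differences. As an illustration only, a minimal sketch of the core DTW alignment follows; the pitch values, absolute-difference local cost, and step pattern are generic assumptions, not taken from the patent text.

```python
# Minimal dynamic time warping (DTW) distance between two pitch
# sequences -- the kind of matching step query-by-humming systems rely on.
# Illustrative sketch: sequences and cost function are assumptions.

def dtw_distance(a, b):
    """Return the DTW alignment cost between numeric sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A hummed query matches a time-stretched copy of the same melody better
# than a different melody, despite the tempo difference.
hummed    = [60, 62, 64, 62, 60]            # MIDI-like pitch values
stretched = [60, 60, 62, 62, 64, 62, 60]    # same melody, hummed more slowly
other     = [65, 67, 69, 67, 65]            # transposed, different melody
print(dtw_distance(hummed, stretched))      # -> 0.0
print(dtw_distance(hummed, other))          # -> 25.0
```

Because repeated frames align with zero cost, the stretched copy of the melody scores a perfect match, which is exactly why DTW tolerates the tempo variation inherent in humming.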

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171017