CN101968958A

CN101968958A - Method and device for comparing audio data

Info

Publication number: CN101968958A
Application number: CN 201010530213
Authority: CN
Inventors: 蒋成林; 魏思; 胡国平; 刘丹; 胡郁; 刘庆峰
Original assignee: iFlytek Co Ltd
Current assignee: Iflytek Shanghai Technology Co ltd
Priority date: 2010-11-02
Filing date: 2010-11-02
Publication date: 2011-02-09
Anticipated expiration: 2030-11-02
Also published as: CN101968958B

Abstract

The invention discloses a method and a device for comparing audio data. The comparison method comprises the following steps: training a Gaussian mixture model (GMM) on preset, accurately segmented reference data, and segmenting the standard data corresponding to test data by using the GMM; extracting first fundamental frequency data corresponding to the test data; framing the standard data in each segment, extracting fundamental frequency candidate point data corresponding to the standard data, and obtaining second fundamental frequency data corresponding to the standard data in combination with the error between the test data and the standard data in each segment; and comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result. Through the embodiments of the invention, the fundamental frequency of the standard data can be extracted accurately, so that test audio data can be compared with standard audio data.

Description

Method and device for comparing audio data

Technical field

The present invention relates to the field of audio data processing, and more particularly to a method and a device for comparing audio data.

Background art

Existing singing-scoring technology usually scores a piece of singing data according to how close its pitch and rhythm are to those of the original singer. The most common application scenario is as follows: the user sings along with the rhythm of the accompaniment; the scoring system analyzes the recorded data and the accompaniment (original singer), compares the parameters that bear on singing quality, judges the quality of the user's singing, and finally gives a score. It is assumed here that the user's recording contains little noise, so that a common fundamental frequency extraction strategy can extract the fundamental frequency accurately.

The most important factor in singing scoring is the difference between the pitch curve of the user's singing data and the standard pitch curve, as shown in Figure 1, that is:

Dist = \int_{t_b}^{t_e} | f(t) - g(t) | \, dt \qquad (1)

In the above formula, f(t) and g(t) denote the standard fundamental frequency and the fundamental frequency of the singing data, respectively, and t_b and t_e denote the start and end times of the singing, respectively. The larger the error Dist, the lower the score; conversely, the smaller the error, the higher the score.
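As an illustration of formula (1), the following minimal sketch approximates the integral with the trapezoidal rule over sampled pitch curves. The function name `pitch_curve_distance` and the choice of the trapezoidal rule are assumptions made for this example; the patent text does not specify how the integral is evaluated.

```python
def pitch_curve_distance(times, f, g):
    """Approximate Dist = integral of |f(t) - g(t)| dt (formula (1)).

    times: sample instants in increasing order (assumed representation)
    f, g:  standard and sung pitch values sampled at those instants
    """
    total = 0.0
    for n in range(1, len(times)):
        left = abs(f[n - 1] - g[n - 1])
        right = abs(f[n] - g[n])
        # Trapezoidal rule on one interval (an assumption, not from the patent).
        total += 0.5 * (left + right) * (times[n] - times[n - 1])
    return total
```

A constant two-semitone offset held for two seconds would, under this sketch, give a distance of 4, and a larger distance corresponds to a lower score.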

Usually, the standard pitch curve can be obtained in one of the following two ways:

(1) Recording the pitch information in a MIDI (Musical Instrument Digital Interface) file. This production method places high demands on the staff's professional knowledge of music, and the production workload is large, which is unfavorable to the large-scale application of singing scoring;

(2) Extracting the original singer's fundamental frequency curve from the original singer's data.

Existing common fundamental frequency extraction algorithms include: time-domain methods such as AMDF (average magnitude difference function) and the autocorrelation function method; the frequency-domain harmonic peak method; and time-frequency analysis methods. When these algorithms are applied to data containing environmental noise or background music, the resulting scoring performance is often poor.

When MIDI data or standard a cappella data are available, the singing scoring problem is essentially solved. In many cases, however, such MIDI or a cappella resources cannot be obtained, and what is available is audio data in MP3 or MTV format. For audio data in MP3 or MTV format, the prior-art solution is to extract the fundamental frequency directly from the original singer's data.

However, the inventors' research has confirmed the following. When the original singer's fundamental frequency is extracted by pitch analysis, the usual approach is either to keep only the first fundamental frequency candidate, or to keep several candidates first and then search for the optimal fundamental frequency curve by dynamic programming. During scoring, the score is given by comparing the fundamental frequency curve of the test data with the standard pitch curve. The biggest problem with this approach is that the original song contains accompaniment: when commonly used fundamental frequency extraction algorithms are applied to an original recording with accompaniment, a large number of wrong fundamental frequency values tend to appear, so the original singer's pitch values cannot be obtained directly. Experiments show that the scoring performance obtained by relying on these fundamental frequency extraction algorithms alone is very unsatisfactory.

Summary of the invention

In view of this, embodiments of the invention provide a method and a device for comparing audio data, so as to realize the comparison between test audio data and standard audio data in MP3 or MTV format.

An embodiment of the invention provides a method for comparing audio data, comprising:

obtaining a Gaussian mixture model (GMM) by training on preset, accurately segmented reference data, and segmenting the standard data corresponding to the test data by using the GMM;

extracting first fundamental frequency data corresponding to the test data;

framing the standard data in each segment, extracting fundamental frequency candidate point data corresponding to the standard data, and obtaining second fundamental frequency data corresponding to the standard data in combination with the error between the test data and the standard data in each segment;

comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.

Preferably, framing the standard data in each segment and extracting the fundamental frequency candidate point data corresponding to the standard data comprises:

framing the test data according to a preset frame length and frame shift, and obtaining the fundamental frequency mean of all sampling points in each frame of test data;

subtracting the fundamental frequency mean from the fundamental frequency value of each sampling point, and applying a Hanning window function to the fundamental frequencies of the sampling points after the subtraction;

retaining the time period value corresponding to the maximum autocorrelation coefficient among the sampling points, using this time period value as the basis for selecting the fundamental frequency candidate values of the standard data, and converting the fundamental frequency candidate point data of the standard data into pitch data.

Preferably, obtaining the second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data comprises:

obtaining the initial error between the test data and the standard data in each segment;

determining an error area according to the initial error and the fundamental frequency candidate point data;

obtaining the minimum value of the error area for each segment of test data, obtaining the corresponding initial error according to the minimum value of the average error of each segment of test data, and determining the second fundamental frequency data corresponding to the standard data.

Further, the method also comprises:

when the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value, storing the second fundamental frequency data as a standard template.

Preferably, comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result comprises:

obtaining the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data.

Further, the method also comprises:

obtaining the comparison result between the whole test data and the standard data by taking a weighted average of the comparison results of the individual segments obtained from the error area.

A device for comparing audio data comprises:

a standard data acquisition module, configured to obtain a Gaussian mixture model (GMM) by training on preset, accurately segmented reference data, and to segment the standard data corresponding to the test data by using the GMM;

a first fundamental frequency extraction module, configured to extract first fundamental frequency data corresponding to the test data;

a second fundamental frequency extraction module, configured to frame the standard data in each segment, extract fundamental frequency candidate point data corresponding to the standard data, and obtain second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data in each segment;

a comparison module, configured to compare the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.

Preferably, the second fundamental frequency extraction module specifically comprises:

a sampling point fundamental frequency mean obtaining submodule, configured to frame the test data according to a preset frame length and frame shift, and to obtain the fundamental frequency mean of all sampling points in each frame of test data;

a fundamental frequency processing submodule, configured to subtract the fundamental frequency mean from the fundamental frequency value of each sampling point, and to apply a Hanning window function to the fundamental frequencies of the sampling points after the subtraction;

a fundamental frequency candidate point obtaining submodule, configured to retain the time period value corresponding to the maximum autocorrelation coefficient among the sampling points, to use this time period value as the basis for selecting the fundamental frequency candidate values of the standard data, and to convert the fundamental frequency candidate point data of the standard data into pitch data;

an initial error obtaining submodule, configured to obtain the initial error between the test data and the standard data in each segment;

a cost function determining submodule, configured to determine a cost function according to the initial error and the fundamental frequency candidate point data;

a second fundamental frequency determining submodule, configured to obtain the minimum value of the average error of each segment of test data under the cost function, to obtain the corresponding initial error according to that minimum value, and to determine the second fundamental frequency data corresponding to the standard data.

Further, the device also comprises:

a storage module, configured to store the second fundamental frequency data as a standard template when the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value.

Preferably, the comparison module obtains the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data.

Further, the comparison module also obtains the comparison result between the whole test data and the standard data by taking a weighted average of the comparison results of the individual segments obtained from the error area.

Compared with the prior art, the technical scheme provided by the invention combines the fundamental frequency extraction and normalization of the standard data with the comparison against the test data, so that the fundamental frequency of the standard data can be extracted accurately. The segment boundaries of the standard data and the test data are obtained by analyzing the standard data, which assists the comparison between the test data and the standard data.

In addition, storing the fundamental frequency of the standard data obtained each time facilitates the automatic optimization of the comparison template used for scoring.

Description of drawings

In order to illustrate the technical schemes of the embodiments of the invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram, according to the prior art, of the difference between the pitch curve of the user's singing data and the standard pitch curve compared in singing scoring;

Fig. 2 is a schematic flow chart of a method for comparing audio data provided by an embodiment of the invention;

Fig. 3 is a schematic flow chart of another method for comparing audio data provided by an embodiment of the invention;

Fig. 4 is a schematic structural diagram of a device for comparing audio data provided by an embodiment of the invention;

Fig. 5 is a schematic structural diagram of another device for comparing audio data provided by an embodiment of the invention.

Detailed description of the embodiments

The technical schemes in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

A method for comparing audio data provided by the invention is described first. With reference to Fig. 2, the method comprises:

Step 101: obtaining a Gaussian mixture model (GMM) by training on preset, accurately segmented reference data, and segmenting the standard data corresponding to the test data by using the GMM;

In this step, the reference data may be the standard data corresponding to the test data, or may be other preset data that are segmented to obtain boundaries and then used, according to the segmentation result, to train the GMM.

Step 102: extracting first fundamental frequency data corresponding to the test data;

Step 103: framing the standard data in each segment, extracting fundamental frequency candidate point data corresponding to the standard data, and obtaining second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data in each segment;

Step 104: comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.

The technical scheme provided by the invention combines the fundamental frequency extraction and normalization of the standard data with the comparison against the test data, so that the fundamental frequency of the standard data can be extracted accurately.

The framing of the standard data in each segment and the extraction of the fundamental frequency candidate point data corresponding to the standard data are implemented by the framing, mean subtraction, Hanning window and autocorrelation steps set out in the summary above.

In addition, the second fundamental frequency data corresponding to the standard data are obtained, in combination with the initial error between the test data and the standard data, by the initial error, error area and minimum average error steps set out in the summary above.

It should be noted that, in comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result, the embodiment of the invention obtains the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data. In addition, the comparison results of the individual segments obtained from the error area can be combined by weighted averaging to obtain the comparison result between the whole test data and the standard data, which makes the comparison result between the test data and the standard data more accurate.

In another preferred embodiment of the invention, as shown in Fig. 3, the above method for comparing audio data may further comprise the following step:

Step 105: when the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value, storing the second fundamental frequency data as a standard template.

In the embodiment of the invention, when the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies the preset value, storing the fundamental frequency of the standard data obtained each time facilitates the automatic optimization of the comparison template used for scoring.

Through the embodiment of the invention, a degree of similarity between the test data and the annotated data can be preset. When the test data are compared with the standard data and the similarity reaches this preset value, the second fundamental frequency data of the standard data obtained in the process are stored as a scoring template.

For ease of understanding, the overall technical scheme of the invention is described in detail below by means of a concrete example.

Take audio in MP3 or MTV format as an example. The original recording is the standard data referred to in the embodiments of the invention. It usually comprises two parts, accompaniment and vocals. The vocals carry the melody information of the song, which is what singing scoring needs; the accompaniment is music that embellishes the melody but is often inconsistent with it. This is also the main problem of current singing scoring based on the original recording.

Generally, audio data in MTV format are two-channel data, in which the left channel is the accompaniment and the right channel is the original recording (vocals plus accompaniment). Methods such as spectral subtraction and echo cancellation can be used to filter the right-channel data and remove part of the accompaniment, which improves the accuracy of fundamental frequency extraction.

When a user wishes to have his or her own singing data scored, the standard data corresponding to the user's singing data, that is, the original singer's data, must first be obtained. For the original singer's data, the start and end boundaries of each segment can be annotated accurately, and GMM models are trained separately on the data that contain the original vocals and the data that do not.

The standard data corresponding to the test data are obtained and segmented with the segmentation GMM models. Once the standard data have been segmented, fundamental frequency extraction can be carried out on the test data and on the standard data respectively.

For extracting the fundamental frequency of the test data, a general time-domain autocorrelation function extraction algorithm can be used. Fundamental frequency values that are not 0 are converted to pitch according to the following formula:

y = 12 \cdot \log_2 (x / 440) + 69 \qquad (2)

where x is the input frequency, in hertz, and y is the output pitch, in semitones.
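Formula (2) can be sketched in a few lines. The function name `hz_to_semitone` is hypothetical, and returning 0 for frames whose fundamental frequency is 0 is an assumption of this sketch; the text only states that non-zero values are converted.

```python
import math

def hz_to_semitone(freq_hz):
    """Formula (2): y = 12 * log2(x / 440) + 69.

    Frames with no fundamental frequency (value 0) are passed through as 0,
    which is an assumption of this sketch rather than something the text fixes.
    """
    if freq_hz <= 0:
        return 0.0
    return 12.0 * math.log2(freq_hz / 440.0) + 69.0
```

On this scale 440 Hz maps to 69 semitones and each doubling of frequency adds 12 semitones, which is why the half and double frequency corrections described in the text use offsets of 12 and 24.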

The pitch output for the test data can be normalized for half frequency and double frequency errors in the following manner:

For any time t, compute the mean μ of the fundamental frequency over 100 frames in total, 50 frames before and 50 frames after that time. At the start of the data, where fewer than 50 frames are available before the time, correspondingly more fundamental frequency values are taken after it; the end of the data is handled in the same way. Then, for the fundamental frequency value p_t at time t, the value in {p_t - 24, p_t - 12, p_t, p_t + 12, p_t + 24} that is closest to μ is taken as the true pitch value at that time. The mean is computed here over 50 frames on each side of the time only by way of example; in a specific implementation, the number of frames used before and after the time can be set according to the practical application scenario, and the embodiment of the invention does not specifically limit it.
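The normalization just described can be sketched as follows. This is a sketch under stated assumptions: the name `normalize_octave` is hypothetical, pitches are given in semitones with 0 marking frames without a fundamental frequency, such frames are excluded from the mean, and the window is taken as the `2 * half_window` frames starting `half_window` frames before t, shifted inward at the start and end of the data.

```python
def normalize_octave(pitch, half_window=50):
    """Correct half/double frequency errors against a local mean.

    pitch: list of semitone values, 0 meaning no fundamental frequency
           (representation assumed for this sketch).
    """
    n = len(pitch)
    width = 2 * half_window
    out = []
    for t, p in enumerate(pitch):
        if p == 0:
            out.append(0.0)
            continue
        # Shift the window inward near the start and end of the data.
        start = max(0, min(t - half_window, n - width))
        window = [v for v in pitch[start:start + width] if v != 0]
        mu = sum(window) / len(window)
        # Keep whichever octave-shifted value lies closest to the local mean.
        shifted = [p + s for s in (-24, -12, 0, 12, 24)]
        out.append(min(shifted, key=lambda c: abs(c - mu)))
    return out
```

With a small window, an isolated frame that is one octave too high in an otherwise steady pitch track would be pulled back by 12 semitones.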

In extracting the fundamental frequency of the standard data, the data are framed according to a preset frame length and frame shift; for example, the standard data are framed with a frame length of 25 ms and a frame shift of 10 ms. The sampling points contained in each frame are x(n), n = 1, 2, ..., N. The fundamental frequency mean of the frame is computed and subtracted from the fundamental frequency value of each sampling point, and a Hanning window function is applied to the result; that is, a Hanning window w(n), n = 1, 2, ..., N, is applied to the frame to obtain x'(n).

According to the fundamental frequency formula (2), all autocorrelation coefficients of the sampling points in each frame are calculated, the time period value corresponding to the maximum autocorrelation coefficient is retained, and this time period value is used as the basis for selecting the fundamental frequency candidate values of the standard data.
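The framing, windowing and autocorrelation steps can be sketched as below. Several points are assumptions of this sketch and not taken from the patent: the function names, the interpretation of the subtracted mean as the mean of the frame's sample values, the normalization of the autocorrelation by the frame energy, and the search range `f_min` to `f_max`.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=25, shift_ms=10):
    """Split samples into frames of frame_ms, advancing by shift_ms."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, shift)]

def window_frame(frame):
    """Subtract the frame mean, then apply a Hanning window."""
    n = len(frame)
    if n < 2:
        return [0.0] * n
    mean = sum(frame) / n
    return [(x - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def best_period(frame, sample_rate, f_min=80.0, f_max=800.0):
    """Return (lag, coefficient) of the largest autocorrelation coefficient.

    The lag is the period in samples; sample_rate / lag is the candidate
    fundamental frequency. f_min and f_max are hypothetical search limits.
    """
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0, 0.0
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

A returned lag of 0 means that no positive autocorrelation peak was found in the search range. Because the window tapers the frame, the peak lag for a pure tone may land one sample away from the exact period, so candidates derived this way are best treated as approximate.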

For each retained fundamental frequency value, if its corresponding one-octave (double frequency) point is not among the retained values and the corresponding fundamental frequency value is a possible fundamental frequency value, the half frequency and double frequency values that meet this condition are also added to the candidates, and all candidates are converted to semitones according to formula (2).

In singing scoring, different people start in somewhat different keys when they sing, and the influence of this key difference must be eliminated in the scoring process. Define Δ as the difference in starting key between the test data and the annotated data.

After the test data have been segmented, suppose the i-th segment contains T frames of data, and the pitches of the T frames of test data are p_{i,1}^{s}, ..., p_{i,T}^{s}.

The fundamental frequency candidate points of the corresponding original recording are { p_{i,t,k}^{cand}, k ∈ [1, k_i] }, where k_i is the number of fundamental frequency candidate points of the frame data. The following cost function is then defined:

dist = \sum_{t=1}^{T} \min_{k \in [1, k_i]} \left( \left| p_{i,t,k}^{cand} - p_{i,t}^{s} - \Delta \right| \right) / N \qquad (3)

where N is the number of frames, in the calculation of dist, in which neither the original recording's fundamental frequency candidate points nor the test data fundamental frequency value is 0.
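The cost function of formula (3) can be sketched as follows. The function name `segment_cost` and the data layout (one list of candidate pitches per frame, with 0 marking a missing value) are assumptions of this example.

```python
def segment_cost(candidates, test_pitch, delta):
    """Formula (3): mean, over usable frames, of the smallest candidate error.

    candidates: per-frame lists of standard-data candidate pitches (semitones)
    test_pitch: per-frame test-data pitch (semitones), 0 = no value
    delta:      assumed key difference in semitones
    Returns (dist, N), where N counts frames with non-zero data on both sides.
    """
    total, n = 0.0, 0
    for cands, p_s in zip(candidates, test_pitch):
        usable = [c for c in cands if c != 0]
        if not usable or p_s == 0:
            continue  # frame does not count toward N
        total += min(abs(c - p_s - delta) for c in usable)
        n += 1
    return (total / n if n else 0.0), n
```

For a singer who is consistently one semitone below the original, the cost would drop to 0 at Δ = 1, which is the behaviour the search over Δ relies on.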

For each Δ_j = 0.1j, {j ∈ Z, -120 ≤ j ≤ 120}, the average error dist_{i,j}^{sent} of the i-th segment of data and the frame count N_{i,j} are calculated. If i ≠ 1, the average error produced by the key difference Δ_j over all data before the i-th segment is dist_{i,j}^{pre}, and the corresponding frame count is N_{i,j}^{pre}.

The cost function in formula (3) should then be updated in the following manner:

\begin{cases} dist_{i+1,j}^{pre} = (dist_{i,j}^{pre} \cdot N_{i,j}^{pre} + dist_{i,j}^{sent} \cdot N_{i,j}) / (N_{i,j}^{pre} + N_{i,j}) \\ N_{i+1,j}^{pre} = N_{i,j}^{pre} + N_{i,j} \\ dist_{i,j}^{sent} = dist_{i+1,j}^{pre} \end{cases} \qquad (4)

The Δ_j for which the average error dist_{i,j}^{sent} is smallest is denoted Δ_i^{best}.

At the same time, the final fundamental frequency is obtained as follows:

p_{i,t} = p_{i,t,k}^{cand}, \quad k = \arg\min_{k \in [1, k_i]} \left| p_{i,t,k}^{cand} - p_{i,t}^{s} - \Delta_i^{best} \right| \qquad (5)
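The running update of formula (4), the choice of the best key difference and the selection rule of formula (5) can be sketched as follows. All function names are hypothetical, and the per-segment errors are assumed to have been computed beforehand for each Δ_j on the 0.1-semitone grid.

```python
def update_running(pre_dist, pre_n, seg_dist, seg_n):
    """Formula (4): frame-weighted average of earlier and current errors."""
    total_n = pre_n + seg_n
    if total_n == 0:
        return 0.0, 0
    return (pre_dist * pre_n + seg_dist * seg_n) / total_n, total_n

def best_delta(deltas, dists):
    """Return the key difference whose average error is smallest."""
    j = min(range(len(deltas)), key=lambda idx: dists[idx])
    return deltas[j]

def select_pitch(frame_candidates, p_s, delta_best):
    """Formula (5): pick the candidate closest to the shifted test pitch."""
    return min(frame_candidates, key=lambda c: abs(c - p_s - delta_best))

# Search grid from the text: delta_j = 0.1 * j for -120 <= j <= 120.
DELTAS = [0.1 * j for j in range(-120, 121)]
```

Weighting by frame count means a long earlier segment is not outvoted by a short noisy one, so the chosen key difference tends to stay stable across a song.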

In the embodiment of the invention, the final scoring process gives the final score by calculating the error area between the fundamental frequency of the test data and the fundamental frequency of the standard data, namely:

dist = \sum_{t}^{T_i} \min \left( \left| p_{i,t}^{s} - p_{i,t}^{t} - \Delta_i^{best} \right|, MAXCOST \right) / N_i \qquad (6)

where the two pitch terms denote the fundamental frequency of the standard data obtained by fundamental frequency extraction and the fundamental frequency value, at time t, of the i-th segment of test data that takes part in scoring; Δ_i^{best} is the error caused by the key difference between the i-th segment of test data and the standard data, obtained by the multi-candidate fundamental frequency extraction strategy described above; N_i is the number of frames in which the fundamental frequency is non-zero in both the test data and the standard data; and MAXCOST is the upper limit of the error, which can be preset.

Once the start and end positions of the extracted fundamental frequency have been determined for each segment of test data, let the start and end positions of each segment in the standard data be Seg_Start and Seg_End, and let the start and end positions of the fundamental frequency of the test data provided by the tester be P_Start and P_End. The comparison score of the i-th segment of test data is then normalized according to the following formula:

SentScore_i = \left( \sum_{t=1}^{T_i} \left| p_{i,t}^{s} - p_{i,t}^{t} - \Delta_i^{best} \right| + MAXCOST \cdot K_i \right) / (N_i + K_i) \qquad (7)

In the above formula, K_i denotes the number of frames by which the unsung portions at the start and end of each segment of test data exceed 10%, and T_i is the number of frames in each segment in which the fundamental frequency is non-zero in both the standard data and the test data.
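Formula (7) can be sketched as below. The function name `sentence_score` is hypothetical. The arguments `p_s` and `p_t` simply follow the superscripts of the formula, with frames skipped when either value is 0, and `k_frames` is assumed to have been counted beforehand from the segment boundaries.

```python
def sentence_score(p_s, p_t, delta_best, max_cost, k_frames):
    """Formula (7): segment error with a penalty for unsung frames.

    p_s, p_t:   per-frame pitch sequences named after the superscripts in (7)
    delta_best: key difference chosen for this segment
    max_cost:   error upper limit MAXCOST (preset)
    k_frames:   number of penalized unsung frames K_i (assumed precomputed)
    """
    total, n = 0.0, 0
    for a, b in zip(p_s, p_t):
        if a == 0 or b == 0:
            continue  # only frames with both values non-zero are counted
        total += abs(a - b - delta_best)
        n += 1
    denom = n + k_frames
    if denom == 0:
        return 0.0
    return (total + max_cost * k_frames) / denom
```

Each penalized frame contributes the full MAXCOST, so a singer who skips the opening of a phrase is scored as if those frames were maximally off pitch.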

In actual operation, after the fundamental frequency candidate points of the standard data have been obtained, the fundamental frequency of the test data must be shifted up or down so that, after the shift, its error relative to the fundamental frequency candidate points of the standard data is smallest. In other words, the error Δ caused by the key difference between the test data and the annotated data is eliminated, so that every segment of the test data corresponds to the same key standard. The final scoring result is given by calculating the error area between the fundamental frequency of the test data and the fundamental frequency of the standard data.

In order to further improve the accuracy of the data comparison, the score of the whole piece can be obtained as a weighted average according to the actual length of the standard data, namely:

Tonescore = \frac{\sum_{i=1}^{s} (LRC_{end} - LRC_{start}) \cdot SentScore_i}{\sum_{i=1}^{s} (LRC_{end} - LRC_{start})} \qquad (8)

The final scoring result of the whole piece can then be obtained according to the following formula:

FinalScore = a \cdot Tonescore + b \qquad (9)

where a and b are linear regression coefficients.
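Formulas (8) and (9) amount to a length-weighted mean followed by a linear mapping, as in the sketch below. The function names and the tuple layout of `segments` are assumptions, and any values used for a and b are placeholders, since the text does not give the regression coefficients.

```python
def tone_score(segments):
    """Formula (8): average of segment scores weighted by segment length.

    segments: list of (lrc_start, lrc_end, sent_score) tuples (assumed layout)
    """
    weight = sum(end - start for start, end, _ in segments)
    if weight == 0:
        return 0.0
    return sum((end - start) * score for start, end, score in segments) / weight

def final_score(tone, a, b):
    """Formula (9): linear mapping with regression coefficients a and b."""
    return a * tone + b
```

Since Tonescore is an error measure, a fitted value of a would presumably be negative so that larger errors give lower final scores, but that is an inference and not stated in the text.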

In the above embodiment, the second fundamental frequency data corresponding to the standard data can be stored in real time, so that the fundamental frequency curve of the standard data is updated and optimized.

In the embodiment of the invention, when the test data are compared, the fundamental frequency curve of the standard data can be selected as follows:

For each segment of the test data, if the score obtained for the test data with the standard data fundamental frequency curve chosen by the system exceeds a predefined threshold, the standard data fundamental frequency curve Template used in scoring this test data is saved as a reference template for subsequent scoring.

For a new piece of test data Test_Cur, if the system has already stored a standard data fundamental frequency template Template_t corresponding to a piece of test data Test_Best for the same song, then for each segment of Test_Cur: if the scoring result obtained with the multi-candidate fundamental frequency extraction strategy described above is not as good as the corresponding scoring result obtained for Test_Best, the standard data fundamental frequency template used for Test_Best is adopted for scoring; otherwise, the scoring result obtained with the fundamental frequency curve produced by the current multi-candidate extraction as the template is kept, and if the score is higher than the predefined threshold, the fundamental frequency curve currently obtained is stored as the standard data fundamental frequency template.
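The template selection rule can be sketched per segment as follows. The function name `pick_reference` and its arguments are hypothetical, and the sketch assumes that a higher score is better, which matches the wording about exceeding a threshold but is not stated explicitly.

```python
def pick_reference(score_multi, curve_multi, best_score, template, threshold):
    """Decide which standard-data curve to score against for one segment.

    score_multi: score obtained with the current multi-candidate extraction
    curve_multi: fundamental frequency curve produced by that extraction
    best_score:  score previously obtained for this segment (or None)
    template:    stored standard-data template (or None if nothing is stored)
    threshold:   predefined score threshold for storing a new template
    Returns (curve_to_use, template_to_keep_stored).
    """
    if template is not None and score_multi < best_score:
        # The current extraction did worse: fall back to the stored template.
        return template, template
    # Keep the current curve, and store it only if the score is high enough.
    stored = curve_multi if score_multi > threshold else template
    return curve_multi, stored
```

Under this rule the stored template can only be replaced by a curve that both matched or beat the earlier result and cleared the threshold, so the reference would not degrade as more users are scored.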

Corresponding to the above method embodiment, the invention also provides a device for comparing audio data. As shown in Fig. 4, the device comprises:

a standard data acquisition module 401, configured to obtain a Gaussian mixture model (GMM) by training on preset, accurately segmented reference data, and to segment the standard data corresponding to the test data by using the GMM;

a first fundamental frequency extraction module 402, configured to extract first fundamental frequency data corresponding to the test data;

a second fundamental frequency extraction module 403, configured to frame the standard data in each segment, extract fundamental frequency candidate point data corresponding to the standard data, and obtain second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data in each segment;

a comparison module 404, configured to compare the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.

The device for comparing audio data provided by the invention combines the fundamental frequency extraction and normalization of the standard data with the comparison against the test data, so that the fundamental frequency of the standard data can be extracted accurately.

Need to prove that the above-mentioned second fundamental frequency extraction module specifically can comprise:

Cost function is determined submodule, is used for determining area of error according to described initial error and fundamental frequency candidate point data;

Second fundamental frequency is determined submodule, be used for obtaining the minimum value of the corresponding every section test data of described area of error, and obtain corresponding initial error according to the minimum value in the average error of every section test data of correspondence, determine the second fundamental frequency data of corresponding described normal data.

In the comparison means embodiment of another voice data of the present invention, as shown in Figure 5, described device can also comprise:

Fundamental frequency by the normal data that will obtain is at every turn stored, the Automatic Optimal of the comparison template of being convenient to realize to mark.

In the specific implementation, described comparison module can draw comparative result by the area of error between the described first fundamental frequency data and the second fundamental frequency data.

To further improve the accuracy of the data comparison, the comparison module may also obtain the comparison result between the whole of the test data and the standard data by taking a weighted average of the per-segment comparison results obtained from the error area.
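The weighted averaging can be sketched as follows. The text does not say how the weights are chosen, so the sketch simply accepts them as input; weighting each segment by its number of frames is one plausible assumption. The name `overall_result` is hypothetical.

```python
def overall_result(segment_results, weights):
    """Combine per-segment comparison results into one weighted average."""
    if len(segment_results) != len(weights):
        raise ValueError("one weight is required per segment")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(r * w for r, w in zip(segment_results, weights)) / total
```

With segment results 80.0 and 90.0 and weights 1 and 3, the longer segment dominates and the combined result is 87.5.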

Since the device embodiments correspond substantially to the method embodiments, they are described only briefly; for the relevant details, refer to the description of the method embodiments. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.

The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the embodiments of the invention. Therefore, the embodiments of the invention are not limited to the embodiments shown herein, but are to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for comparing audio data, characterized in that the method comprises:

2. The method for comparing audio data according to claim 1, characterized in that dividing the standard data in each segment into frames and extracting the fundamental frequency candidate point data corresponding to the standard data comprises:

3. The method for comparing audio data according to claim 1, characterized in that obtaining the second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data comprises:

4. The method for comparing audio data according to claim 4, characterized in that the method further comprises:

5. The method for comparing audio data according to claim 1, characterized in that comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result comprises:

6. The method for comparing audio data according to claim 5, characterized in that the method further comprises:

7. A device for comparing audio data, characterized in that the device comprises:

8. The device for comparing audio data according to claim 7, characterized in that the second fundamental frequency extraction module specifically comprises:

9. The device for comparing audio data according to claim 7, characterized in that the device further comprises:

10. The device for comparing audio data according to claim 7, characterized in that the comparison module obtains the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data.

11. The device for comparing audio data according to claim 10, characterized in that the comparison module further obtains the comparison result between the whole of the test data and the standard data by taking a weighted average of the per-segment comparison results obtained from the error area.