CN101968958A - Method and device for comparing audio data - Google Patents

Method and device for comparing audio data

Info

Publication number
CN101968958A
Authority
CN
China
Prior art keywords
data
fundamental frequency
error
normal data
normal
Prior art date
Legal status
Granted
Application number
CN 201010530213
Other languages
Chinese (zh)
Other versions
CN101968958B (en)
Inventor
蒋成林
魏思
胡国平
刘丹
胡郁
刘庆峰
Current Assignee
Iflytek Shanghai Technology Co ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201010530213A
Publication of CN101968958A
Application granted
Publication of CN101968958B
Legal status: Active
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The invention discloses a method and a device for comparing audio data. The comparison method comprises: accurately segmenting preset reference data and training on it to obtain a Gaussian mixture model (GMM), and using the GMM to segment the standard data corresponding to the test data; extracting first fundamental frequency data corresponding to the test data; dividing the standard data in each segment into frames and extracting fundamental frequency candidate point data corresponding to the standard data, so as to obtain second fundamental frequency data corresponding to the standard data in combination with the error between the test data and the standard data in each segment; and comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result. With the embodiments of the invention, the fundamental frequency of the standard data can be extracted accurately, which enables comparison between test audio data and standard audio data.

Description

Method and device for comparing audio data
Technical field
The present invention relates to the field of audio data processing, and more particularly to a method and a device for comparing audio data.
Background
Existing singing-scoring technology usually scores a piece of singing data according to how closely its pitch and rhythm match the original singer. In most application scenarios, the user sings along with the rhythm of the accompaniment; the scoring system analyzes the recording together with the accompaniment (original recording), compares the parameters that reflect singing quality, judges how well the user sang, and finally outputs a score. It is assumed here that the user's recording contains little noise, so that an ordinary fundamental frequency extraction strategy can extract the pitch accurately.
The most important factor in singing scoring is the difference between the pitch curve of the user's singing data and the standard pitch curve, as shown in Fig. 1, namely:
$$\mathrm{Dist} = \int_{t_b}^{t_e} \left| f(t) - g(t) \right| \, dt \qquad (1)$$
In formula (1), f(t) and g(t) denote the standard fundamental frequency and the fundamental frequency of the singing data, respectively, and t_b and t_e denote the start and end times of the singing. The larger the error Dist, the lower the score, and vice versa.
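In practice both pitch curves are only available at discrete frame times, so the integral in formula (1) has to be approximated. The sketch below is a minimal illustration, assuming both curves are sampled on the same uniform grid with spacing `dt` and using the trapezoidal rule; the function name `pitch_distance` and the choice of quadrature are illustrative assumptions, not taken from the patent.

```python
def pitch_distance(f, g, dt):
    """Approximate Dist = integral of |f(t) - g(t)| dt over [t_b, t_e] (formula (1)).

    f, g: standard and sung pitch curves sampled on the same uniform grid.
    dt:   spacing between samples, e.g. a 10 ms frame shift (assumption).
    """
    if len(f) != len(g):
        raise ValueError("both curves must have the same number of samples")
    diffs = [abs(a - b) for a, b in zip(f, g)]
    # Trapezoidal rule: each interval contributes the mean of its two endpoint errors times dt.
    return sum((diffs[k] + diffs[k + 1]) / 2 * dt for k in range(len(diffs) - 1))
```

For example, `pitch_distance([60, 62, 64], [60, 61, 64], 0.01)` gives approximately 0.01: a one-semitone deviation at the middle frame, spread over two 10 ms intervals.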
In general, the standard pitch curve can be obtained in one of two ways:
(1) Recording the pitch information in a MIDI (Musical Instrument Digital Interface) file. This approach demands considerable musical expertise from the staff and a heavy production workload, which hinders large-scale deployment of singing scoring;
(2) Extracting the original singer's fundamental frequency curve from the original recording.
Common existing fundamental frequency extraction algorithms include time-domain methods such as the AMDF (Average Magnitude Difference Function) and the autocorrelation function method, the frequency-domain harmonic peak method, and time-frequency analysis methods. When these algorithms are applied in the presence of ambient noise or background music, the resulting scoring performance is often poor.
When MIDI data or standard a cappella (unaccompanied) recordings are available, the singing-scoring problem is essentially solved. In many cases, however, such MIDI or a cappella resources cannot be obtained, and only audio in MP3 or MTV format is available. For MP3 or MTV audio, the existing solution is to extract the fundamental frequency directly from the original recording.
However, the inventors' research shows the following. When the original singer's fundamental frequency is extracted by pitch analysis, the usual approach either takes the first fundamental frequency candidate directly, or keeps several candidates and then searches for the optimal fundamental frequency curve by dynamic programming. During scoring, the result is given by comparing the fundamental frequency curve of the test data with the standard pitch curve. The biggest problem with this method is that the original song contains accompaniment: when common fundamental frequency extraction algorithms are applied to an accompanied original recording, they easily produce a large number of wrong fundamental frequency values, so the original singer's pitch cannot be obtained directly. Experiments found that the scoring performance obtained by relying on these fundamental frequency extraction algorithms alone is far from satisfactory.
Summary of the invention
In view of this, embodiments of the invention provide a method and a device for comparing audio data, so as to compare test audio data in MP3 or MTV format with standard audio data.
An embodiment of the invention provides a method for comparing audio data, comprising:
Obtaining a Gaussian mixture model (GMM) by accurately segmenting and training on preset reference data, and using the GMM to segment the standard data corresponding to the test data;
Extracting first fundamental frequency data corresponding to the test data;
Dividing the standard data in each segment into frames, extracting fundamental frequency candidate point data corresponding to the standard data, and obtaining second fundamental frequency data corresponding to the standard data in combination with the error between the test data and the standard data in each segment;
Comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.
Preferably, dividing the standard data in each segment into frames and extracting the fundamental frequency candidate point data corresponding to the standard data comprises:
Dividing the test data into frames according to a preset frame length and frame shift, and obtaining the mean value of all sample points in each frame of test data;
Subtracting the mean value from the value of each sample point, and applying a Hanning window to the mean-removed sample points;
Retaining the time period corresponding to the maximum autocorrelation coefficient among these sample points, using this time period value as the reference for selecting the fundamental frequency candidate values of the standard data, and converting the fundamental frequency candidate point data of the standard data into pitch data.
Preferably, obtaining the second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data comprises:
Obtaining the initial error between the test data and the standard data in each segment;
Determining the error area according to the initial error and the fundamental frequency candidate point data;
Obtaining the minimum of the average error of each segment of test data within the error area, obtaining the corresponding initial error, and determining the second fundamental frequency data corresponding to the standard data according to the minimum of the average error of each segment of test data.
Further, the method also comprises:
When the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value, storing the second fundamental frequency data as a standard template.
Preferably, comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result comprises:
Obtaining the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data.
Further, the method also comprises:
Obtaining the comparison result between the whole test data and the standard data as a weighted average of the per-segment comparison results obtained from the error area.
A device for comparing audio data, comprising:
A standard data acquisition module, configured to obtain a Gaussian mixture model (GMM) by accurately segmenting and training on preset reference data, and to use the GMM to segment the standard data corresponding to the test data;
A first fundamental frequency extraction module, configured to extract first fundamental frequency data corresponding to the test data;
A second fundamental frequency extraction module, configured to divide the standard data in each segment into frames, extract fundamental frequency candidate point data corresponding to the standard data, and obtain second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data in each segment;
A comparison module, configured to compare the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.
Preferably, the second fundamental frequency extraction module specifically comprises:
A sample-point mean acquisition submodule, configured to divide the test data into frames according to a preset frame length and frame shift, and obtain the mean value of all sample points in each frame of test data;
A fundamental frequency processing submodule, configured to subtract the mean value from the value of each sample point and apply a Hanning window to the mean-removed sample points;
A fundamental frequency candidate acquisition submodule, configured to retain the time period corresponding to the maximum autocorrelation coefficient among the sample points, use this time period value as the reference for selecting the fundamental frequency candidate values of the standard data, and convert the fundamental frequency candidate point data of the standard data into pitch data;
An initial error acquisition submodule, configured to obtain the initial error between the test data and the standard data in each segment;
A cost function determination submodule, configured to determine a cost function according to the initial error and the fundamental frequency candidate point data;
A second fundamental frequency determination submodule, configured to obtain the minimum of the average error of each segment of test data in the cost function, obtain the corresponding initial error according to this minimum, and determine the second fundamental frequency data corresponding to the standard data.
Further, the device also comprises:
A storage module, configured to store the second fundamental frequency data as a standard template when the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value.
Preferably, the comparison module obtains the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data.
Further, the comparison module also obtains the comparison result between the whole test data and the standard data as a weighted average of the per-segment comparison results obtained from the error area.
Compared with the prior art, the technical solution provided by the invention combines fundamental frequency extraction from the standard data with normalized comparison against the test data, so that the fundamental frequency of the standard data can be extracted accurately. The segment boundaries of the standard data and the test data are obtained by analyzing the standard data, which supports the comparison between the test data and the standard data.
In addition, storing the standard-data fundamental frequency obtained each time facilitates automatic optimization of the comparison template used for scoring.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the prior art, in which singing scoring compares the pitch curve of the user's singing data with the standard pitch curve;
Fig. 2 is a schematic flowchart of a method for comparing audio data according to an embodiment of the invention;
Fig. 3 is a schematic flowchart of another method for comparing audio data according to an embodiment of the invention;
Fig. 4 is a schematic structural diagram of a device for comparing audio data according to an embodiment of the invention;
Fig. 5 is a schematic structural diagram of another device for comparing audio data according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the invention.
First, a method for comparing audio data provided by the invention is described. Referring to Fig. 2, the method comprises:
Step 101: obtain a Gaussian mixture model (GMM) by accurately segmenting and training on preset reference data, and use the GMM to segment the standard data corresponding to the test data;
In this step, the reference data may be the standard data corresponding to the test data, or other preset data that is segmented to obtain boundaries and then used to train the GMM according to the segmentation result.
Step 102: extract first fundamental frequency data corresponding to the test data;
Step 103: divide the standard data in each segment into frames, extract fundamental frequency candidate point data corresponding to the standard data, and obtain second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data in each segment;
Step 104: compare the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.
The technical solution provided by the invention combines fundamental frequency extraction from the standard data with normalized comparison against the test data, so that the fundamental frequency of the standard data can be extracted accurately.
Dividing the standard data in each segment into frames and extracting the fundamental frequency candidate point data corresponding to the standard data specifically comprises:
Dividing the test data into frames according to a preset frame length and frame shift, and obtaining the mean value of all sample points in each frame of test data;
Subtracting the mean value from the value of each sample point, and applying a Hanning window to the mean-removed sample points;
Retaining the time period corresponding to the maximum autocorrelation coefficient among these sample points, using this time period value as the reference for selecting the fundamental frequency candidate values of the standard data, and converting the fundamental frequency candidate point data of the standard data into pitch data.
In addition, obtaining the second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data is specifically implemented as follows:
Obtaining the initial error between the test data and the standard data in each segment;
Determining the error area according to the initial error and the fundamental frequency candidate point data;
Obtaining the minimum of the average error of each segment of test data within the error area, obtaining the corresponding initial error, and determining the second fundamental frequency data corresponding to the standard data according to the minimum of the average error of each segment of test data.
It should be noted that, when comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result, the embodiment of the invention derives the comparison result from the error area between the two. In addition, the per-segment comparison results obtained from the error area can be combined by a weighted average into a comparison result between the whole test data and the standard data, which makes the comparison between the test data and the standard data more accurate.
In another preferred embodiment of the invention, as shown in Fig. 3, the method for comparing audio data may further comprise the following step:
Step 105: when the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value, store the second fundamental frequency data as a standard template.
In this embodiment, when the comparison result between the second and the first fundamental frequency data satisfies the preset value, the standard-data fundamental frequency obtained each time is stored, which facilitates automatic optimization of the comparison template used for scoring.
With this embodiment, a similarity threshold between the test data and the annotated data can be preset. When the test data is compared with the standard data and the similarity reaches this preset value, the second fundamental frequency data of the standard data obtained in this process is stored as the scoring template.
For ease of understanding, the overall technical solution of the invention is described in detail below through a concrete example.
Taking audio in MP3 or MTV format as an example, the original recording is the standard data described in the embodiments of the invention. It usually contains two parts, accompaniment and vocals. The vocals carry the melody (theme) of the song, which is what singing scoring needs; the accompaniment is music that embellishes the melody but often does not coincide with it. This is currently the main difficulty of scoring against the original recording.
Generally, MTV audio is two-channel: the left channel carries the accompaniment and the right channel carries the original recording (vocals plus accompaniment). Methods such as spectral subtraction or echo cancellation can be used to filter the right-channel data and remove part of the accompaniment, which improves the accuracy of fundamental frequency extraction.
When a user wants his or her singing to be scored, the standard data corresponding to the user's singing, i.e., the original recording, is obtained first. In the original recording, the start and end boundaries of each segment can be annotated accurately, and GMM models are trained separately on the data with and without the original vocals.
After the standard data corresponding to the test data is obtained, it is segmented with the GMM models; once the standard data has been segmented, fundamental frequency extraction can be performed on the test data and on the standard data respectively.
For the test data, a general time-domain autocorrelation fundamental frequency extraction algorithm can be used, and every non-zero fundamental frequency value x is converted to a pitch y with the following formula:
$$y = 12 \cdot \log_2(x / 440) + 69 \qquad (2)$$
where x is the input frequency in hertz and y is the output pitch in semitones.
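Formula (2) is the familiar MIDI note-number mapping: 440 Hz maps to 69, and every doubling of frequency adds 12 semitones, which is why octave errors later show up as jumps of ±12. A minimal sketch follows; keeping unvoiced frames (0 Hz) at 0 is an assumption that mirrors the text's "non-zero values only", and both function names are illustrative.

```python
import math


def hz_to_semitone(x):
    """Formula (2): y = 12 * log2(x / 440) + 69, mapping Hz to semitones."""
    if x <= 0:
        # Only non-zero values are converted; unvoiced frames stay 0 here (assumption).
        return 0.0
    return 12 * math.log2(x / 440.0) + 69


def semitone_to_hz(y):
    """Inverse of formula (2), useful for checking results."""
    return 440.0 * 2 ** ((y - 69) / 12)
```

For instance, `hz_to_semitone(880.0)` returns 81.0, exactly one octave (12 semitones) above 440 Hz.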
Octave errors (halving or doubling of the frequency) in the pitch output for the test data can be normalized as follows:
For any time t, compute the mean μ of the fundamental frequencies of 100 frames in total, 50 before and 50 after t. Near the start, where fewer than 50 frames precede t, correspondingly more frames after t are used; the end position is handled in the same way. Then, for the fundamental frequency value p_t at time t, the value among {p_t − 24, p_t − 12, p_t, p_t + 12, p_t + 24} closest to μ is taken as the actual pitch at that time. Here 50 frames on each side are used to compute the mean; in a specific implementation, the number of frames before and after t can be set according to the application scenario, and the embodiment of the invention places no specific limit on it.
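A minimal sketch of this octave-error folding, assuming the pitch track is already in semitones (so ±12 is one octave), that 0 marks unvoiced frames, which are left untouched and excluded from the mean, and that the 100-frame window is simply shifted inward at the edges. The names `fold_octave_errors` and `half_window` are illustrative.

```python
def fold_octave_errors(pitches, half_window=50):
    """Replace each voiced pitch by the octave shift closest to the local mean.

    pitches: per-frame pitch in semitones, 0 for unvoiced frames (assumption).
    half_window: frames taken on each side of t (50 in the text's example).
    """
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    n = len(pitches)
    size = min(2 * half_window, n)
    folded = []
    for t, p in enumerate(pitches):
        if p == 0:
            folded.append(p)  # unvoiced frame: nothing to fold
            continue
        # Window of `size` frames around t, shifted inward near the start or end.
        start = min(max(t - half_window, 0), n - size)
        voiced = [q for q in pitches[start:start + size] if q != 0]
        mu = sum(voiced) / len(voiced)  # non-empty: frame t itself is voiced and inside
        candidates = (p - 24, p - 12, p, p + 12, p + 24)
        folded.append(min(candidates, key=lambda c: abs(c - mu)))
    return folded
```

A single frame detected one octave too high, e.g. a 72 inside a run of 60s, is pulled back to 60 because 60 is the candidate nearest the local mean.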
During fundamental frequency extraction for the standard data, the data is divided into frames according to a preset frame length and frame shift, for example a frame length of 25 ms and a frame shift of 10 ms. Let the sample points of each frame be x(n), n = 1, 2, …, N. The mean value of the frame is computed and subtracted from each sample point, and a Hanning window w(n), n = 1, 2, …, N, is applied to the mean-removed frame to obtain x′(n).
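The framing step can be sketched as follows, assuming mono samples in a Python list and the symmetric Hanning window w(n) = 0.5 − 0.5·cos(2πn/(N−1)), the same form NumPy's `numpy.hanning` uses. The 25 ms / 10 ms defaults are the text's example values; the function name and the choice to drop a trailing partial frame are assumptions.

```python
import math


def frames_mean_removed_hann(samples, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Split samples into frames, remove each frame's mean and apply a Hanning window.

    Returns the list of windowed frames x'(n). Trailing samples that do not fill
    a whole frame are dropped (assumption).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if frame_len < 2 or shift < 1:
        raise ValueError("frame length must be >= 2 samples and shift >= 1 sample")
    # Symmetric Hanning window w(n) = 0.5 - 0.5 cos(2 pi n / (N - 1)), n = 0..N-1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = samples[start:start + frame_len]
        mean = sum(frame) / frame_len  # per-frame mean (DC) removal
        frames.append([(x - mean) * w for x, w in zip(frame, window)])
    return frames
```

At 1 kHz, 100 samples give eight 25-sample frames starting every 10 samples; a constant signal becomes all zeros after mean removal.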
All autocorrelation coefficients of the windowed sample points in each frame are then computed; the time period corresponding to the maximum autocorrelation coefficient is retained, and this period value is used as the reference for selecting the fundamental frequency candidate values of the standard data (candidate frequencies are converted to pitch with formula (2)).
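A sketch of the period search on one windowed frame, assuming the raw (unnormalized) autocorrelation r(τ) = Σ x′(k)·x′(k+τ) and a lag search restricted to an assumed singing range of 60 to 1000 Hz; the patent states neither detail, and `autocorr_period` and `sine_frame` are hypothetical names.

```python
import math


def sine_frame(freq_hz, sample_rate, n):
    """Test signal: n samples of a unit sine (for demonstration only)."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate) for k in range(n)]


def autocorr_period(frame, sample_rate, fmin=60.0, fmax=1000.0):
    """Return the period in seconds whose lag maximizes the frame's autocorrelation.

    Only lags corresponding to fmin..fmax Hz are searched (assumed range);
    returns 0.0 if the frame is too short to contain an admissible lag.
    """
    n = len(frame)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))
    best_lag, best_r = 0, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # r(lag) = sum_k x'(k) * x'(k + lag)
        r = sum(frame[k] * frame[k + lag] for k in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag / sample_rate if best_lag else 0.0
```

For a 25 ms frame of a 200 Hz sine sampled at 8 kHz, the maximum falls at a lag of 40 samples, i.e. a period of 0.005 s. Restricting the lag range keeps the trivially large values near lag 0 out of the search.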
For each retained fundamental frequency value, if its half-frequency point is not already among the candidates and the corresponding fundamental frequency is a plausible fundamental frequency value, the half-frequency value meeting this condition is also added to the candidates; all candidates are converted to semitones according to formula (2).
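A sketch of the half-frequency augmentation. Halving a frequency lowers its pitch by exactly 12 semitones, so the check can be done after conversion with formula (2). What counts as "plausible" and as "already among the candidates" is not specified, so the 60 Hz floor and the 0.5-semitone tolerance below are hypothetical choices.

```python
import math


def candidates_with_half_frequency(candidates_hz, fmin=60.0, tol=0.5):
    """Convert candidates to semitones and add missing half-frequency candidates.

    fmin: lowest frequency treated as a plausible fundamental (assumption).
    tol:  semitone distance under which a candidate counts as already present (assumption).
    """
    def to_semitone(x):
        return 12 * math.log2(x / 440.0) + 69  # formula (2)

    pitches = [to_semitone(f) for f in candidates_hz if f > 0]
    floor = to_semitone(fmin)
    extra = []
    for p in pitches:
        half = p - 12  # half the frequency is one octave lower
        if half >= floor and all(abs(half - q) > tol for q in pitches + extra):
            extra.append(half)
    return pitches + extra
```

With a single 440 Hz candidate the result is `[69.0, 57.0]`: the 220 Hz subharmonic is added because it is missing and above the floor.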
In singing scoring, different people start singing in somewhat different keys, and the influence of this key difference must be eliminated during scoring. Let Δ denote the key difference between the test data and the annotated (standard) data.
After the test data has been segmented, suppose segment i contains T frames, and denote the pitches of these T test frames by
$$p^{s}_{i,1}, p^{s}_{i,2}, \ldots, p^{s}_{i,T}$$
The fundamental frequency candidates of the original recording at frame t are
$$p^{cand}_{i,t,k}, \qquad k = 1, \ldots, K_i$$
where K_i is the number of fundamental frequency candidates of the frame. The cost function is then defined as:
$$\mathrm{dist} = \frac{1}{N} \sum_{t=1}^{T} \min_{k \in [1, K_i]} \left| p^{cand}_{i,t,k} - p^{s}_{i,t} - \Delta \right| \qquad (3)$$
where N is the number of frames, among those entering dist, for which neither the original recording's fundamental frequency candidates nor the test data's fundamental frequency value is 0.
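Formula (3) can be sketched directly. The representation below, one list of candidate pitches per frame with 0 standing for "no pitch", is an assumption, as is returning infinity when no frame is usable; N is returned alongside the cost because the later update of the cost needs it.

```python
def segment_cost(test_pitch, candidates, delta):
    """Formula (3): mean over usable frames of min_k |cand - test - delta|.

    test_pitch: test pitch per frame in semitones (0 = unvoiced).
    candidates: per frame, a list of standard-data candidate pitches (0 = none).
    Returns (dist, N); dist is infinity when no frame is usable (assumption).
    """
    total, n = 0.0, 0
    for p, cands in zip(test_pitch, candidates):
        voiced = [c for c in cands if c != 0]
        if p == 0 or not voiced:
            continue  # frames with a zero pitch do not count toward N
        total += min(abs(c - p - delta) for c in voiced)
        n += 1
    return (total / n if n else float("inf")), n
```

With test pitches `[60, 62, 0]`, candidates `[[72, 60.5], [61], [50]]` and Δ = 0, the third frame is skipped and the result is `(0.75, 2)`.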
For each Δ_j = 0.1j, j ∈ ℤ, −120 ≤ j ≤ 120 (shifts from −12 to +12 semitones in steps of 0.1 semitone), compute the average error $\mathrm{dist}^{sent}_{i,j}$ of segment i and its frame count $N_{i,j}$. If i ≠ 1, let $\mathrm{dist}^{pre}_{i,j}$ denote the average error produced by the key difference Δ_j over all data before segment i, and $N^{pre}_{i,j}$ the corresponding frame count.
The cost function in formula (3) is then updated as follows:
$$\begin{aligned} \mathrm{dist}^{pre}_{i+1,j} &= \frac{\mathrm{dist}^{pre}_{i,j} \cdot N^{pre}_{i,j} + \mathrm{dist}^{sent}_{i,j} \cdot N_{i,j}}{N^{pre}_{i,j} + N_{i,j}} \\ N^{pre}_{i+1,j} &= N^{pre}_{i,j} + N_{i,j} \\ \mathrm{dist}^{sent}_{i,j} &= \mathrm{dist}^{pre}_{i+1,j} \end{aligned} \qquad (4)$$
The Δ_j at which $\mathrm{dist}^{sent}_{i,j}$ reaches its minimum is denoted $\Delta^{best}_i$.
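The search over Δ_j with the cumulative update of formula (4) can be sketched as below. The data layout (per segment, a list of test pitches and a list of per-frame candidate lists, 0 meaning no pitch) is an assumption, and the names are illustrative. Starting with zero accumulated frames makes the first segment fall out of the same update naturally.

```python
def _error_sum(test_pitch, candidates, delta):
    """Sum over usable frames of min_k |cand - test - delta|, plus the frame count."""
    total, n = 0.0, 0
    for p, cands in zip(test_pitch, candidates):
        voiced = [c for c in cands if c != 0]
        if p == 0 or not voiced:
            continue  # frames with a zero pitch do not count toward N
        total += min(abs(c - p - delta) for c in voiced)
        n += 1
    return total, n


def best_key_shifts(test_segments, cand_segments, j_max=120, step=0.1):
    """Return Delta_i^best for every segment, pooling errors over segments as in formula (4)."""
    deltas = [step * j for j in range(-j_max, j_max + 1)]  # Delta_j = 0.1 j, -120 <= j <= 120
    pre_dist = [0.0] * len(deltas)  # dist^pre_{i,j}
    pre_n = [0] * len(deltas)       # N^pre_{i,j}
    best = []
    for test_pitch, candidates in zip(test_segments, cand_segments):
        sent = []  # dist^sent_{i,j} after the update
        for j, delta in enumerate(deltas):
            # seg_sum equals dist^sent_{i,j} * N_{i,j} before the update.
            seg_sum, seg_n = _error_sum(test_pitch, candidates, delta)
            total_n = pre_n[j] + seg_n
            if total_n == 0:
                sent.append(float("inf"))  # no usable frames yet for this shift
                continue
            pre_dist[j] = (pre_dist[j] * pre_n[j] + seg_sum) / total_n  # dist^pre_{i+1,j}
            pre_n[j] = total_n                                          # N^pre_{i+1,j}
            sent.append(pre_dist[j])  # dist^sent_{i,j} := dist^pre_{i+1,j}
        j_best = min(range(len(deltas)), key=sent.__getitem__)
        best.append(deltas[j_best])
    return best
```

Because errors are pooled, a short second segment that on its own would prefer a +3 shift still gets Δ ≈ 2 when the first, longer segment was sung 2 semitones off: the accumulated frames dominate the average.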
The final fundamental frequency is then obtained as:
$$p_{i,t} = p^{cand}_{i,t,k^{*}}, \qquad k^{*} = \arg\min_{k \in [1, K_i]} \left| p^{cand}_{i,t,k} - p^{s}_{i,t} - \Delta^{best}_i \right| \qquad (5)$$
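Formula (5) picks, frame by frame, the candidate closest to the key-shifted test pitch. A minimal sketch, assuming the same per-frame candidate lists as above and leaving frames without a usable pitch at 0 (the text does not say how such frames are handled); `select_pitch` is an illustrative name.

```python
def select_pitch(test_pitch, candidates, delta_best):
    """Formula (5): per frame, the candidate minimizing |cand - test - delta_best|."""
    out = []
    for p, cands in zip(test_pitch, candidates):
        voiced = [c for c in cands if c != 0]
        if p == 0 or not voiced:
            out.append(0.0)  # no reference pitch or no candidate: leave unvoiced (assumption)
            continue
        out.append(min(voiced, key=lambda c: abs(c - p - delta_best)))
    return out
```

With Δ^best = 2, a test frame at 60 selects the candidate nearest 62, so `select_pitch([60, 62], [[72, 61.9, 50], [64.2, 52]], 2.0)` yields `[61.9, 64.2]`.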
In this embodiment, the final score is obtained from the error area between the fundamental frequency of the test data and that of the standard data, i.e.:
$$\mathrm{dist} = \frac{1}{N_i} \sum_{t=1}^{T_i} \min\left( \left| p^{s}_{i,t} - p^{t}_{i,t} - \Delta^{best}_i \right|, \mathrm{MAXCOST} \right) \qquad (6)$$
where $p^{s}_{i,t}$ denotes the standard-data fundamental frequency obtained by the fundamental frequency extraction above; $p^{t}_{i,t}$ is the fundamental frequency value of segment i of the test data at time t that participates in scoring; $\Delta^{best}_i$ is the error caused by the key difference between segment i of the test data and the standard data, obtained with the multi-candidate fundamental frequency strategy above; $N_i$ is the number of frames for which the fundamental frequency is non-zero in both the test data and the standard data; and MAXCOST is an error upper bound that can be set in advance.
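Formula (6) caps each frame's error at MAXCOST so that a few wildly wrong frames cannot dominate the segment. A minimal sketch, assuming aligned per-frame pitch lists with 0 for unvoiced frames; returning `None` when no frame is voiced in both curves is an assumption, and the function name is illustrative.

```python
def clipped_error(std_pitch, test_pitch, delta_best, max_cost):
    """Formula (6): mean over jointly voiced frames of min(|s - t - delta_best|, MAXCOST)."""
    total, n = 0.0, 0
    for s, t in zip(std_pitch, test_pitch):
        if s == 0 or t == 0:
            continue  # only frames with non-zero pitch in both curves count toward N_i
        total += min(abs(s - t - delta_best), max_cost)
        n += 1
    return total / n if n else None  # None: no jointly voiced frames (assumption)
```

For standard `[62, 64, 80, 0]`, test `[60, 62, 60, 60]`, Δ^best = 2 and MAXCOST = 5, the 18-semitone outlier is capped at 5 and the last frame is skipped, giving 5/3.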
Once the start and end positions for fundamental frequency extraction have been determined for each segment of test data, let Seg_start and Seg_end be the start and end positions of each segment in the standard data, and P_start and P_end the start and end positions of the fundamental frequency of the test data provided by the tester. The comparison score of segment i of the test data is then normalized according to:
$$\mathrm{SentScore}_i = \frac{\sum_{t=1}^{T_i} \left| p^{s}_{i,t} - p^{t}_{i,t} - \Delta^{best}_i \right| + \mathrm{MAXCOST} \cdot K_i}{N_i + K_i} \qquad (7)$$
In formula (7), K_i denotes the number of frames, beyond a 10% allowance, at the beginning and end of each test segment during which the tester does not sing, and T_i is the number of frames in each segment for which the fundamental frequency is non-zero in both the standard data and the test data.
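Formula (7) charges every unsung frame K_i the maximum cost, so skipping the start or end of a phrase is penalized rather than ignored. The sketch below takes K_i as an input, because the exact rule for counting frames beyond the 10% allowance is not spelled out; the function name and the `None` return for an empty segment are assumptions.

```python
def sent_score(std_pitch, test_pitch, delta_best, max_cost, k_unsung):
    """Formula (7): per-segment error with MAXCOST charged for k_unsung unsung frames.

    k_unsung: K_i, computed by the caller from the segment boundaries (hypothetical).
    Lower values mean a closer match, since this is an error measure.
    """
    total, n = 0.0, 0
    for s, t in zip(std_pitch, test_pitch):
        if s == 0 or t == 0:
            continue  # only frames voiced in both curves are summed
        total += abs(s - t - delta_best)
        n += 1
    if n + k_unsung == 0:
        return None  # nothing to score (assumption)
    return (total + max_cost * k_unsung) / (n + k_unsung)
```

Two matched frames with errors 0 and 1, plus two unsung frames at MAXCOST = 5, give (1 + 10) / 4 = 2.75.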
In practice, after the fundamental frequency candidates of the standard data are obtained, the fundamental frequency curve of the test data is shifted up and down. The shift that minimizes the error against the standard data's candidates removes the error Δ caused by the key difference between the test data and the annotated data, so that every segment of the test data is referred to the same key. The final score is then given by the error area between the fundamental frequency of the test data and that of the standard data.
To further improve the accuracy of the comparison, the score of the whole piece can be computed as an average weighted by the actual length of each standard-data segment, i.e.:
$$\mathrm{Tonescore} = \frac{\sum_{i=1}^{s} (\mathrm{LRC}_{end} - \mathrm{LRC}_{start}) \cdot \mathrm{SentScore}_i}{\sum_{i=1}^{s} (\mathrm{LRC}_{end} - \mathrm{LRC}_{start})} \qquad (8)$$
The final score of the whole piece is then obtained as:
$$\mathrm{FinalScore} = a \cdot \mathrm{Tonescore} + b \qquad (9)$$
where a and b are linear regression coefficients.
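Formulas (8) and (9) together amount to a length-weighted mean followed by a linear map. In the sketch below, LRC_start and LRC_end are read as each segment's start and end times (an assumption), and the coefficients a and b would come from a regression fitted elsewhere; the values in the example are made up. Since SentScore is an error, a fitted a would presumably be negative.

```python
def final_score(segments, a, b):
    """Formulas (8) and (9): length-weighted Tonescore, then FinalScore = a * Tonescore + b.

    segments: list of (lrc_start, lrc_end, sent_score) per segment (assumed layout).
    Returns (tonescore, final).
    """
    weights = [end - start for start, end, _ in segments]
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("segments must have positive total length")
    tonescore = sum(w * s for w, (_, _, s) in zip(weights, segments)) / total_w
    return tonescore, a * tonescore + b
```

For a 2 s segment with SentScore 1.0 and a 4 s segment with SentScore 4.0, Tonescore is 3.0; with the illustrative a = −10 and b = 100 the final score is 70.0.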
In the above embodiment, the second fundamental frequency data corresponding to the standard data can be stored in real time, so that the standard-data fundamental frequency curve is updated and optimized.
In this embodiment, when test data is compared, the standard-data fundamental frequency curve can be selected as follows:
For each segment of the test data, if the score obtained with the standard-data fundamental frequency curve selected by the system exceeds a predefined threshold, the standard-data fundamental frequency curve Template used to score this test data is saved as the reference template for subsequent scoring.
For a new piece of test data Test_cur, suppose the system has stored the standard-data fundamental frequency template Template_t corresponding to a piece of test data Test_best for the same song. Then, for each segment of Test_cur, if the score obtained with the multi-candidate fundamental frequency strategy above is lower than the corresponding score obtained for Test_best, the segment is scored with the standard-data fundamental frequency template used for Test_best. Otherwise, the score obtained with the fundamental frequency curve extracted by the current multi-candidate strategy is kept, and if this score exceeds the predefined threshold, the currently obtained fundamental frequency curve is stored as the standard-data fundamental frequency template.
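The per-segment template selection can be sketched as follows. The storage layout (a dict from segment index to the score and curve kept from Test_best), the `rescore` hook that scores the current singing against a stored template, and the convention that a higher score is better are all hypothetical; the patent specifies none of them.

```python
def update_templates(current, store, rescore, threshold):
    """Per-segment choice between the current curve and the stored Test_best template.

    current:   list of (score, curve) per segment from the multi-candidate strategy.
    store:     dict segment index -> (score, template_curve); updated in place.
    rescore:   callable(segment_index, template_curve) -> score (hypothetical hook).
    threshold: predefined score threshold; higher scores are assumed to be better.
    Returns the score actually used for each segment.
    """
    used = []
    for i, (score, curve) in enumerate(current):
        saved = store.get(i)
        if saved is not None and score < saved[0]:
            # Current extraction does worse than Test_best did: score with the stored template.
            used.append(rescore(i, saved[1]))
        else:
            used.append(score)
            if score > threshold:
                store[i] = (score, curve)  # keep the current curve as the new template
    return used
```

In this design a stored template is only replaced by a curve that both matches or beats it and clears the threshold, so the templates can only improve over time.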
Corresponding to the above embodiment of the audio data comparison method, the present invention also provides an audio data comparison device. As shown in Figure 4, the device comprises:
Standard data acquisition module 401, configured to obtain a Gaussian mixture model (GMM) by accurately segmenting and training preset reference data, and to segment the standard data corresponding to the test data by using the GMM;
First fundamental frequency extraction module 402, configured to extract first fundamental frequency data corresponding to the test data;
Second fundamental frequency extraction module 403, configured to divide the standard data in each segment into frames, extract fundamental frequency candidate point data corresponding to the standard data, and obtain second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data in each segment;
Comparison module 404, configured to compare the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.
The audio data comparison device provided by the invention combines fundamental frequency extraction for the standard data with its comparison against the test data, so that the fundamental frequency of the standard data can be extracted accurately.
It should be noted that the second fundamental frequency extraction module may specifically comprise:
A sampling-point fundamental frequency mean acquisition submodule, configured to divide the test data into frames according to a preset frame length and frame shift, and obtain the fundamental frequency mean of all sampling points in each frame of test data;
A fundamental frequency processing submodule, configured to subtract the fundamental frequency mean from the fundamental frequency value of each sampling point, and apply a Hanning window function to the resulting values;
A fundamental frequency candidate point acquisition submodule, configured to retain the time-period values corresponding to the largest autocorrelation coefficients of the sampling points, use these period values as the basis for selecting the fundamental frequency candidate values of the standard data, and convert the fundamental frequency candidate point data of the standard data into pitch data;
An initial error acquisition submodule, configured to obtain the initial error between the test data and the standard data in each segment;
A cost function determination submodule, configured to determine the error area according to the initial error and the fundamental frequency candidate point data;
A second fundamental frequency determination submodule, configured to obtain the minimum of the error area for each segment of test data, obtain the corresponding initial error according to the minimum average error of each segment of test data, and determine the second fundamental frequency data corresponding to the standard data.
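The second fundamental frequency extraction module can be sketched as below. This is a minimal, non-authoritative sketch under several assumptions not stated in the text: the mean removed in each frame is taken to be the mean sample value, candidates are the strongest local maxima of the normalized autocorrelation within a 60 to 1000 Hz search range, "pitch data" means a MIDI-style semitone scale, and the initial error is modeled as a constant pitch offset chosen from a list of trial values. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_signal(x: Sequence[float], frame_len: int, frame_shift: int) -> List[List[float]]:
    """Split a signal into overlapping frames (lengths given in samples)."""
    return [list(x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, frame_shift)]


def hann(n: int) -> List[float]:
    """Symmetric Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def f0_candidates(frame: Sequence[float], sr: int, n_cand: int = 3,
                  f0_min: float = 60.0, f0_max: float = 1000.0) -> List[float]:
    """Return up to n_cand F0 candidates in Hz, strongest autocorrelation peak first."""
    mean = sum(frame) / len(frame)
    # Remove the frame mean, then apply the Hann window.
    y = [(s - mean) * w for s, w in zip(frame, hann(len(frame)))]
    lag_min = max(1, int(sr / f0_max))
    lag_max = min(len(y) - 2, int(sr / f0_min))
    energy = sum(v * v for v in y) or 1.0
    # Normalized autocorrelation, with one extra lag on each side for the peak test.
    acf = {lag: sum(y[n] * y[n + lag] for n in range(len(y) - lag)) / energy
           for lag in range(lag_min - 1, lag_max + 2)}
    peaks = [lag for lag in range(lag_min, lag_max + 1)
             if acf[lag] >= acf[lag - 1] and acf[lag] >= acf[lag + 1]]
    peaks.sort(key=lambda lag: acf[lag], reverse=True)
    # Period (lag in samples) -> frequency in Hz.
    return [sr / lag for lag in peaks[:n_cand]]


def hz_to_pitch(f0: float) -> float:
    """Convert Hz to a MIDI-style semitone scale (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0 / 440.0)


def select_second_f0(test_pitch: Sequence[float],
                     cand_pitch: Sequence[Sequence[float]],
                     offsets: Sequence[float]) -> Tuple[float, List[float], float]:
    """Pick the initial error (offset) and per-frame candidates with minimum mean error."""
    if not test_pitch or len(test_pitch) != len(cand_pitch) or not offsets:
        raise ValueError("need test frames, one candidate list per frame, and offsets")
    best: Optional[Tuple[float, List[float], float]] = None
    for d in offsets:
        # For this trial offset, take the candidate closest to the shifted test pitch.
        chosen = [min(cands, key=lambda c: abs(t - (c + d)))
                  for t, cands in zip(test_pitch, cand_pitch)]
        err = sum(abs(t - (c + d)) for t, c in zip(test_pitch, chosen)) / len(test_pitch)
        if best is None or err < best[2]:
            best = (d, chosen, err)
    return best
```

In practice the trial offsets might span an octave in each direction, e.g. `[float(k) for k in range(-12, 13)]`, so that a singer performing in a different key can still be matched to the right candidates.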
In another embodiment of the audio data comparison device of the invention, as shown in Figure 5, the device may further comprise:
A storage module, configured to store the second fundamental frequency data as a standard template when the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value.
By storing the fundamental frequency of the standard data obtained each time, the comparison template used for scoring can be optimized automatically.
In a specific implementation, the comparison module can obtain the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data.
To further improve the accuracy of the comparison, the comparison module can also obtain the comparison result between the whole test data and the standard data by taking a weighted average of the per-segment comparison results obtained from the error area.
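The error area used by the comparison module can be approximated frame by frame. As a minimal sketch, the area between the two pitch curves is taken as the sum of absolute pitch differences multiplied by the frame shift; skipping frames where either curve is unvoiced (marked `None`), and the function name `error_area`, are assumptions, and the text does not say how the area is mapped to a per-segment score.

```python
from typing import Optional, Sequence


def error_area(first: Sequence[Optional[float]],
               second: Sequence[Optional[float]],
               frame_shift: float) -> float:
    """Approximate the area between two per-frame pitch curves by a rectangle sum.

    first, second: pitch per frame (None = unvoiced, an assumption of this sketch).
    frame_shift: time between consecutive frames, e.g. in seconds.
    """
    if len(first) != len(second):
        raise ValueError("curves must have the same number of frames")
    return sum(abs(a - b) * frame_shift
               for a, b in zip(first, second)
               if a is not None and b is not None)
```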
Since the device embodiment basically corresponds to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment. The device embodiment described above is merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
Those of ordinary skill in the art will appreciate that all or part of the flow of the methods in the above embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flow of the embodiments of the methods described above. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the embodiments of the invention. Therefore, the embodiments of the invention are not limited to the embodiments shown herein, but are to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. An audio data comparison method, characterized in that the method comprises:
Obtaining a Gaussian mixture model (GMM) by accurately segmenting and training preset reference data, and segmenting the standard data corresponding to test data by using the GMM;
Extracting first fundamental frequency data corresponding to the test data;
Dividing the standard data in each segment into frames, extracting fundamental frequency candidate point data corresponding to the standard data, and obtaining second fundamental frequency data corresponding to the standard data in combination with the error between the test data and the standard data in each segment;
Comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.
2. The audio data comparison method according to claim 1, characterized in that dividing the standard data in each segment into frames and extracting the fundamental frequency candidate point data corresponding to the standard data comprises:
Dividing the test data into frames according to a preset frame length and frame shift, and obtaining the fundamental frequency mean of all sampling points in each frame of test data;
Subtracting the fundamental frequency mean from the fundamental frequency value of each sampling point, and applying a Hanning window function to the resulting values;
Retaining the time-period values corresponding to the largest autocorrelation coefficients of the sampling points, using these period values as the basis for selecting the fundamental frequency candidate values of the standard data, and converting the fundamental frequency candidate point data of the standard data into pitch data.
3. The audio data comparison method according to claim 1, characterized in that obtaining the second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data comprises:
Obtaining the initial error between the test data and the standard data in each segment;
Determining an error area according to the initial error and the fundamental frequency candidate point data;
Obtaining the minimum of the error area for each segment of test data, obtaining the corresponding initial error according to the minimum average error of each segment of test data, and determining the second fundamental frequency data corresponding to the standard data.
4. The audio data comparison method according to claim 4, characterized in that the method further comprises:
When the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value, storing the second fundamental frequency data as a standard template.
5. The audio data comparison method according to claim 1, characterized in that comparing the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result comprises:
Obtaining the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data.
6. The audio data comparison method according to claim 5, characterized in that the method further comprises:
Obtaining the comparison result between the whole test data and the standard data by taking a weighted average of the per-segment comparison results obtained from the error area.
7. An audio data comparison device, characterized in that the device comprises:
A standard data acquisition module, configured to obtain a Gaussian mixture model (GMM) by accurately segmenting and training preset reference data, and to segment the standard data corresponding to test data by using the GMM;
A first fundamental frequency extraction module, configured to extract first fundamental frequency data corresponding to the test data;
A second fundamental frequency extraction module, configured to divide the standard data in each segment into frames, extract fundamental frequency candidate point data corresponding to the standard data, and obtain second fundamental frequency data corresponding to the standard data in combination with the initial error between the test data and the standard data in each segment;
A comparison module, configured to compare the first fundamental frequency data with the second fundamental frequency data to obtain a comparison result.
8. The audio data comparison device according to claim 7, characterized in that the second fundamental frequency extraction module specifically comprises:
A sampling-point fundamental frequency mean acquisition submodule, configured to divide the test data into frames according to a preset frame length and frame shift, and obtain the fundamental frequency mean of all sampling points in each frame of test data;
A fundamental frequency processing submodule, configured to subtract the fundamental frequency mean from the fundamental frequency value of each sampling point, and apply a Hanning window function to the resulting values;
A fundamental frequency candidate point acquisition submodule, configured to retain the time-period values corresponding to the largest autocorrelation coefficients of the sampling points, use these period values as the basis for selecting the fundamental frequency candidate values of the standard data, and convert the fundamental frequency candidate point data of the standard data into pitch data;
An initial error acquisition submodule, configured to obtain the initial error between the test data and the standard data in each segment;
A cost function determination submodule, configured to determine a cost function according to the initial error and the fundamental frequency candidate point data;
A second fundamental frequency determination submodule, configured to obtain the minimum average error of each segment of test data in the cost function, obtain the corresponding initial error according to that minimum average error, and determine the second fundamental frequency data corresponding to the standard data.
9. The audio data comparison device according to claim 7, characterized in that the device further comprises:
A storage module, configured to store the second fundamental frequency data as a standard template when the comparison result between the second fundamental frequency data and the first fundamental frequency data satisfies a preset value.
10. The audio data comparison device according to claim 7, characterized in that the comparison module obtains the comparison result from the error area between the first fundamental frequency data and the second fundamental frequency data.
11. The audio data comparison device according to claim 10, characterized in that the comparison module further obtains the comparison result between the whole test data and the standard data by taking a weighted average of the per-segment comparison results obtained from the error area.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010530213A CN101968958B (en) 2010-11-02 2010-11-02 Method and device for comparing audio data

Publications (2)

Publication Number Publication Date
CN101968958A true CN101968958A (en) 2011-02-09
CN101968958B CN101968958B (en) 2012-09-26

Family

ID=43548102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010530213A Active CN101968958B (en) 2010-11-02 2010-11-02 Method and device for comparing audio data

Country Status (1)

Country Link
CN (1) CN101968958B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1106946A (en) * 1993-11-09 1995-08-16 大宇电子株式会社 Karaoke system capable of scoring a singing of a singer on accompaniment thereof
US6836761B1 (en) * 1999-10-21 2004-12-28 Yamaha Corporation Voice converter for assimilation by frame synthesis with temporal alignment
US20060272488A1 (en) * 2005-05-26 2006-12-07 Yamaha Corporation Sound signal processing apparatus, sound signal processing method, and sound signal processing program
CN101364407A (en) * 2008-09-17 2009-02-11 清华大学 Karaoke singing marking method keeping subjective consistency
CN101859560A (en) * 2009-04-07 2010-10-13 林文信 Automatic marking method for karaok vocal accompaniment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Arun Shenoy et al., "Singing Voice Detection for Karaoke Application", Visual Communications and Image Processing 2005, 2005-12-31, 2 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294567A (en) * 2016-07-26 2017-01-04 腾讯科技(深圳)有限公司 A kind of Audio Sorting method and apparatus
CN106448630A (en) * 2016-09-09 2017-02-22 腾讯科技(深圳)有限公司 Method and device for generating digital music file of song
CN106448630B (en) * 2016-09-09 2020-08-04 腾讯科技(深圳)有限公司 Method and device for generating digital music score file of song
US10923089B2 (en) 2016-09-09 2021-02-16 Tencent Technology (Shenzhen) Company Limited Method and apparatus for generating digital score file of song, and storage medium
CN108257609A (en) * 2017-12-05 2018-07-06 北京小唱科技有限公司 The modified method of audio content and its intelligent apparatus
CN108172206A (en) * 2017-12-27 2018-06-15 广州酷狗计算机科技有限公司 Audio-frequency processing method, apparatus and system
CN110210317A (en) * 2019-05-07 2019-09-06 平安科技(深圳)有限公司 Detect the method, apparatus and computer readable storage medium of fundamental frequency
CN110210317B (en) * 2019-05-07 2024-04-09 平安科技(深圳)有限公司 Method, apparatus and computer readable storage medium for detecting fundamental frequency
CN111429949A (en) * 2020-04-16 2020-07-17 广州繁星互娱信息科技有限公司 Pitch line generation method, device, equipment and storage medium
CN111429949B (en) * 2020-04-16 2023-10-13 广州繁星互娱信息科技有限公司 Pitch line generation method, device, equipment and storage medium
CN113763930A (en) * 2021-11-05 2021-12-07 深圳市倍轻松科技股份有限公司 Voice analysis method, device, electronic equipment and computer readable storage medium
CN113763930B (en) * 2021-11-05 2022-03-11 深圳市倍轻松科技股份有限公司 Voice analysis method, device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN101968958B (en) 2012-09-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: IFLYTEK CO., LTD.

Free format text: FORMER NAME: ANHUI USTC IFLYTEK CO., LTD.

CP03 Change of name, title or address

Address after: Wangjiang Road high tech Development Zone Hefei city Anhui province 230088 No. 666

Patentee after: IFLYTEK Co.,Ltd.

Address before: 230088 No. 616, Mount Huangshan Road, hi tech Development Zone, Anhui, Hefei

Patentee before: ANHUI USTC IFLYTEK Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20210414

Address after: 200335 room 1850, 1st floor, building 8, 33 Guangshun Road, Changning District, Shanghai

Patentee after: SHANGHAI XUNFEI RUIYUAN INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 230088 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui

Patentee before: IFLYTEK Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20231220

Address after: 200335 room 1966, 1st floor, building 8, 33 Guangshun Road, Changning District, Shanghai

Patentee after: IFLYTEK (Shanghai) Technology Co.,Ltd.

Address before: 200335 room 1850, 1st floor, building 8, 33 Guangshun Road, Changning District, Shanghai

Patentee before: SHANGHAI XUNFEI RUIYUAN INFORMATION TECHNOLOGY CO.,LTD.
