CN104700842B - Time delay estimation method and device for voice signals

Publication number: CN104700842B
Application number: CN201510083890.1A
Other versions: CN104700842A (en)
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Active (as listed; an assumption, not a legal conclusion)
Inventor: 陈超
Current Assignee: Guangzhou Baiguoyuan Information Technology Co Ltd
Original Assignee: Guangzhou Baiguoyuan Information Technology Co Ltd

Abstract

The invention discloses a time delay estimation method and device for voice signals, belonging to the field of audio signal processing. The method includes: obtaining two voice signals; performing coherence matching on the two voice signals according to their short-time Fourier transforms to obtain a first matching result, which includes a first matching position and a first matching degree of the two voice signals; performing coherence matching on the two voice signals according to the spectral correlation of their power spectra to obtain a second matching result, which includes a second matching position and a second matching degree of the two voice signals; and calculating the time delay between the two voice signals according to the first matching result and the second matching result. This addresses the low accuracy of existing time delay estimation methods: the voice signals are matched from two angles, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves matching precision and the accuracy of the time delay estimate.

Description

Time delay estimation method and device for voice signals
Technical field
The present invention relates to the field of audio signal processing, and in particular to a time delay estimation method and device for voice signals.
Background
Time delay estimation algorithms for voice signals are widely used in many fields, such as sound matching, codec alignment, and acoustic ranging.
The prior art provides a variety of time delay estimation methods. One widely used algorithm is time delay estimation based on correlation analysis, whose basic idea is to use the similarity of two voice signals in the frequency domain to estimate the time delay between them.
In the course of implementing the present invention, the inventor found that the above technique has at least the following problem: time delay estimation based on correlation analysis considers only the similarity of the two voice signals in the frequency domain, so the matching precision of the two voice signals is low and the accuracy of the resulting time delay is low.
Summary of the invention
To solve the problem of low accuracy in the above time delay estimation method, embodiments of the present invention provide a time delay estimation method and device for voice signals. The technical solutions are as follows:
In a first aspect, a time delay estimation method for voice signals is provided, the method including:
obtaining two voice signals;
performing coherence matching on the two voice signals according to their short-time Fourier transforms to obtain a first matching result, the first matching result including a first matching position and a first matching degree of the two voice signals;
performing coherence matching on the two voice signals according to the spectral correlation of their power spectra to obtain a second matching result, the second matching result including a second matching position and a second matching degree of the two voice signals;
calculating the time delay between the two voice signals according to the first matching result and the second matching result.
Optionally, calculating the time delay between the two voice signals according to the first matching result and the second matching result includes:
for each voice signal, calculating a final matching position from the first matching position and the second matching position using a weighted average algorithm, the weights of which are determined according to the first matching degree and the second matching degree;
calculating the time delay between the two voice signals according to the final matching positions of the two voice signals.
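The weighted-average step can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only says the weights are determined by the two matching degrees, so the sketch assumes each weight is the matching degree itself, normalised to sum to 1. The names `final_position`, `estimate_delay`, `frame_shift` and `sample_rate` are hypothetical, as is the assumption that matching positions are frame indices.

```python
def final_position(pos1, degree1, pos2, degree2):
    """Weighted average of the first and second matching positions.

    Assumption: the weights are the matching degrees themselves,
    normalised so that they sum to 1.
    """
    total = degree1 + degree2
    if total == 0:
        # Neither match carries any confidence; fall back to the plain mean.
        return (pos1 + pos2) / 2
    return (degree1 * pos1 + degree2 * pos2) / total


def estimate_delay(final_pos_a, final_pos_b, frame_shift, sample_rate):
    """Delay in seconds of signal B relative to signal A.

    Assumption: positions are frame indices and consecutive frames
    start frame_shift samples apart.
    """
    return (final_pos_b - final_pos_a) * frame_shift / sample_rate
```

With these assumptions, a first match at frame 10 with degree 0.9 and a second match at frame 12 with degree 0.3 give a final position of 10.5, which sits closer to the more trusted match.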
Optionally, performing coherence matching on the two voice signals according to their short-time Fourier transforms to obtain the first matching result includes:
for each voice signal, performing noise tracking on each frame of the voice signal according to the following formula to obtain the noise spectrum $N(w,n)$ of each frame:
where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0 < \alpha_d < \alpha_u < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
binarizing the short-time Fourier transform of each frame according to the following formula to obtain the binary spectrum $Xb(w,n)$:
where $T_b$ is a preset first threshold;
performing pairwise coherence matching between the $K_a$ binary spectra of one voice signal and the $K_b$ binary spectra of the other voice signal to obtain the first matching result, which includes the matching position and matching degree of the pair of binary spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
Optionally, performing coherence matching on the two voice signals according to the spectral correlation of their power spectra to obtain the second matching result includes:
for each voice signal, calculating the power spectrum $P(w,n)$ of each frame of the voice signal according to the following formula:
$$P(w,n) = \alpha_p P(w,n-1) + (1-\alpha_p)\,|X(w,n)|^2$$
where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_p$ is a preset coefficient with $0 < \alpha_p < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
calculating the spectral correlation $DP(w,n)$ of the power spectrum of each frame according to the following formula:
$$DP(w,n) = |P(w+1,n) - P(w,n)|$$
performing noise tracking on the spectral correlation $DP(w,n)$ according to the following formula to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame:
where $\beta_u$ and $\beta_d$ are preset coefficients with $0 < \beta_d < \beta_u < 1$;
binarizing the spectral correlation $DP(w,n)$ of each frame according to the following formula to obtain the binary correlation spectrum $XDb(w,n)$:
where $TD_b$ is a preset second threshold;
performing pairwise coherence matching between the $KD_a$ binary correlation spectra of one voice signal and the $KD_b$ binary correlation spectra of the other voice signal to obtain the second matching result, which includes the matching position and matching degree of the pair of binary correlation spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
Optionally, before performing coherence matching on the two voice signals according to their short-time Fourier transforms to obtain the first matching result, the method further includes:
for each voice signal, preprocessing the voice signal to obtain a preprocessed voice signal, the preprocessing including at least one of noise reduction, amplification, high-pass filtering, and up-sampling or down-sampling;
performing a short-time Fourier transform on the preprocessed voice signal.
In a second aspect, a time delay estimation device for voice signals is provided, the device including:
a signal acquisition module, configured to obtain two voice signals;
a first matching module, configured to perform coherence matching on the two voice signals according to their short-time Fourier transforms to obtain a first matching result, the first matching result including a first matching position and a first matching degree of the two voice signals;
a second matching module, configured to perform coherence matching on the two voice signals according to the spectral correlation of their power spectra to obtain a second matching result, the second matching result including a second matching position and a second matching degree of the two voice signals;
a delay calculation module, configured to calculate the time delay between the two voice signals according to the first matching result and the second matching result.
Optionally, the delay calculation module includes a position calculation unit and a delay calculation unit;
the position calculation unit is configured to calculate, for each voice signal, a final matching position from the first matching position and the second matching position using a weighted average algorithm, the weights of which are determined according to the first matching degree and the second matching degree;
the delay calculation unit is configured to calculate the time delay between the two voice signals according to the final matching positions of the two voice signals.
Optionally, the first matching module includes a first tracking unit, a first binarization unit, and a first matching unit;
the first tracking unit is configured to perform, for each voice signal, noise tracking on each frame of the voice signal according to the following formula to obtain the noise spectrum $N(w,n)$ of each frame:
where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0 < \alpha_d < \alpha_u < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
the first binarization unit is configured to binarize the short-time Fourier transform of each frame according to the following formula to obtain the binary spectrum $Xb(w,n)$:
where $T_b$ is a preset first threshold;
the first matching unit is configured to perform pairwise coherence matching between the $K_a$ binary spectra of one voice signal and the $K_b$ binary spectra of the other voice signal to obtain the first matching result, which includes the matching position and matching degree of the pair of binary spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
Optionally, the second matching module includes a power spectrum calculation unit, a correlation calculation unit, a second tracking unit, a second binarization unit, and a second matching unit;
the power spectrum calculation unit is configured to calculate, for each voice signal, the power spectrum $P(w,n)$ of each frame of the voice signal according to the following formula:
$$P(w,n) = \alpha_p P(w,n-1) + (1-\alpha_p)\,|X(w,n)|^2$$
where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_p$ is a preset coefficient with $0 < \alpha_p < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
the correlation calculation unit is configured to calculate the spectral correlation $DP(w,n)$ of the power spectrum of each frame according to the following formula:
$$DP(w,n) = |P(w+1,n) - P(w,n)|$$
the second tracking unit is configured to perform noise tracking on the spectral correlation $DP(w,n)$ according to the following formula to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame:
where $\beta_u$ and $\beta_d$ are preset coefficients with $0 < \beta_d < \beta_u < 1$;
the second binarization unit is configured to binarize the spectral correlation $DP(w,n)$ of each frame according to the following formula to obtain the binary correlation spectrum $XDb(w,n)$:
where $TD_b$ is a preset second threshold;
the second matching unit is configured to perform pairwise coherence matching between the $KD_a$ binary correlation spectra of one voice signal and the $KD_b$ binary correlation spectra of the other voice signal to obtain the second matching result, which includes the matching position and matching degree of the pair of binary correlation spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
Optionally, the device further includes:
a signal preprocessing module, configured to preprocess each voice signal to obtain a preprocessed voice signal, the preprocessing including at least one of noise reduction, amplification, high-pass filtering, and up-sampling or down-sampling;
a Fourier transform module, configured to perform a short-time Fourier transform on the preprocessed voice signal.
The beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:
A first matching result is obtained by analyzing and matching the short-time Fourier transforms of the two voice signals, a second matching result is obtained by analyzing and matching the spectral correlation of their power spectra, and the time delay between the two voice signals is then calculated from the first and second matching results together. This solves the problem of low accuracy in the time delay estimation methods of the related art: the two voice signals are matched from two angles, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves matching precision and the accuracy of the time delay estimate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a time delay estimation method for voice signals provided by one embodiment of the present invention;
Fig. 2 is a flowchart of a time delay estimation method for voice signals provided by another embodiment of the present invention;
Fig. 3 is a block diagram of a time delay estimation device for voice signals provided by one embodiment of the present invention;
Fig. 4 is a block diagram of a time delay estimation device for voice signals provided by another embodiment of the present invention;
Fig. 5 is a structural diagram of an electronic device provided by one embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Referring to Fig. 1, which shows a flowchart of a time delay estimation method for voice signals provided by one embodiment of the present invention. This embodiment is illustrated with the method applied in an electronic device such as a mobile phone, tablet computer, laptop computer, or desktop computer. The time delay estimation method may include the following steps:
Step 102: obtain two voice signals.
Step 104: perform coherence matching on the two voice signals according to their short-time Fourier transforms to obtain a first matching result, which includes a first matching position and a first matching degree of the two voice signals.
Step 106: perform coherence matching on the two voice signals according to the spectral correlation of their power spectra to obtain a second matching result, which includes a second matching position and a second matching degree of the two voice signals.
Step 108: calculate the time delay between the two voice signals according to the first matching result and the second matching result.
It should be noted that step 106 may be performed after step 104, before step 104, or at the same time as step 104. This embodiment is described with step 106 performed after step 104 only as an example, and the order is not specifically limited.
In summary, the time delay estimation method provided by this embodiment obtains a first matching result by analyzing and matching the short-time Fourier transforms of the two voice signals, obtains a second matching result by analyzing and matching the spectral correlation of their power spectra, and then calculates the time delay between the two voice signals from the first and second matching results together. This solves the problem of low accuracy in the time delay estimation methods of the related art: the two voice signals are matched from two angles, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves matching precision and the accuracy of the time delay estimate.
Referring to Fig. 2, which shows a flowchart of a time delay estimation method for voice signals provided by another embodiment of the present invention. This embodiment is illustrated with the method applied in an electronic device such as a mobile phone, tablet computer, laptop computer, or desktop computer. The time delay estimation method may include the following steps:
Step 201: obtain two voice signals.
The two voice signals are discrete signals in the time domain. In this embodiment, assume that one voice signal A is $x_{ra}(n)$ and the other voice signal B is $x_{rb}(n)$, where $n$ denotes the frame index in the time domain, $n \in [0, M-1]$, $M \ge 2$, and $n$ and $M$ are integers.
Step 202: for each voice signal, preprocess the voice signal to obtain a preprocessed voice signal.
The preprocessing includes, but is not limited to, at least one of noise reduction, amplification, high-pass filtering, and up-sampling or down-sampling. The purpose of preprocessing is to allow more accurate and reliable sound features to be extracted in the subsequent steps, which improves matching precision.
In this embodiment, assume that the preprocessing applied to each voice signal is $F(*)$. The preprocessed result of voice signal A is then $x_a(n) = F(x_{ra}(n))$, and the preprocessed result of voice signal B is $x_b(n) = F(x_{rb}(n))$.
It should be noted that this embodiment uses the preprocessing modes listed above only as examples. In practice, other preprocessing modes may be used as required, and this embodiment does not specifically limit them.
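As one possible instance of the preprocessing $F(*)$, the sketch below chains a first-order high-pass filter with a fixed gain. The text does not specify any filter design or gain, so the filter form, the `alpha` and `gain` parameters, and all function names here are assumptions chosen only for illustration.

```python
def high_pass(samples, alpha=0.95):
    """First-order high-pass filter (an assumed design, not from the source).

    y[n] = alpha * (y[n-1] + x[n] - x[n-1]), with y[0] = x[0].
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def amplify(samples, gain=2.0):
    """Amplification: scale every sample by a fixed gain."""
    return [gain * s for s in samples]


def preprocess(samples):
    """A hypothetical F(*): high-pass filtering followed by amplification."""
    return amplify(high_pass(samples))
```

A constant (DC) input decays towards zero at the output, which is the behaviour expected of a high-pass stage; both signals A and B would be passed through the same `preprocess` function.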
Step 203: perform a short-time Fourier transform on the preprocessed voice signal.
The STFT (Short Time Fourier Transform) applies a predetermined window function to the preprocessed voice signal and then performs an FFT (Fast Fourier Transform) on each windowed frame, transforming the voice signal from the time domain to the frequency domain. The window function may be, but is not limited to, a Hamming window or a Kaiser window. The window length is a power of 2, such as 128 or 256.
In this embodiment, denote the STFT function as $STFT(*)$. The STFT of voice signal A can then be written as $X_a(w,n) = STFT(x_a(n))$ and the STFT of voice signal B as $X_b(w,n) = STFT(x_b(n))$, where $w$ denotes the frequency bin index in the frequency domain, $w \in [0, N-1]$, $N \ge 2$, and $w$ and $N$ are integers.
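The windowing-plus-transform step can be sketched in plain Python as below. This is an illustrative sketch only: a naive DFT stands in for the FFT (it yields the same coefficients, just more slowly), the Hamming window is one of the two windows the text mentions, and the hop size `hop` and the function names are assumptions, since the text does not state how far consecutive frames are shifted.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length (length >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]


def dft(frame):
    """Naive discrete Fourier transform; same result as an FFT of the frame."""
    size = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * cmath.pi * w * k / size)
                for k in range(size))
            for w in range(size)]


def stft(samples, frame_len=128, hop=64):
    """Return frames such that frames[n][w] plays the role of X(w, n).

    Assumption: frames overlap by frame_len - hop samples.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        windowed = [samples[start + k] * window[k] for k in range(frame_len)]
        frames.append(dft(windowed))
    return frames
```

For a pure tone that completes exactly two cycles per frame, the largest magnitude among the first half of the bins falls on bin 2, which is a quick sanity check on the indexing.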
Step 204: perform coherence matching on the two voice signals according to their short-time Fourier transforms to obtain a first matching result, which includes a first matching position and a first matching degree of the two voice signals.
In the first matching mode, the two voice signals are matched from the angle of their frequency-domain distribution. Specifically, this step may include the following sub-steps:
First, for each voice signal, perform noise tracking on each frame of the voice signal according to the following formula to obtain the noise spectrum $N(w,n)$ of each frame:
where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0 < \alpha_d < \alpha_u < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain.
The purpose of noise tracking is to reduce, as far as possible, the influence of noise in the voice signal on the matching result, which improves matching precision. In addition, by setting the preset coefficients $\alpha_u$ and $\alpha_d$ appropriately, so that one coefficient ($\alpha_u$) applies while the signal rises and the other ($\alpha_d$) applies while it falls, the noise tracking effect can be improved.
In this embodiment, the noise spectrum of the i-th frame of voice signal A is denoted $N_a(w,i)$ and is obtained by applying the same formula to $X_a(w,i)$,
where $X_a(w,i)$ denotes the short-time Fourier transform of the i-th frame of voice signal A, $i \ge 0$, and $i$ is an integer.
Similarly, the noise spectrum of the j-th frame of voice signal B is denoted $N_b(w,j)$ and is obtained by applying the same formula to $X_b(w,j)$,
where $X_b(w,j)$ denotes the short-time Fourier transform of the j-th frame of voice signal B, $j \ge 0$, and $j$ is an integer.
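The noise-tracking formula itself is not reproduced in this text, so the sketch below shows only one common form of asymmetric recursive smoothing that is consistent with the surrounding description: a separate coefficient for rising and falling magnitudes, with $0 < \alpha_d < \alpha_u < 1$. The update rule, the initialisation from the first frame, and the names `track_noise`, `alpha_up` and `alpha_down` are all assumptions and should not be read as the patent's formula.

```python
def track_noise(magnitudes, alpha_up=0.9, alpha_down=0.5):
    """Asymmetric recursive noise tracker (assumed form).

    magnitudes[n][w] is |X(w, n)|; the result noise[n][w] plays the
    role of N(w, n).

    Assumed update rule, per frequency bin:
        N(w, n) = a * N(w, n-1) + (1 - a) * |X(w, n)|
        a = alpha_up   if |X(w, n)| >  N(w, n-1)   (signal rising)
        a = alpha_down otherwise                   (signal falling)
    """
    if not magnitudes:
        return []
    previous = list(magnitudes[0])  # assumption: first frame initialises N
    noise = [previous]
    for frame in magnitudes[1:]:
        current = []
        for mag, prev in zip(frame, previous):
            alpha = alpha_up if mag > prev else alpha_down
            current.append(alpha * prev + (1 - alpha) * mag)
        noise.append(current)
        previous = current
    return noise
```

Under this assumed rule the estimate climbs slowly when the magnitude jumps (so speech onsets are not mistaken for noise) and drops quickly when the magnitude falls back.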
Second, binarize the short-time Fourier transform of each frame according to the following formula to obtain the binary spectrum $Xb(w,n)$:
where $T_b$ is a preset first threshold.
After the spectrum of the voice signal is binarized, the short-time Fourier transform of each frame is converted into a binary sequence of equal length consisting of 0s and 1s. Binarization substantially speeds up data processing, so that the subsequent matching computation is more efficient and more robust.
In this embodiment, the binary spectrum of the i-th frame of voice signal A is denoted $Xb_a(w,i)$ and is obtained by applying the same formula to that frame.
Similarly, the binary spectrum of the j-th frame of voice signal B is denoted $Xb_b(w,j)$ and is obtained by applying the same formula to that frame.
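Because the binarization formula is not reproduced in this text, the following is only a plausible sketch of how a binary (two-value) spectrum could be derived: a bin is set to 1 when its magnitude exceeds the tracked noise level scaled by the threshold. Both the comparison rule and the names `binarize` and `threshold` are assumptions.

```python
def binarize(magnitudes, noise, threshold=2.0):
    """Turn each frame into a sequence of 0s and 1s of equal length.

    Assumed rule (not confirmed by the source):
        Xb(w, n) = 1 if |X(w, n)| > threshold * N(w, n), else 0
    magnitudes[n][w] is |X(w, n)| and noise[n][w] is N(w, n).
    """
    return [
        [1 if mag > threshold * level else 0
         for mag, level in zip(mag_frame, noise_frame)]
        for mag_frame, noise_frame in zip(magnitudes, noise)
    ]
```

Each output frame has the same number of entries as the input frame, which is what makes the bin-by-bin comparison in the matching step possible.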
Third, perform pairwise coherence matching between the $K_a$ binary spectra of one voice signal and the $K_b$ binary spectra of the other voice signal to obtain the first matching result, where $K_a$ and $K_b$ are positive integers.
The first matching result includes the matching position and matching degree of the pair of binary spectra with the highest matching degree. When voice signal A has $K_a$ binary spectra and voice signal B has $K_b$ binary spectra, $K_a \times K_b$ binary spectrum matches are required. For each match, the matching degree between the two binary spectra is recorded, together with the indices $n_a(i)$ and $n_b(j)$ of the two binary spectra being matched. Here $n_a(i)$ denotes the index of the binary spectrum of the i-th frame of voice signal A, $n_a(i) \in [0, K_a-1]$, $K_a \ge 2$, and $n_a(i)$ and $K_a$ are integers; $n_b(j)$ denotes the index of the binary spectrum of the j-th frame of voice signal B, $n_b(j) \in [0, K_b-1]$, $K_b \ge 2$, and $n_b(j)$ and $K_b$ are integers.
For example, if the binary spectrum $Xb_a(w,i)$ of the i-th frame of voice signal A is matched against the binary spectrum $Xb_b(w,j)$ of the j-th frame of voice signal B, their matching degree $Pb_{ij}$ is:
$$Pb_{ij} = \frac{1}{N} \sum_{w=0}^{N-1} Xb_a(w,i) \odot Xb_b(w,j)$$
where $\odot$ denotes the XNOR operator. The matching degree $Pb_{ij}$ equals the number of frequency bin pairs at corresponding positions of the two binary spectra whose binarization results are equal, divided by the total number of frequency bin pairs.
As a specific example, assume $Xb_a(w,i) = \{1,1,0,0,0,1,1,1\}$ and $Xb_b(w,j) = \{0,1,1,1,0,1,1,1\}$. When $w = 0$, $Xb_a(0,i) = 1$ and $Xb_b(0,j) = 0$, so $Xb_a(0,i) \odot Xb_b(0,j) = 0$; when $w = 1$, $Xb_a(1,i) = 1$ and $Xb_b(1,j) = 1$, so $Xb_a(1,i) \odot Xb_b(1,j) = 1$; and so on. The matching degree of these two binary sequences is therefore $Pb_{ij} = 5/8 = 0.625$.
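The worked example above can be checked with a few lines of Python. The function name `matching_degree` is hypothetical; the computation follows the description directly: count the positions where the two binary (two-value) sequences agree, which is the XNOR of the two bits, and divide by the number of positions.

```python
def matching_degree(spec_a, spec_b):
    """Fraction of positions at which two binary spectra are equal.

    For bits a and b, (a XNOR b) is 1 exactly when a == b, so the sum of
    XNOR results is the number of agreeing frequency bins.
    """
    if len(spec_a) != len(spec_b) or not spec_a:
        raise ValueError("spectra must be non-empty and of equal length")
    agreeing = sum(1 for a, b in zip(spec_a, spec_b) if a == b)
    return agreeing / len(spec_a)


xb_a = [1, 1, 0, 0, 0, 1, 1, 1]
xb_b = [0, 1, 1, 1, 0, 1, 1, 1]
print(matching_degree(xb_a, xb_b))  # 0.625
```

Five of the eight positions agree (indices 1, 4, 5, 6 and 7), which reproduces the value 5/8 from the text.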
After the matching degree between every pair of binary spectra has been calculated, the pair with the highest matching degree is selected, and the indices and matching degree of that pair are recorded.
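The exhaustive pairwise search can be sketched as follows. The function name `best_match` and the tie-breaking behaviour (the first pair found with the highest degree is kept) are assumptions; the text only states that every pair is compared and the best one is recorded.

```python
def best_match(spectra_a, spectra_b):
    """Compare every binary spectrum of A with every binary spectrum of B.

    Returns (degree, index_a, index_b) for the pair with the highest
    matching degree. Assumption: on ties, the first pair encountered wins.
    """
    best = (-1.0, None, None)
    for i, spec_a in enumerate(spectra_a):
        for j, spec_b in enumerate(spectra_b):
            agreeing = sum(1 for a, b in zip(spec_a, spec_b) if a == b)
            degree = agreeing / len(spec_a)
            if degree > best[0]:
                best = (degree, i, j)
    return best
```

With two spectra on one side and three on the other, the loop performs 2 × 3 = 6 comparisons, matching the $K_a \times K_b$ count given in the text.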
In this embodiment, assume that the $n_a(i)$-th binary spectrum, corresponding to the i-th frame of voice signal A, and the $n_b(j)$-th binary spectrum, corresponding to the j-th frame of voice signal B, have the highest matching degree, denoted $P_1$. For example, when $i = 1$ and $j = 2$,
Step 205: perform coherence matching on the two voice signals according to the spectral correlation of their power spectra to obtain a second matching result, which includes a second matching position and a second matching degree of the two voice signals.
In the second matching mode, the two voice signals are matched from the angle of the spectral correlation of the power spectrum. Specifically, this step may include the following sub-steps:
First, for each voice signal, calculate the power spectrum $P(w,n)$ of each frame of the voice signal according to the following formula:
$$P(w,n) = \alpha_p P(w,n-1) + (1-\alpha_p)\,|X(w,n)|^2$$
where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_p$ is a preset coefficient with $0 < \alpha_p < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain.
In this embodiment, assume that the power spectrum of the i-th frame of voice signal A is $P_a(w,i)$; then:
$$P_a(w,i) = \alpha_p P_a(w,i-1) + (1-\alpha_p)\,|X_a(w,i)|^2$$
where $X_a(w,i)$ denotes the short-time Fourier transform of the i-th frame of voice signal A, $i \ge 0$, and $i$ is an integer.
Similarly, assume that the power spectrum of the j-th frame of voice signal B is $P_b(w,j)$; then:
$$P_b(w,j) = \alpha_p P_b(w,j-1) + (1-\alpha_p)\,|X_b(w,j)|^2$$
where $X_b(w,j)$ denotes the short-time Fourier transform of the j-th frame of voice signal B, $j \ge 0$, and $j$ is an integer.
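The recursive power spectrum update can be sketched as below. The recursion itself follows the formula in the text; what the text does not state is the starting value before the first frame, so the sketch assumes the first frame is used as is. The names `smoothed_power_spectrum` and `alpha_p` are chosen for illustration.

```python
def smoothed_power_spectrum(stft_frames, alpha_p=0.8):
    """Recursively smoothed power spectrum.

    stft_frames[n][w] is X(w, n) (complex); the result power[n][w]
    plays the role of P(w, n), with
        P(w, n) = alpha_p * P(w, n-1) + (1 - alpha_p) * |X(w, n)|^2

    Assumption: the source does not give P(w, -1), so the first frame
    is initialised to |X(w, 0)|^2.
    """
    power = []
    previous = None
    for frame in stft_frames:
        instant = [abs(x) ** 2 for x in frame]
        if previous is None:
            current = instant
        else:
            current = [alpha_p * p + (1 - alpha_p) * q
                       for p, q in zip(previous, instant)]
        power.append(current)
        previous = current
    return power
```

A larger `alpha_p` gives a smoother estimate that reacts more slowly to changes between frames; a smaller value follows the instantaneous power more closely.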
Second, calculate the spectral correlation $DP(w,n)$ of the power spectrum of each frame according to the following formula:
$$DP(w,n) = |P(w+1,n) - P(w,n)|$$
After the power spectrum of the voice signal has been calculated, the spectral correlation of the power spectrum is obtained by subtracting the energy of each frequency bin from the energy of the next higher frequency bin and taking the absolute value.
In this embodiment, assume that the spectral correlation of the power spectrum of the i-th frame of voice signal A is $DP_a(w,i)$; then:
$$DP_a(w,i) = |P_a(w+1,i) - P_a(w,i)|$$
Similarly, assume that the spectral correlation of the power spectrum of the j-th frame of voice signal B is $DP_b(w,j)$; then:
$$DP_b(w,j) = |P_b(w+1,j) - P_b(w,j)|$$
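The adjacent-bin difference is simple enough to show in a few lines. The function name `spectral_correlation` is hypothetical, and the handling of the last bin is an assumption: the text does not say what happens at the highest bin, where there is no next bin to compare against, so the sketch simply returns one value fewer than the number of bins.

```python
def spectral_correlation(power_frame):
    """Absolute difference between each bin and the next higher bin.

    power_frame[w] is P(w, n) for one frame; the result holds
    DP(w, n) = |P(w+1, n) - P(w, n)|.

    Assumption: the highest bin has no upper neighbour, so the result
    has len(power_frame) - 1 entries.
    """
    return [abs(power_frame[w + 1] - power_frame[w])
            for w in range(len(power_frame) - 1)]


print(spectral_correlation([1.0, 4.0, 2.0, 2.0]))  # [3.0, 2.0, 0.0]
```

Flat regions of the power spectrum map to values near zero, while sharp spectral edges produce large values, so this feature describes the shape of the spectrum rather than its absolute level.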
3rd, noise tracking is carried out to Spectral correlation DP (w, n) according to the following equation, obtains each frame voice signal The Spectral correlation NDP (w, n) of noise power spectrum:
Wherein, βu、βdFor predetermined coefficient and 0 < βd< βu< 1.
The purpose for carrying out noise tracking to Spectral correlation is that the noise fluctuated in order to prevent brings subsequent match erroneous judgement, Influence of the noise to matching result is reduced as much as possible, to improve matching precision.In addition, pass through reasonable set predetermined coefficient βu、βd Size, noise tracking effect can be improved.
In this example, it is assumed that between the spectrum of the noise power spectrum of the i-th frame voice signal in wherein all the way voice signal A Correlation is NDPa(w, i), then:
It is similar, it is assumed that the Spectral correlation of the noise power spectrum of the jth frame voice signal in another way voice signal B is NDPb(w, j), then:
4th, binary conversion treatment is carried out to the Spectral correlation DP (w, n) of each frame voice signal according to the following equation and is obtained To correlation two-value spectrum XDb (w, n):
TDbTo preset second threshold.
After carrying out binary conversion treatment to Spectral correlation, the Spectral correlation of each frame voice signal can be converted into one Equal length, the binary sequence being made of 0 and 1.Data processing speed can fully be improved by binary conversion treatment, so as to follow-up Carry out the matching primitives of highly efficient (robust).
In this embodiment, assume that the correlation binary spectrum corresponding to the i-th frame of voice signal A is $XDb_a(w,i)$; then:
Similarly, assume that the correlation binary spectrum corresponding to the j-th frame of the other voice signal B is $XDb_b(w,j)$; then:
Fifth, pairwise coherence matching is performed between the $KD_a$ correlation binary spectra of one voice signal and the $KD_b$ correlation binary spectra of the other voice signal to obtain the second matching result, where $KD_a$ and $KD_b$ are positive integers.
The second matching result includes the matching position and matching degree of the pair of correlation binary spectra with the highest matching degree. When voice signal A has $KD_a$ correlation binary spectra and voice signal B has $KD_b$ correlation binary spectra, $KD_a\times KD_b$ correlation binary spectrum matches are required. For each match, the matching degree between the two correlation binary spectra is recorded, together with the sequence numbers $nd_a(i)$ and $nd_b(j)$ of the two spectra being matched. Here $nd_a(i)$ denotes the sequence number of the correlation binary spectrum corresponding to the i-th frame of voice signal A, with $nd_a(i)\in[0,KD_a-1]$, $KD_a\ge 2$, and both $nd_a(i)$ and $KD_a$ integers; $nd_b(j)$ denotes the sequence number of the correlation binary spectrum corresponding to the j-th frame of voice signal B, with $nd_b(j)\in[0,KD_b-1]$, $KD_b\ge 2$, and both $nd_b(j)$ and $KD_b$ integers. In addition, $KD_a$ usually equals $K_a$ in step 204 above, and $KD_b$ usually equals $K_b$ in step 204 above.
For example, if the correlation binary spectrum $XDb_a(w,i)$ corresponding to the i-th frame of voice signal A is matched against the correlation binary spectrum $XDb_b(w,j)$ corresponding to the j-th frame of voice signal B, the matching degree $PDb_{ij}$ of the two is:
where ⊙ denotes the XNOR operator. The matching degree $PDb_{ij}$ equals the ratio of the number of data pairs whose binarized values at corresponding positions of the two correlation binary spectra are equal to the total number of data pairs.
After the matching degree between every pair of correlation binary spectra has been computed, the pair with the highest matching degree is selected, and the sequence numbers and matching degree of that pair are recorded.
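The pairwise matching can be sketched as follows: the matching degree of two binary spectra is the fraction of positions at which their values agree (the XNOR of the two bits is 1), and all $KD_a\times KD_b$ pairs are scanned for the highest value. The function names are hypothetical, list indices stand in for the sequence numbers $nd_a$ and $nd_b$, and keeping the first pair found on ties is an assumption.

```python
from typing import List, Tuple


def match_degree(spec_a: List[int], spec_b: List[int]) -> float:
    """Fraction of positions where two 0/1 spectra agree (XNOR equals 1)."""
    if len(spec_a) != len(spec_b) or not spec_a:
        raise ValueError("spectra must be non-empty and of equal length")
    equal_pairs = sum(1 for x, y in zip(spec_a, spec_b) if x == y)
    return equal_pairs / len(spec_a)


def best_match(spectra_a: List[List[int]],
               spectra_b: List[List[int]]) -> Tuple[int, int, float]:
    """Scan every pair and return (index in A, index in B, matching degree).

    Assumption: on ties, the first pair encountered is kept.
    """
    best = (0, 0, -1.0)
    for i, spec_a in enumerate(spectra_a):
        for j, spec_b in enumerate(spectra_b):
            degree = match_degree(spec_a, spec_b)
            if degree > best[2]:
                best = (i, j, degree)
    return best
```

The brute-force scan costs one comparison per bin per pair, which is why reducing each frame to bits beforehand keeps the search cheap.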
In this embodiment, assume that the $nd_a(i)$-th correlation binary spectrum, corresponding to the i-th frame of voice signal A, and the $nd_b(j)$-th correlation binary spectrum, corresponding to the j-th frame of the other voice signal B, have the highest matching degree, which is denoted $P_2$. For example, i = 1 and j = 3.
Step 206, the time delay between two-way voice signal is calculated according to the first matching result and the second matching result.
After the first matching result and the second matching result have been computed, the two matching results are combined to obtain a final matching result, from which the time delay between the two voice signals is calculated. Specifically, this step may include the following two sub-steps:
First, for each voice signal, a final matching position is calculated from the first matching position and the second matching position using a weighted average algorithm, the weights of which are determined according to the first matching degree and the second matching degree.
In one possible embodiment, assume that the first matching degree, computed from the frequency-domain distribution of the voice signals, is $P_1$, and that the second matching degree, computed from the spectral correlation of the power spectrum, is $P_2$; the weight of the first matching position and the weight of the second matching position are then each determined from $P_1$ and $P_2$.
The final matching position $nl_a$ of voice signal A is:
where $n_a$ denotes the first matching position of voice signal A and $nd_a$ denotes the second matching position of voice signal A.
Similarly, the final matching position $nl_b$ of the other voice signal B is:
where $n_b$ denotes the first matching position of voice signal B and $nd_b$ denotes the second matching position of voice signal B.
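The weight expressions are not reproduced in this text. A minimal sketch, assuming the weights are the matching degrees normalised by their sum, that is $P_1/(P_1+P_2)$ for the first matching position and $P_2/(P_1+P_2)$ for the second, is given below. This normalisation, the function name, and the sample values are assumptions, not the patent's stated formula.

```python
def final_position(first_pos: float, second_pos: float,
                   p1: float, p2: float) -> float:
    """Weighted average of two matching positions for one voice signal.

    ASSUMPTION: weights are P1 / (P1 + P2) and P2 / (P1 + P2), so the
    position backed by the higher matching degree counts for more.
    """
    total = p1 + p2
    if total <= 0.0:
        raise ValueError("matching degrees must sum to a positive value")
    return (p1 * first_pos + p2 * second_pos) / total


if __name__ == "__main__":
    # Hypothetical values: first match at frame 10 with degree 0.75,
    # second match at frame 14 with degree 0.25.
    print(final_position(10, 14, 0.75, 0.25))  # 11.0
```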
Second, the time delay between the two voice signals is calculated from the final matching positions of the two voice signals.
The time delay $t$ between the two voice signals is calculated according to the following formula:
$$t=k\,(nl_a-nl_b)$$
where $k$ is the time coefficient.
The time coefficient $k$ can be calculated from the sampling frequency $f$, the number of sampling points $Num$, and the overlap coefficient $\eta$ of the STFT.
In a specific example, a signal sampled at 16 kHz is analysed with a 256-point FFT and 50% overlap, and the time coefficient follows from these values.
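The expression for $k$ is not reproduced in this text. One plausible reading, stated here as an assumption, is that $k$ is the time advance between consecutive STFT frames, that is $k=Num\,(1-\eta)/f$. Under that assumption, the 16 kHz, 256-point, 50% overlap example gives a frame step of 128 samples, or 8 ms. The function names below are hypothetical.

```python
def time_coefficient(sample_rate_hz: float, num_points: int,
                     overlap: float) -> float:
    """Seconds per frame step.

    ASSUMPTION: k = Num * (1 - eta) / f, the hop between adjacent frames.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return num_points * (1.0 - overlap) / sample_rate_hz


def time_delay(final_pos_a: float, final_pos_b: float, k: float) -> float:
    """t = k * (nl_a - nl_b), as given for the time delay between the signals."""
    return k * (final_pos_a - final_pos_b)


if __name__ == "__main__":
    k = time_coefficient(16000, 256, 0.5)
    print(k)                          # 0.008 seconds per frame step
    print(time_delay(12.5, 10.0, k))  # roughly 0.02 seconds
```

The sign of `time_delay` indicates which signal leads: a positive value means the match occurs later in signal A than in signal B.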
It should be noted that step 205 may be performed after step 204, before step 204, or simultaneously with step 204. This embodiment describes step 205 as following step 204 by way of example only, and no specific limitation is imposed in this respect.
In summary, the time delay estimation method for voice signals provided in this embodiment obtains the first matching result by analysing and matching the short-time Fourier transforms of the two voice signals, obtains the second matching result by analysing and matching the spectral correlation of the power spectra of the two voice signals, and then combines the first matching result and the second matching result to calculate the time delay between the two voice signals. This solves the problem of low accuracy in the time delay estimation methods of the related art: the two voice signals are matched from two angles, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves matching precision and the accuracy of time delay estimation.
In addition, the time delay estimation method provided in this embodiment binarizes the data before the matching computation, which greatly improves matching efficiency and provides a robust time delay estimation method.
The following are device embodiments of the present invention, which can be used to carry out the method embodiments of the present invention. For details not disclosed in the device embodiments, refer to the method embodiments of the present invention.
Referring to Fig. 3, which shows a structural block diagram of a time delay estimation device for voice signals provided by one embodiment of the present invention, the time delay estimation device may be implemented as part or all of an electronic device by software, hardware, or a combination of both. The time delay estimation device may include: a signal acquisition module 310, a first matching module 320, a second matching module 330, and a time delay calculation module 340.
The signal acquisition module 310 is configured to obtain two voice signals.
The first matching module 320 is configured to perform coherence matching on the two voice signals according to their short-time Fourier transforms to obtain a first matching result, the first matching result including a first matching position and a first matching degree of the two voice signals.
The second matching module 330 is configured to perform coherence matching on the two voice signals according to the spectral correlation of their power spectra to obtain a second matching result, the second matching result including a second matching position and a second matching degree of the two voice signals.
The time delay calculation module 340 is configured to calculate the time delay between the two voice signals according to the first matching result and the second matching result.
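The four-module division can be illustrated with a small composition sketch in which each module is a plain callable. The class name, the callable signatures, and the tuple layout of a matching result are hypothetical and serve only to show how the modules hand data to one another.

```python
from typing import Callable, List, Tuple

Signal = List[float]
# (matching position in A, matching position in B, matching degree)
MatchResult = Tuple[float, float, float]


class DelayEstimator:
    """Hypothetical wiring of modules 310, 320, 330 and 340."""

    def __init__(self,
                 acquire: Callable[[], Tuple[Signal, Signal]],
                 first_match: Callable[[Signal, Signal], MatchResult],
                 second_match: Callable[[Signal, Signal], MatchResult],
                 compute_delay: Callable[[MatchResult, MatchResult], float]):
        self.acquire = acquire              # signal acquisition module 310
        self.first_match = first_match      # first matching module 320
        self.second_match = second_match    # second matching module 330
        self.compute_delay = compute_delay  # time delay calculation module 340

    def estimate(self) -> float:
        signal_a, signal_b = self.acquire()
        first = self.first_match(signal_a, signal_b)
        second = self.second_match(signal_a, signal_b)
        return self.compute_delay(first, second)
```

Because the two matching modules receive the same inputs and do not depend on each other, they can run in either order or concurrently.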
In summary, the time delay estimation device for voice signals provided in this embodiment obtains the first matching result by analysing and matching the short-time Fourier transforms of the two voice signals, obtains the second matching result by analysing and matching the spectral correlation of the power spectra of the two voice signals, and then combines the first matching result and the second matching result to calculate the time delay between the two voice signals. This solves the problem of low accuracy in the time delay estimation methods of the related art: the two voice signals are matched from two angles, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves matching precision and the accuracy of time delay estimation.
Referring to Fig. 4, which shows a structural block diagram of a time delay estimation device for voice signals provided by another embodiment of the present invention, the time delay estimation device may be implemented as part or all of an electronic device by software, hardware, or a combination of both. The time delay estimation device may include: a signal acquisition module 310, a first matching module 320, a second matching module 330, and a time delay calculation module 340.
The signal acquisition module 310 is configured to obtain two voice signals.
The first matching module 320 is configured to perform coherence matching on the two voice signals according to their short-time Fourier transforms to obtain a first matching result, the first matching result including a first matching position and a first matching degree of the two voice signals.
The first matching module 320 may include: a first tracking unit 320a, a first binarization unit 320b, and a first matching unit 320c.
The first tracking unit 320a is configured to, for each voice signal, perform noise tracking on each frame of the voice signal according to the following formula to obtain the noise spectrum $N(w,n)$ of each frame of the voice signal:
$$N(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u\,N(w,n-1), & |X(w,n)|\ge N(w,n-1)\\(1-\alpha_d)\,|X(w,n)|+\alpha_d\,N(w,n-1), & |X(w,n)|<N(w,n-1)\end{cases}$$
where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0<\alpha_d<\alpha_u<1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame number in the time domain.
The first binarization unit 320b is configured to binarize the spectrum of each frame of the voice signal according to the following formula to obtain the binary spectrum $Xb(w,n)$:
where $T_b$ is a preset first threshold.
The first matching unit 320c is configured to perform pairwise coherence matching between the $K_a$ binary spectra of one voice signal and the $K_b$ binary spectra of the other voice signal to obtain the first matching result, the first matching result including the matching position and matching degree of the pair of binary spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
The second matching module 330 is configured to perform coherence matching on the two voice signals according to the spectral correlation of their power spectra to obtain a second matching result, the second matching result including a second matching position and a second matching degree of the two voice signals.
The second matching module 330 may include: a power spectrum calculation unit 330a, a correlation calculation unit 330b, a second tracking unit 330c, a second binarization unit 330d, and a second matching unit 330e.
The power spectrum calculation unit 330a is configured to, for each voice signal, calculate the power spectrum $P(w,n)$ of each frame of the voice signal according to the following formula:
$$P(w,n)=\alpha_p\,P(w,n-1)+(1-\alpha_p)\,|X(w,n)|^2$$
where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_p$ is a preset coefficient with $0<\alpha_p<1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame number in the time domain.
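The recursion for the power spectrum is a first-order smoothing of the squared STFT magnitude across frames. A minimal sketch follows. The function name and default coefficient are hypothetical, and seeding the recursion with the squared magnitude of the first frame is an assumption, since the initial value $P(w,-1)$ is not specified.

```python
from typing import List


def smoothed_power_spectrum(magnitude_frames: List[List[float]],
                            alpha_p: float = 0.9) -> List[List[float]]:
    """P(w, n) = alpha_p * P(w, n-1) + (1 - alpha_p) * |X(w, n)|^2.

    Input: one list of magnitudes |X(w, n)| per frame.
    Assumption: P for the first frame is |X(w, 0)|^2.
    """
    if not 0.0 < alpha_p < 1.0:
        raise ValueError("expected 0 < alpha_p < 1")
    previous = [m * m for m in magnitude_frames[0]]
    result = [previous]
    for frame in magnitude_frames[1:]:
        previous = [alpha_p * p + (1.0 - alpha_p) * m * m
                    for p, m in zip(previous, frame)]
        result.append(previous)
    return result
```

For a single bin with magnitudes 2.0, 0.0, 2.0 and `alpha_p=0.5`, the smoothed power values are 4.0, 2.0, 3.0.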
The correlation calculation unit 330b is configured to calculate the spectral correlation $DP(w,n)$ of the power spectrum of each frame of the voice signal according to the following formula:
$$DP(w,n)=|P(w+1,n)-P(w,n)|$$
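The second tracking unit 330c is configured to perform noise tracking on the spectral correlation $DP(w,n)$ according to the following formula to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame of the voice signal:
$$NDP(w,n)=\begin{cases}(1-\beta_u)\,DP(w,n)+\beta_u\,NDP(w,n-1), & DP(w,n)\ge NDP(w,n-1)\\(1-\beta_d)\,DP(w,n)+\beta_d\,NDP(w,n-1), & DP(w,n)<NDP(w,n-1)\end{cases}$$
where $\beta_u$ and $\beta_d$ are preset coefficients with $0<\beta_d<\beta_u<1$.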
The second binarization unit 330d is configured to binarize the spectral correlation $DP(w,n)$ of each frame of the voice signal according to the following formula to obtain the correlation binary spectrum $XDb(w,n)$:
where $T_{Db}$ is a preset second threshold.
The second matching unit 330e is configured to perform pairwise coherence matching between the $KD_a$ correlation binary spectra of one voice signal and the $KD_b$ correlation binary spectra of the other voice signal to obtain the second matching result, the second matching result including the matching position and matching degree of the pair of correlation binary spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
The time delay calculation module 340 is configured to calculate the time delay between the two voice signals according to the first matching result and the second matching result.
The time delay calculation module 340 may include: a position calculation unit 340a and a time delay calculation unit 340b.
The position calculation unit 340a is configured to, for each voice signal, calculate a final matching position from the first matching position and the second matching position using a weighted average algorithm, the weights of which are determined according to the first matching degree and the second matching degree.
The time delay calculation unit 340b is configured to calculate the time delay between the two voice signals according to the final matching positions of the two voice signals.
Optionally, the device may further include: a signal pre-processing module 312 and a Fourier transform module 314.
The signal pre-processing module 312 is configured to, for each voice signal, pre-process the voice signal to obtain a pre-processed voice signal, the pre-processing including at least one of noise reduction, amplification, high-pass filtering, and up-sampling or down-sampling.
The Fourier transform module 314 is configured to perform a short-time Fourier transform on the pre-processed voice signal.
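As an illustration of what the Fourier transform module produces, the sketch below splits a signal into overlapping frames and computes the magnitude of a direct DFT for each frame. It is a simplified stand-in rather than the module's actual implementation: it uses a rectangular window, omits the pre-processing step, favours clarity over speed (a real implementation would use an FFT), and all names and defaults are hypothetical.

```python
import cmath
from typing import List


def split_frames(signal: List[float], num: int = 256,
                 overlap: float = 0.5) -> List[List[float]]:
    """Cut the signal into frames of `num` samples with fractional overlap."""
    hop = max(1, int(num * (1.0 - overlap)))
    return [signal[start:start + num]
            for start in range(0, len(signal) - num + 1, hop)]


def dft_magnitude(frame: List[float]) -> List[float]:
    """|X(w)| for w = 0 .. N/2 using a direct O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * w * t / n)
                for t, x in enumerate(frame)))
        for w in range(n // 2 + 1)
    ]


def stft_magnitude(signal: List[float], num: int = 256,
                   overlap: float = 0.5) -> List[List[float]]:
    """One magnitude spectrum per frame: the |X(w, n)| used by later steps."""
    return [dft_magnitude(frame)
            for frame in split_frames(signal, num, overlap)]
```

With `num=4` and `overlap=0.5`, an 8-sample signal yields three frames starting at samples 0, 2, and 4.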
In summary, the time delay estimation device for voice signals provided in this embodiment obtains the first matching result by analysing and matching the short-time Fourier transforms of the two voice signals, obtains the second matching result by analysing and matching the spectral correlation of the power spectra of the two voice signals, and then combines the first matching result and the second matching result to calculate the time delay between the two voice signals. This solves the problem of low accuracy in the time delay estimation methods of the related art: the two voice signals are matched from two angles, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves matching precision and the accuracy of time delay estimation.
In addition, the time delay estimation device provided in this embodiment binarizes the data before the matching computation, which greatly improves matching efficiency.
It should be noted that, when the time delay estimation device for voice signals provided in the above embodiments calculates the time delay between two voice signals, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the time delay estimation device provided in the above embodiments and the method embodiments of the time delay estimation method belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Referring to Fig. 5, which shows a structural diagram of an electronic device provided by one embodiment of the present invention, the electronic device is used to implement the time delay estimation method for voice signals provided in the above embodiments. Specifically:
The electronic device 500 may include components such as an RF (Radio Frequency) circuit 510, a memory 520 including one or more computer-readable storage media, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a WiFi (wireless fidelity) module 570, a processor 580 including one or more processing cores, and a power supply 590. Those skilled in the art will understand that the structure shown in Fig. 5 does not limit the electronic device, which may include more or fewer components than illustrated, combine some components, or use a different arrangement of components. Specifically:
RF circuits 510 can be used for receive and send messages or communication process in, the reception and transmission of signal, especially, by base station After downlink information receives, transfer to one or more than one processor 580 is handled;In addition, will be related to the data sending of uplink to Base station.In general, RF circuits 510 include but not limited to antenna, at least one amplifier, tuner, one or more oscillators, use Family identity module (SIM) card, transceiver, coupler, LNA (Low Noise Amplifier, low-noise amplifier), duplex Device etc..In addition, RF circuits 510 can also be communicated by wireless communication with network and other equipment.The wireless communication can make With any communication standard or agreement, include but not limited to GSM (Global System of Mobile communication, entirely Ball mobile communcations system), GPRS (General Packet Radio Service, general packet radio service), CDMA (Code Division Multiple Access, CDMA), WCDMA (Wideband Code Division Multiple Access, wideband code division multiple access), LTE (Long Term Evolution, Long Term Evolution), Email, SMS (Short Messaging Service, Short Message Service) etc..
Memory 520 can be used for storage software program and module, and processor 580 is stored in memory 520 by operation Software program and module, so as to perform various functions application and data processing.Memory 520 can mainly include storage journey Sequence area and storage data field, wherein, storing program area can storage program area, the application program (ratio needed at least one function Such as sound-playing function, image player function) etc.;Storage data field can be stored to be created according to using for electronic equipment 500 Data (such as voice data, phone directory etc.) etc..In addition, memory 520 can include high-speed random access memory, may be used also With including nonvolatile memory, for example, at least a disk memory, flush memory device or other volatile solid-states Part.Correspondingly, memory 520 can also include Memory Controller, to provide processor 580 and input unit 530 to storage The access of device 520.
Input unit 530 can be used for the numeral or character information for receiving input, and produce and user setting and function Control related keyboard, mouse, operation lever, optics or the input of trace ball signal.Specifically, input unit 530 may include figure As input equipment 531 and other input equipments 532.Image input device 531 can be camera or optoelectronic scanning Equipment.Except image input device 531, input unit 530 can also include other input equipments 532.Specifically, other are inputted Equipment 532 can include but is not limited to physical keyboard, function key (such as volume control button, switch key etc.), trace ball, mouse One or more in mark, operation lever etc..
Display unit 540 can be used for display by information input by user or be supplied to the information and electronic equipment of user 500 various graphical user interface, these graphical user interface can by figure, text, icon, video and its any combination Lai Form.Display unit 540 may include display panel 541, optionally, can use LCD (Liquid Crystal Display, Liquid crystal display), the form such as OLED (Organic Light-Emitting Diode, Organic Light Emitting Diode) configure display Panel 541.
Electronic equipment 500 may also include at least one sensor 550, for example, optical sensor, motion sensor and other Sensor.Specifically, optical sensor may include ambient light sensor and proximity sensor, wherein, ambient light sensor can basis The light and shade of ambient light adjusts the brightness of display panel 541, proximity sensor can when electronic equipment 500 is moved in one's ear, Close display panel 541 and/or backlight.As one kind of motion sensor, gravity accelerometer can detect all directions The size of upper (generally three axis) acceleration, can detect that size and the direction of gravity, available for identification mobile phone posture when static Application (such as horizontal/vertical screen switching, dependent game, magnetometer pose calibrating), Vibration identification correlation function (for example pedometer, strikes Hit) etc.;The gyroscope that can also configure as electronic equipment 500, barometer, hygrometer, thermometer, infrared ray sensor etc. other Sensor, details are not described herein.
Voicefrequency circuit 560, loudspeaker 561, microphone 562 can provide the audio interface between user and electronic equipment 500. The transformed electric signal of the voice data received can be transferred to loudspeaker 561, is changed by loudspeaker 561 by voicefrequency circuit 560 Exported for voice signal;On the other hand, the voice signal of collection is converted to electric signal by microphone 562, is connect by voicefrequency circuit 560 Voice data is converted to after receipts, then after voice data output processor 580 is handled, it is such as another to be sent to through RF circuits 510 One electronic equipment, or voice data is exported to memory 520 further to handle.Voicefrequency circuit 560 is also possible that Earphone jack, to provide the communication of peripheral hardware earphone and electronic equipment 500.
WiFi belongs to short range wireless transmission technology, and electronic equipment 500 can help user to receive and dispatch by WiFi module 570 Email, browse webpage and access streaming video etc., it has provided wireless broadband internet to the user and has accessed.Although Fig. 5 Show WiFi module 570, but it is understood that, it is simultaneously not belonging to must be configured into for electronic equipment 500, completely can root Omitted according to needs in the essential scope for do not change invention.
The processor 580 is the control centre of the electronic device 500. It connects all parts of the whole mobile phone through various interfaces and lines, and performs the various functions of the electronic device 500 and processes data by running or executing the software programs and/or modules stored in the memory 520 and calling the data stored in the memory 520, thereby monitoring the mobile phone as a whole. Optionally, the processor 580 may include one or more processing cores; preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 580.
Electronic equipment 500 further includes the power supply 590 (such as battery) to all parts power supply, it is preferred that power supply can lead to Cross power-supply management system and processor 580 be logically contiguous, thus by power-supply management system realize management charging, electric discharge and The functions such as power managed.Power supply 590 can also include one or more direct current or AC power, recharging system, electricity The random component such as source fault detection circuit, power supply changeover device or inverter, power supply status indicator.
Although being not shown, electronic equipment 500 can also be including bluetooth module etc., and details are not described herein.
Specifically, in this embodiment, the electronic device 500 further includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors to perform the time delay estimation method of the embodiment shown in Fig. 1 or Fig. 2 above.
The above embodiments of the present invention are described for illustration only and do not indicate the relative merits of the embodiments.
One of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely presently preferred embodiments of the present invention, is not intended to limit the invention, it is all the present invention spirit and Within principle, any modification, equivalent replacement, improvement and so on, should all be included in the protection scope of the present invention.

Claims (9)

  1. A time delay estimation method for voice signals, characterized in that the method comprises:
    obtaining two voice signals;
    performing coherence matching on the two voice signals according to the short-time Fourier transforms of the two voice signals to obtain a first matching result, the first matching result including a first matching position and a first matching degree of the two voice signals, the first matching result being obtained by coherence matching of the binary spectra of the two voice signals, and the binary spectrum of each voice signal being obtained by binarizing the short-time Fourier transform of that voice signal;
    performing coherence matching on the two voice signals according to the spectral correlation of the power spectra of the two voice signals to obtain a second matching result, the second matching result including a second matching position and a second matching degree of the two voice signals, the second matching result being obtained by coherence matching of the correlation binary spectra of the two voice signals, and the correlation binary spectrum of each voice signal being obtained by binarizing the spectral correlation of the power spectrum of that voice signal;
    for each voice signal, calculating a final matching position from the first matching position and the second matching position using a weighted average algorithm, the weights of the weighted average algorithm being determined according to the first matching degree and the second matching degree;
    calculating the time delay between the two voice signals according to the final matching positions of the two voice signals.
  2. The method according to claim 1, characterized in that performing coherence matching on the two voice signals according to the short-time Fourier transforms of the two voice signals to obtain the first matching result comprises:
    for each voice signal, performing noise tracking on each frame of the voice signal according to the following formula to obtain the noise spectrum $N(w,n)$ of each frame of the voice signal:
    $$N(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u\,N(w,n-1), & |X(w,n)|\ge N(w,n-1)\\(1-\alpha_d)\,|X(w,n)|+\alpha_d\,N(w,n-1), & |X(w,n)|<N(w,n-1)\end{cases};$$
    where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0<\alpha_d<\alpha_u<1$; $w$ denotes the frequency bin index in the frequency domain; $n$ denotes the frame number in the time domain;
    binarizing the short-time Fourier transform of each frame of the voice signal according to the following formula to obtain the binary spectrum $Xb(w,n)$:
    where $T_b$ is a preset first threshold;
    performing pairwise coherence matching between the $K_a$ binary spectra of one voice signal and the $K_b$ binary spectra of the other voice signal to obtain the first matching result, the first matching result including the matching position and matching degree of the pair of binary spectra with the highest matching degree, $K_a$ and $K_b$ being positive integers.
  3. The method according to claim 1, characterized in that performing coherence matching on the two voice signals according to the spectral correlation of the power spectra of the two voice signals to obtain the second matching result comprises:
    for each voice signal, calculating the power spectrum $P(w,n)$ of each frame of the voice signal according to the following formula:
    $$P(w,n)=\alpha_p P(w,n-1)+(1-\alpha_p)\,|X(w,n)|^2$$
    where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_p$ is a preset coefficient with $0 < \alpha_p < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
    calculating the spectral correlation $DP(w,n)$ of the power spectrum of each frame of the voice signal according to the following formula:
    $$DP(w,n)=|P(w+1,n)-P(w,n)|;$$
    performing noise tracking on the spectral correlation $DP(w,n)$ according to the following formula to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame of the voice signal:
    $$NDP(w,n)=\begin{cases}(1-\beta_u)\,DP(w,n)+\beta_u NDP(w,n-1), & DP(w,n) \ge NDP(w,n-1)\\ (1-\beta_d)\,DP(w,n)+\beta_d NDP(w,n-1), & DP(w,n) < NDP(w,n-1)\end{cases};$$
    where $\beta_u$ and $\beta_d$ are preset coefficients with $0 < \beta_d < \beta_u < 1$;
    binarizing the spectral correlation $DP(w,n)$ of each frame of the voice signal according to the following formula to obtain a correlation two-value spectrum $XD_b(w,n)$:
    where $TD_b$ is a preset second threshold;
    performing pairwise coherence matching between the $KD_a$ correlation two-value spectra corresponding to one of the voice signals and the $KD_b$ correlation two-value spectra corresponding to the other voice signal to obtain the second matching result, the second matching result comprising the matched position and the matching degree of the pair of correlation two-value spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
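The first two formulas of this claim, the recursively smoothed power spectrum and the difference between adjacent frequency bins, can be sketched as follows. The value of `alpha_p` and the zero initialisation of the power spectrum before the first frame are assumptions made for the example; the subsequent tracking of $NDP(w,n)$ uses the same asymmetric recursion as the noise spectrum and is not repeated here.

```python
def smoothed_power(mag_frames, alpha_p=0.5):
    """P(w, n) = alpha_p * P(w, n-1) + (1 - alpha_p) * |X(w, n)|^2."""
    prev = [0.0] * len(mag_frames[0])  # assumption: P is zero before the first frame
    out = []
    for frame in mag_frames:
        prev = [alpha_p * p + (1 - alpha_p) * x * x for p, x in zip(prev, frame)]
        out.append(prev)
    return out


def spectral_difference(power_frames):
    """DP(w, n) = |P(w+1, n) - P(w, n)|; yields one value fewer than the bin count."""
    return [[abs(f[w + 1] - f[w]) for w in range(len(f) - 1)]
            for f in power_frames]


power = smoothed_power([[2.0, 0.0, 2.0], [2.0, 2.0, 0.0]])
dp = spectral_difference(power)
```

With these inputs `power` is `[[2.0, 0.0, 2.0], [3.0, 2.0, 1.0]]` and `dp` is `[[2.0, 2.0], [1.0, 1.0]]`.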
  4. The method according to any one of claims 1 to 3, characterized in that, before performing coherence matching on the two voice signals according to the short-time Fourier transform of the two voice signals to obtain the first matching result, the method further comprises:
    for each voice signal, preprocessing the voice signal to obtain a preprocessed voice signal, the preprocessing comprising at least one of noise reduction, enhancement, high-pass filtering, and up-sampling or down-sampling;
    performing a short-time Fourier transform on the preprocessed voice signal.
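As an illustration of preprocessing followed by a short-time Fourier transform, the sketch below applies one possible preprocessing step (a first-order high-pass filter) and then computes magnitude spectra frame by frame. The filter form, its coefficient, the Hann window, the frame length and the hop size are all assumptions chosen for the example, and the naive DFT is used only to keep the code free of third-party dependencies.

```python
import cmath
import math


def high_pass(x, a=0.95):
    """First-order high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        prev_y = a * (prev_y + s - prev_x)
        prev_x = s
        y.append(prev_y)
    return y


def stft_mag(x, frame_len=8, hop=4):
    """Magnitude STFT |X(w, n)| using a Hann window and a naive DFT."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
           for i in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + i] * win[i] for i in range(frame_len)]
        frames.append([
            abs(sum(seg[i] * cmath.exp(-2j * math.pi * w * i / frame_len)
                    for i in range(frame_len)))
            for w in range(frame_len // 2 + 1)  # keep non-negative frequencies only
        ])
    return frames


signal = [1.0] * 16             # constant (DC) input
filtered = high_pass(signal)    # output decays towards zero for a DC input
mags = stft_mag(filtered)       # 3 frames of 5 frequency bins each
```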
  5. A time delay estimation device for voice signals, characterized in that the device comprises:
    a signal acquisition module, configured to obtain two voice signals;
    a first matching module, configured to perform coherence matching on the two voice signals according to the short-time Fourier transform of the two voice signals to obtain a first matching result, the first matching result comprising a first matched position and a first matching degree of the two voice signals, wherein the first matching result is obtained by performing coherence matching on the two-value spectra of the two voice signals, and the two-value spectrum of each voice signal is obtained by binarizing the short-time Fourier transform of that voice signal;
    a second matching module, configured to perform coherence matching on the two voice signals according to the spectral correlation of the power spectra of the two voice signals to obtain a second matching result, the second matching result comprising a second matched position and a second matching degree of the two voice signals, wherein the second matching result is obtained by performing coherence matching on the correlation two-value spectra of the two voice signals, and the correlation two-value spectrum of each voice signal is obtained by binarizing the spectral correlation of the power spectrum of that voice signal;
    a time delay calculation module, configured to, for each voice signal, calculate a final matched position from the first matched position and the second matched position using a weighted average algorithm, the weights of the weighted average algorithm being determined according to the first matching degree and the second matching degree; and to calculate the time delay between the two voice signals according to the final matched positions of the two voice signals.
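The weighted-average step of the time delay calculation module could look roughly like the sketch below. The claim only states that the weights are determined by the two matching degrees, so making each weight proportional to its matching degree is an assumption, as are the interpretation of matched positions as frame indices and the `hop` and `sample_rate` values.

```python
def fuse_position(pos1, deg1, pos2, deg2):
    """Weighted average of the first and second matched positions.

    Assumption: each weight is proportional to its matching degree.
    """
    total = deg1 + deg2
    if total == 0:
        raise ValueError("both matching degrees are zero")
    return (deg1 * pos1 + deg2 * pos2) / total


def delay_seconds(final_pos_a, final_pos_b, hop=160, sample_rate=16000):
    """Time delay between the signals, assuming positions count STFT frames."""
    return (final_pos_b - final_pos_a) * hop / sample_rate


pos_a = fuse_position(10, 0.75, 12, 0.25)   # final position for signal A: 10.5
pos_b = fuse_position(30, 0.5, 31, 0.5)     # final position for signal B: 30.5
delay = delay_seconds(pos_a, pos_b)         # 20 frames * 160 / 16000 = 0.2 s
```

A result with a higher matching degree pulls the final position towards itself, so an unreliable match from one of the two methods has limited influence on the estimated delay.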
  6. The device according to claim 5, characterized in that the first matching module comprises: a first tracking unit, a first binarization unit and a first matching unit;
    the first tracking unit is configured to, for each voice signal, perform noise tracking on each frame of the voice signal according to the following formula to obtain the noise spectrum $N_b(w,n)$ of each frame of the voice signal:
    $$N_b(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u N_b(w,n-1), & |X(w,n)| \ge N_b(w,n-1)\\ (1-\alpha_d)\,|X(w,n)|+\alpha_d N_b(w,n-1), & |X(w,n)| < N_b(w,n-1)\end{cases};$$
    where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0 < \alpha_d < \alpha_u < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
    the first binarization unit is configured to binarize the short-time Fourier transform of each frame of the voice signal according to the following formula to obtain a two-value spectrum $X_b(w,n)$:
    where $T_b$ is a preset first threshold;
    the first matching unit is configured to perform pairwise coherence matching between the $K_a$ two-value spectra corresponding to one of the voice signals and the $K_b$ two-value spectra corresponding to the other voice signal to obtain the first matching result, the first matching result comprising the matched position and the matching degree of the pair of two-value spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
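The pairwise comparison performed by the first matching unit can be illustrated as follows. The text does not spell out how the matching degree between two two-value spectra is computed, so the measure used here (the fraction of bins whose bits agree) and the representation of the matched position as an index pair `(i, j)` are assumptions for the sake of the example.

```python
def match_degree(spec_a, spec_b):
    """Hypothetical matching degree: fraction of bins whose bits agree."""
    same = sum(1 for a, b in zip(spec_a, spec_b) if a == b)
    return same / len(spec_a)


def best_match(spectra_a, spectra_b):
    """Compare every pair of spectra and return ((i, j), degree) of the best pair."""
    best_pos, best_deg = None, -1.0
    for i, sa in enumerate(spectra_a):
        for j, sb in enumerate(spectra_b):
            deg = match_degree(sa, sb)
            if deg > best_deg:
                best_pos, best_deg = (i, j), deg
    return best_pos, best_deg


a = [[1, 0, 1, 0], [1, 1, 0, 0]]                 # K_a = 2 two-value spectra
b = [[0, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0]]   # K_b = 3 two-value spectra
position, degree = best_match(a, b)              # best pair is (1, 2), degree 1.0
```

The exhaustive search costs $K_a \cdot K_b$ comparisons; since the spectra are binary, each comparison could also be carried out with bitwise operations on packed integers.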
  7. The device according to claim 5, characterized in that the second matching module comprises: a power spectrum calculation unit, a correlation calculation unit, a second tracking unit, a second binarization unit and a second matching unit;
    the power spectrum calculation unit is configured to, for each voice signal, calculate the power spectrum $P(w,n)$ of each frame of the voice signal according to the following formula:
    $$P(w,n)=\alpha_p P(w,n-1)+(1-\alpha_p)\,|X(w,n)|^2$$
    where $X(w,n)$ denotes the short-time Fourier transform of the voice signal; $\alpha_p$ is a preset coefficient with $0 < \alpha_p < 1$; $w$ denotes the frequency bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
    the correlation calculation unit is configured to calculate the spectral correlation $DP(w,n)$ of the power spectrum of each frame of the voice signal according to the following formula:
    $$DP(w,n)=|P(w+1,n)-P(w,n)|;$$
    the second tracking unit is configured to perform noise tracking on the spectral correlation $DP(w,n)$ according to the following formula to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame of the voice signal:
    $$NDP(w,n)=\begin{cases}(1-\beta_u)\,DP(w,n)+\beta_u NDP(w,n-1), & DP(w,n) \ge NDP(w,n-1)\\ (1-\beta_d)\,DP(w,n)+\beta_d NDP(w,n-1), & DP(w,n) < NDP(w,n-1)\end{cases};$$
    where $\beta_u$ and $\beta_d$ are preset coefficients with $0 < \beta_d < \beta_u < 1$;
    the second binarization unit is configured to binarize the spectral correlation $DP(w,n)$ of each frame of the voice signal according to the following formula to obtain a correlation two-value spectrum $XD_b(w,n)$:
    where $TD_b$ is a preset second threshold;
    the second matching unit is configured to perform pairwise coherence matching between the $KD_a$ correlation two-value spectra corresponding to one of the voice signals and the $KD_b$ correlation two-value spectra corresponding to the other voice signal to obtain the second matching result, the second matching result comprising the matched position and the matching degree of the pair of correlation two-value spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
  8. The device according to any one of claims 5 to 7, characterized in that the device further comprises:
    a signal preprocessing module, configured to, for each voice signal, preprocess the voice signal to obtain a preprocessed voice signal, the preprocessing comprising at least one of noise reduction, enhancement, high-pass filtering, and up-sampling or down-sampling;
    a Fourier transform module, configured to perform a short-time Fourier transform on the preprocessed voice signal.
  9. A computer-readable storage medium, characterized in that a program is stored in the computer-readable storage medium, the program being used to implement the method according to any one of claims 1 to 4.
CN201510083890.1A 2015-02-13 2015-02-13 The delay time estimation method and device of voice signal Active CN104700842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510083890.1A CN104700842B (en) 2015-02-13 2015-02-13 The delay time estimation method and device of voice signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510083890.1A CN104700842B (en) 2015-02-13 2015-02-13 The delay time estimation method and device of voice signal

Publications (2)

Publication Number Publication Date
CN104700842A CN104700842A (en) 2015-06-10
CN104700842B true CN104700842B (en) 2018-05-08

Family

ID=53347896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510083890.1A Active CN104700842B (en) 2015-02-13 2015-02-13 The delay time estimation method and device of voice signal

Country Status (1)

Country Link
CN (1) CN104700842B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469484B (en) * 2015-11-20 2018-09-14 宁波大业产品造型艺术设计有限公司 A kind of APP smart homes lock
CN105567434A (en) * 2015-12-31 2016-05-11 山东泰德新能源有限公司 Production apparatus of high cleanness biodiesel, and method thereof
CN105726130A (en) * 2016-01-26 2016-07-06 高玮 Maintenance prompter for medical equipment
CN105872275B (en) * 2016-03-22 2019-10-11 Tcl集团股份有限公司 A kind of speech signal time delay estimation method and system for echo cancellor
CN106057211B (en) * 2016-05-27 2018-08-21 广州多益网络股份有限公司 A kind of Signal Matching method and device
CN106209491B (en) * 2016-06-16 2019-07-02 苏州科达科技股份有限公司 A kind of time delay detecting method and device
CN107745207A (en) * 2017-10-17 2018-03-02 桂林电子科技大学 A kind of three-dimensional welding robot mixing control method
CN107862917A (en) * 2017-10-27 2018-03-30 湖南城市学院 Application system and method for the form vocabulary test in children and adults' English teaching
CN107908211A (en) * 2017-11-14 2018-04-13 朱宪民 A kind of solar energy irrigation sprinkler water intaking pressurizing control system
CN107993501A (en) * 2017-12-04 2018-05-04 菏泽学院 A kind of human anatomy teaching system
CN108200526B (en) * 2017-12-29 2020-09-22 广州励丰文化科技股份有限公司 Sound debugging method and device based on reliability curve
CN108399946A (en) * 2018-03-05 2018-08-14 湖北省第三人民医院 A kind of nursing work load distribution assistance system
CN109388067A (en) * 2018-11-01 2019-02-26 长沙理工大学 A kind of intelligent home control system towards function
CN109451254A (en) * 2018-12-14 2019-03-08 广州市科虎电子有限公司 A kind of smart television digital receiver
CN110085259B (en) * 2019-05-07 2021-09-17 国家广播电视总局中央广播电视发射二台 Audio comparison method, device and equipment
CN110310661B (en) * 2019-07-03 2021-06-11 云南康木信科技有限责任公司 Method for calculating two-path real-time broadcast audio time delay and similarity

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102606891A (en) * 2012-04-11 2012-07-25 广州东芝白云自动化系统有限公司 Water leakage detector, water leakage detecting system and water leakage detecting method
CN102854494A (en) * 2012-08-08 2013-01-02 Tcl集团股份有限公司 Sound source locating method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6721699B2 (en) * 2001-11-12 2004-04-13 Intel Corporation Method and system of Chinese speech pitch extraction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
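Research on auditory characteristics and noise estimation in speech enhancement algorithms; 周氏青香; China Master's Theses Full-text Database, Information Science and Technology; 2013-06-15 (No. 06); pp. 1-70 *
Research on speech enhancement methods based on noise trajectory; 袁文浩; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2014-06-15 (No. 06); pp. 1-91 *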

Also Published As

Publication number Publication date
CN104700842A (en) 2015-06-10

Similar Documents

Publication Publication Date Title
CN104700842B (en) The delay time estimation method and device of voice signal
US10956771B2 (en) Image recognition method, terminal, and storage medium
CN103578474B (en) A kind of sound control method, device and equipment
CN103702297B (en) Short message enhancement, apparatus and system
CN104618222B (en) A kind of method and device for matching facial expression image
CN103686963B (en) Control wireless network method of switching, device, equipment and system
CN110334360A (en) Machine translation method and device, electronic equipment and storage medium
CN103634717B (en) A kind of method, device and the terminal device of the control of utilization earphone
CN104581221A (en) Video live broadcasting method and device
CN111050370A (en) Network switching method and device, storage medium and electronic equipment
CN107247711A (en) A kind of two-way translation method, mobile terminal and computer-readable recording medium
CN107277230A (en) The voice broadcast method and Related product of message
CN108021572A (en) Return information recommends method and apparatus
CN107566985A (en) A kind of main SIM card of mobile terminal determines method and device
CN107507628A (en) Singing methods of marking, device and terminal
CN107205088A (en) Camera control method and Related product
CN103714316B (en) Image-recognizing method, device and electronic equipment
CN104166614A (en) Frame rate detecting method for mobile device and related device
CN104966046A (en) Method and device for evaluating face key point positioning result
CN104717125A (en) Graphic code storage method and device
CN107749302A (en) Audio-frequency processing method, device, storage medium and terminal
CN104461597A (en) Starting control method and device for application program
CN104699501B (en) A kind of method and device for running application program
CN109239611A (en) Terminal battery electricity quantity calibration method, terminal and computer readable storage medium
CN104409081A (en) Speech signal processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20161207

Address after: 511442 Guangzhou City, Guangdong Province, Panyu District, South Village, Huambo Road, No. 79, Huambo Business District Wanda Commercial Plaza, B-1 building, room, room 2705, room

Applicant after: Guangzhou Baiguoyuan Information Technology Co. Ltd.

Address before: 511442 Guangzhou City, Guangdong Province, Panyu District, South Village, Huambo, No. 79, No. two road, business district, Wanda Plaza, North building, B-1 floor, floor

Applicant before: Guangzhou Baiguoyuan Network Technology Co., Ltd.

GR01 Patent grant